
Clothing is handmade. It doesn't matter if it's luxury or from Shein; it's all handmade. At the luxury end, artisans can work tirelessly to make sure everything is stitched exactly the same way, but anything below that is made for the mass market by people paid very little to work as fast as possible and turn out as many items as possible. In that environment, you're going to get a lot of inconsistency. The only tech that helps here is the sewing machine and laser-cutting the pieces. Compare that to iPhones, where industrial machines create each of the pieces and highly trained individuals help with assembly. The iPhone is also a "luxury" good, so it gets a lot of QC, whereas a shirt from Old Navy is cheap, and as long as it "looks" correct they'll sell it for $8.


Encore is a newish used-clothing indexer that might be what you're looking for. I also use Gem, which doesn't use AI but indexes multiple vintage/used sites and will notify you when something pops up matching your saved searches.


Sounds like steps in the right direction, but not entirely what I'm looking for.

I want AI that can scrape shop websites for attributes that people commonly search for, such as size and color, but also shipping methods, shipping costs, etc. I think this would be trivial for an LLM. For me, the scope should be bigger than just used clothes; I prefer new clothes (though I wear them until they're worn out). And the system should be web-wide, not just selected shops.

And then I want a basic filtering system that allows me to quickly find what I need by checking some boxes.
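Sketching out the extraction step (assuming the official OpenAI Python client and requests; the model name, the JSON keys, and the idea that one prompt per page is enough are all my assumptions, and real shop pages often need JS rendering, which this skips):

    import json
    import requests
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def extract_attributes(product_url: str) -> dict:
        # Fetch the raw page; truncate so the prompt stays within context limits
        html = requests.get(product_url, timeout=10).text[:50_000]
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[{
                "role": "user",
                "content": "From this product page HTML, return a JSON object with keys "
                           "size, color, price, shipping_cost, shipping_methods "
                           "(use null where unknown):\n\n" + html,
            }],
        )
        return json.loads(resp.choices[0].message.content)

    # The checkbox filter is then just a predicate over the extracted records
    def matches(item: dict, sizes: set, colors: set) -> bool:
        return item.get("size") in sizes and item.get("color") in colors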

It sounds so simple ...


You can build an x86 machine that can fully run DeepSeek R1 with 512GB VRAM for ~$2,500?


You will have to explain to me how.



Is that a CPU-based inference build? Shouldn't you be able to get more performance out of the M3's GPU?


Inference is about memory bandwidth and some CPUs have just as much bandwidth as a GPU.
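Back-of-envelope (all spec-sheet figures, not measurements): theoretical bandwidth is channels x transfer rate x 8 bytes, which is why a 12-channel Epyc lands in GPU territory while a dual-channel desktop doesn't:

    # Theoretical peak memory bandwidth = channels * MT/s * 8 bytes per transfer
    def peak_gbps(channels: int, mts: int) -> float:
        return channels * mts * 8 / 1000  # GB/s

    print(peak_gbps(2, 5600))    # ~90 GB/s  - dual-channel desktop DDR5
    print(peak_gbps(12, 4800))   # ~461 GB/s - 12-channel Epyc (Genoa-class)
    # For comparison, an RTX 4090's GDDR6X is ~1008 GB/s and an
    # M3 Ultra's unified memory is ~819 GB/s.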



How would you compare the tok/sec between this setup and the M3 Max?


3.5 - 4.5 tokens/s on the $2,000 AMD Epyc setup. Deepseek 671b q4.

The AMD Epyc build is severely bandwidth and compute constrained.

~40 tokens/s on M3 Ultra 512GB by my calculation.
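Roughly: assuming ~819 GB/s of memory bandwidth (Apple's spec-sheet figure), ~37B active parameters per token for the MoE, and ~0.5 bytes/parameter at q4, decode is bandwidth-bound at about:

    bandwidth_gbs = 819          # M3 Ultra spec-sheet memory bandwidth
    active_params = 37e9         # DeepSeek 671b is MoE; ~37B params active per token
    bytes_per_param = 0.5        # q4 quantization, roughly

    gb_read_per_token = active_params * bytes_per_param / 1e9   # ~18.5 GB
    print(bandwidth_gbs / gb_read_per_token)                    # ~44 tokens/s ceiling

That's a ceiling that ignores compute, KV-cache reads, and overhead, so ~40 tokens/s in practice seems plausible.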


IMO, it would be more interesting to have a 3-way comparison of price/performance between DeepSeek 671b running on:

1. M3 Ultra 512
2. AMD Epyc (which gen? AVX-512 and DDR5 might make a difference in both performance and cost; Gen 4 or Gen 5 get 8-9 t/s: https://github.com/ggml-org/llama.cpp/discussions/11733)
3. AMD Epyc + 4090 or 5090 running KTransformers (over 10 t/s decode? https://github.com/kvcache-ai/ktransformers/blob/main/doc/en...)


Thanks!

If the M3 can run 24/7 without overheating, it's a great deal for running agents, especially considering it should draw only about 350W... so roughly $50/mo in electricity costs.
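Back-of-envelope, assuming the worst case of the machine pinned at 350W around the clock and roughly $0.20/kWh (idle draw is far lower, so a real bill should be too):

    kwh_per_month = 0.350 * 24 * 30      # 252 kWh at constant full load
    print(kwh_per_month * 0.20)          # ~$50/month at $0.20/kWh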


Out of curiosity, if you don't mind: what kind of agent would you run 24/7 locally?

I'd assume this thing peaks at 350W (or whatever) but idles at around 40W tops?


I'm guessing they might be thinking of long training jobs, as opposed to using the model in some sort of end product.


What kind of Nvidia-based rig would one need to achieve 40 tokens/sec on Deepseek 671b? And how much would it cost?


Around 5x Nvidia A100 80GB can fit 671b Q4. $50k just for the GPUs and likely much more when including cooling, power, motherboard, CPU, system RAM, etc.
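The 5-GPU figure follows from simple sizing arithmetic (ignoring KV cache and activation memory, which are what push you onto the fifth card):

    params = 671e9
    bytes_per_param = 0.5                          # Q4
    weights_gb = params * bytes_per_param / 1e9    # ~336 GB of weights alone
    print(weights_gb, weights_gb / 80)             # ~336 GB -> 4.2x A100-80GB, so 5 cards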


So the M3 Ultra is amazing value then. And from what I could tell, an equivalent AMD Epyc would still be so constrained that we're talking 4-5 tokens/s. Is this a fair assumption?


No. The advantage of Epyc is you get 12 channels of RAM, so it should be ~6x faster than a consumer CPU.


I realize that, but apparently people are still getting very low tokens/sec on Epyc. Why is that? I don't get it, since on paper it should be fast.


The Epyc would only set you back $2,000 though, so it's only a slightly worse price-to-performance ratio.


How many tokens/s would that be though?


That's what I'm trying to get at. I'm looking to set up a rig, and AMD Epyc seems reasonable, but I'd rather go Mac if it gives many more tokens per second. It does sound like the Mac with the M3 Ultra will easily give 40 tokens/s, whereas the Epyc is just too internally constrained, giving 4-5 tokens/s. But I'd like someone to confirm that instead of buying the hardware and finding out myself. :)


Probably a lot more. Those are server-grade GPUs. We're talking prosumer-grade Macs.

I don't know how to calculate tokens/s for H100s linked together. ChatGPT might help you though. :)


Well, ChatGPT quotes 25k-75k tokens/s with 5 H100s (so very, very far from the 40 tokens/s), but I doubt this is accurate (e.g. it completely ignored the fact that they are linked together and instead just multiplied its estimate of the tokens/s for one H100 by 5).

If this is remotely accurate though it's still at least an order of magnitude more convenient than the M3 Ultra, even after factoring in all the other costs associated with the infrastructure.
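One way to sanity-check it, assuming ~3.35 TB/s of HBM per SXM H100 (spec-sheet figure): single-stream decode is still bandwidth-bound, so even perfect tensor parallelism across 5 cards caps out around 900 tokens/s for one sequence. Numbers like 25k-75k only make sense as aggregate throughput across many batched requests, where the same weight reads are amortized:

    hbm_tbs = 3.35                  # per-GPU HBM3 bandwidth, SXM H100
    n_gpus = 5
    gb_per_token = 18.5             # ~37B active params at q4 (see above)

    single_stream = n_gpus * hbm_tbs * 1000 / gb_per_token
    print(single_stream)            # ~905 tokens/s ceiling for ONE sequence
    # Quotes in the tens of thousands of tokens/s would be batched
    # throughput, not the latency a single user sees.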


Hopefully 7 years from now you'll still be able to use it with modern apps, websites, and video content. IMO, the benefits of these chips are in longevity rather than pushing them to the limit today.


This is the pretty obvious answer. I'm looking at replacing my gen-3 iPad Air from 2019 because it's feeling pretty pokey now. (And my wife's gen-1 iPad Air from 2013 is entirely unusable.)


I don't think there's any amount of processing power that can keep up with website bloat long term, but you ought to get an extra year or two from the M3.


For what it's worth, looking at the benchmarks, I think the machine they built is comparable to what your MBP can already do. They probably have a better inference speed, though.


It affects the millions of people who buy the machine, by way of longevity.


Usually when I see advances, it's less about future-proofing and more about obsolescence of old hardware. A more exaggerated case of this was in the 90s: people would upgrade to a 200 MHz P1 thinking they were future-proofing, but in a couple of years you had 500 MHz P2s.


They announced earlier in the week that there will only be three days of announcements.


I thought “unified memory” was just a marketing term for the memory being extremely close to the processor?


No, unified memory usually means the CPU and GPU (and miscellaneous things like the NPU) all use the same physical pool of RAM and moving data between them is essentially zero-cost. That's in contrast to the usual PC setup where the CPU has its own pool of RAM, which is unified with the iGPU if it has one, but the discrete GPU has its own independent pool of VRAM and moving data between the two pools is a relatively slow operation.

An RTX 4090 or H100 has memory extremely close to the processor, but I don't think you would call it unified memory.


I don't quite understand one of the finer points of this, being under-caffeinated :) If GPU memory is extremely close to the CPU memory, what sort of memory would not be extremely close to the CPU?


I think you misunderstood what I meant by "processor": the memory on a discrete GPU is very close to the GPU's processor die, but very far away from the CPU. The GPU may be able to read and write its own memory at 1 TB/s, but the CPU trying to read or write that same memory will be limited by the PCIe bus, which is glacially slow by comparison, usually somewhere around 16-32 GB/s.

A huge part of optimizing code for discrete GPUs is making sure that data is streamed into GPU memory before the GPU actually needs it, because pushing or pulling data over PCIe on-demand decimates performance.
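A quick way to see the gap, assuming a CUDA box with PyTorch installed (this times a 1 GiB pinned host-to-device copy; expect roughly 10-25 GB/s over PCIe, versus the hundreds of GB/s the GPU gets from its own VRAM):

    import time
    import torch

    # ~1 GiB of pinned (page-locked) host memory, needed for async transfers
    x_cpu = torch.empty(256 * 1024 * 1024, dtype=torch.float32).pin_memory()

    torch.cuda.synchronize()
    t0 = time.perf_counter()
    x_gpu = x_cpu.to("cuda", non_blocking=True)
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0

    print(f"host -> device: {x_cpu.numel() * 4 / dt / 1e9:.1f} GB/s")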


> CPU trying to read or write that same memory will be limited by the PCIe bus, which is glacially slow by comparison, usually somewhere around 16-32 GB/s.

If you're forking out for H100s, you'll usually be putting them on a bus with much higher throughput, 200 GB/s or more.


I see, TL;DR == none; and processor switches from {CPU,GPU} to {GPU} in the 2nd paragraph. Thanks!


I thought it meant that both the GPU and the CPU can access it. In most systems, GPU memory cannot be accessed by the CPU (without going through the GPU); and vice versa.


CPUs access GPU memory via MMIO (though usually only a small portion), and GPUs can in principle access main memory via DMA. Meaning, both can share an address space and access each other’s memory. However, that wouldn’t be called Unified Memory, because it’s still mediated by an external bus (PCIe) and thus relatively slower.


Are they cache coherent these days? I feel like any unified memories should be.


Yes, but that's an incomplete view of the obesity epidemic in the West, imo. It's not just that there's "more access to calories"; it's that access to healthy foods is getting more difficult for a large portion of the population. People working multiple jobs don't have time to cook a complete, nutritious meal. Also, due to our ever-increasing wealth inequality, it's harder for people to afford healthy food. A whole chicken, a vegetable, and a starch will always cost more than getting something at Wendy's. Similarly, a jar of jelly is cheaper and lasts longer than a box of strawberries.


I'm Brazilian, and whether you consider Latin America western or western-adjacent, here healthy food is definitely not more expensive than processed food. Yet you can see populations and regions dropping from food insecurity directly into obesity as soon as people do have access to more food.

The time argument might be relevant, but even then, most Brazilians do have cheap and easy access to a very healthy lunch in restaurants or as to-go meals, purchased or home-prepared, with rice, beans, meat, salad... Breakfast is probably bread, but I'd say most people don't eat a lot of it in the morning. Getting proper nutrition at night will probably be problematic, but it's also a smaller window...

But, like I said, processed food is quite expensive here. For instance, 1 kg of chicken breast goes for less than a third of the price of a McDonald's combo. A pack of cookies or snacks will be about double the price of 1 kg of bananas...


I can only speak for my culture, so thank you for the perspective and insights on yours. I just checked, and it seems like bananas are 27.5% cheaper in Brazil than in America. Chicken fillets are a shocking 71% cheaper! I'm sure I'm not taking a lot of things into account here, like average income levels, but still, that's crazy.


What you're describing is a state of flow, which is good for things like work, but the article seems to be talking about time metaphorically.

For example, imagine you're going to your daughter's piano recital and spend the whole time thinking about work. You would be missing out on the experience of watching her perform and grow. If you become mindful of these habits and say, "My mind is focusing on something that I cannot change right now; I should be present," then you'll be able to fully experience a moment in your child's life. So rather than feeling like life is passing you by, you're able to experience it in the moment. The surrounding sentences of the line you quoted don't read like the author is describing time the way you are:

"But in this process, we must remember something important: life is not meant to be rushed through. It is not a race, nor is it a problem to be solved. It is an experience to be lived, and living well requires presence. ... Moments become rich, textured. Even the simplest of tasks takes on a new significance when approached with care, with attention."


Maybe you're both right? Staying with the example of recitals: when I concentrate hard on listening to the music, they seem to last forever while also being over in the blink of an eye!

Similar sensation to being in an isolation chamber.


I think this is correct. Time is not, metaphorically, just the perception of elapsed seconds. There is a dimension of depth. And while it may “fly by” it can feel slow if it was spent with depth.

An hour on social media and time laughing with friends can both be fleeting but one will feel better spent.


In my experience it is not metaphorical, but an actual effect on how you experience time.

