
(A brief note: while not weak, the laptop version of the 3080 Ti is far surpassed by even a desktop 4060 Ti, which sells for less than $400. So it's possible to set up a stronger system relatively cheaply. What counts as good enough depends on your expectations.)



Unless you have special needs like very high usage, privacy, or the other reasons described in the article, spending many hundreds of dollars on another computer for the sole purpose of running local models is a hard sell.

If you use the providers' APIs instead of their subscription-based offerings, the most popular models are cheap to use, and with BYOK tools, switching models is as easy as entering a different string in a form.
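As a rough sketch of what that looks like (the key, endpoint, and model name below are just placeholders; any OpenAI-compatible BYOK tool works the same way), the "switch" really is just the model string:

    # Sketch only: key, endpoint, and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        api_key="sk-...",                      # your own key (BYOK)
        base_url="https://api.openai.com/v1",  # or any OpenAI-compatible endpoint
    )

    # "Switching models" is just changing this string.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)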

For instance, I put $15 on my OpenAI account in August 2023; since then I've used DALL-E weekly and I still have more than $5 of credit left!


It seemed to me that the bottleneck mostly revolved around the layers that were in system RAM, and that the lack of VRAM was really the gating factor for reasonable inference performance. (Although I'd imagine there's some more optimization that could be done to make the best use of a split VRAM/system-RAM setup.)

In any event, it was fun to try out, but it still didn't seem anywhere near as good as the hosted models. A heavy-duty workstation with a bunch of GPUs/VRAM would probably be a different story, though.
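For reference, here's roughly what the split setup looks like with llama-cpp-python; the model path and layer count are made-up examples, the point is that n_gpu_layers controls how many layers stay in VRAM while the rest live in system RAM:

    # Sketch, assuming a GGUF model file; path and layer count are examples.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/model-q4_k_m.gguf",
        n_gpu_layers=35,   # layers kept in VRAM; the rest stay in system RAM
        n_ctx=4096,
    )

    out = llm("Explain KV caching in one sentence.", max_tokens=128)
    print(out["choices"][0]["text"])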


> It seemed to me that the bottleneck mostly revolved around the layers that were in system RAM, and that the lack of VRAM was really the gating factor for reasonable inference performance. (Although I'd imagine there's some more optimization that could be done to make the best use of a split VRAM/system-RAM setup.)

You could try a model that fits entirely in VRAM. It's a trade of precision for a decent bit of performance. 16GB is plenty to work with; I've seen acceptable results with 7B models on my 8GB GPU.
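A back-of-the-envelope estimate (my own rough numbers, assuming a ~4.5 bits/weight Q4-style quant and counting weights only, not KV cache or overhead) gives a feel for what fits:

    # Very rough estimate: weight memory only, ignoring KV cache/overhead.
    def approx_weight_gb(params_billion, bits_per_weight=4.5):
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for name, params in [("7B", 7), ("13B", 13), ("34B", 34)]:
        print(f"{name}: ~{approx_weight_gb(params):.1f} GB of weights")

    # 7B:  ~3.9 GB  -> fits on an 8 GB card with room for context
    # 13B: ~7.3 GB  -> tight on 8 GB, comfortable on 16 GB
    # 34B: ~19.1 GB -> needs 24 GB+ or a split with system RAM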



