
(A brief note: while not weak, the laptop version of the 3080 Ti is far surpassed by even a desktop 4060 Ti, which sells for less than $400. So it's possible to set up a stronger system relatively cheaply. What counts as good enough depends on your expectations.)



Unless you have special needs like very high usage, privacy, or the other reasons described in the article, spending many hundreds of dollars on another computer for the sole purpose of running local models is a hard sell.

If you use the providers' APIs instead of their subscription-based offerings, the most popular models are cheap to use, and with BYOK tools, switching models is as easy as entering a different string in a form.
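As a rough sketch of what that looks like (the key, endpoint, and model name below are just placeholders; any OpenAI-compatible BYOK tool works the same way), the "switch" really is just the model string:

    # Sketch only: key, endpoint, and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        api_key="sk-...",                      # your own key (BYOK)
        base_url="https://api.openai.com/v1",  # or any OpenAI-compatible endpoint
    )

    # "Switching models" is just changing this string.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)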

For instance, I put $15 on my OpenAI account in August 2023; since then I've used DALL-E weekly and I still have more than $5 of credit left!


It seemed to me that the bottleneck mostly revolved around the layers that were in system RAM, and that the lack of VRAM was really the gating factor for reasonable inference performance. (Although I'd imagine there's some more optimization that could be done to make the best use of a split VRAM/system-RAM setup.)

In any event, it was fun to try out, but it still didn't seem anywhere near as good as the hosted models. A heavy-duty workstation with a bunch of GPUs/VRAM would probably be a different story, though.
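For reference, here's roughly what the split setup looks like with llama-cpp-python; the model path and layer count are made-up examples, the point is that n_gpu_layers controls how many layers stay in VRAM while the rest live in system RAM:

    # Sketch, assuming a GGUF model file; path and layer count are examples.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/model-q4_k_m.gguf",
        n_gpu_layers=35,   # layers kept in VRAM; the rest stay in system RAM
        n_ctx=4096,
    )

    out = llm("Explain KV caching in one sentence.", max_tokens=128)
    print(out["choices"][0]["text"])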


> It seemed to me that the bottleneck mostly revolved around the layers that were in system RAM, and that the lack of VRAM was really the gating factor for reasonable inference performance. (Although I'd imagine there's some more optimization that could be done to make the best use of a split VRAM/system-RAM setup.)

You could try a model that fits entirely in VRAM. It's a trade of precision for a decent bit of performance. 16GB is plenty to work with; I've seen acceptable results with 7B models on my 8GB GPU.
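A back-of-the-envelope estimate (my own rough numbers, assuming a ~4.5 bits/weight Q4-style quant and counting weights only, not KV cache or overhead) gives a feel for what fits:

    # Very rough estimate: weight memory only, ignoring KV cache/overhead.
    def approx_weight_gb(params_billion, bits_per_weight=4.5):
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for name, params in [("7B", 7), ("13B", 13), ("34B", 34)]:
        print(f"{name}: ~{approx_weight_gb(params):.1f} GB of weights")

    # 7B:  ~3.9 GB  -> fits on an 8 GB card with room for context
    # 13B: ~7.3 GB  -> tight on 8 GB, comfortable on 16 GB
    # 34B: ~19.1 GB -> needs 24 GB+ or a split with system RAM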



