24 gigabytes is more than enough to run a local LLM for a small household or business.
This is "gaming PC" territory, not "space heater". I mean, people already have PS5s and whatnot in their homes.
The hundreds-of-gigabytes requirement exists because the big cloud LLM providers went down the ever-increasing-parameter-count path. That path is a dead end, and we've already reached negative returns.
Prompt engineering + finetunes is the future, but you need developer brains for that, not TFLOPs.
It depends on 1) what model you are running; and 2) how many models you are running.
You can just about run a 32B model (at Q4/Q5 quantization) on 24GB. Running anything larger (such as the increasingly common 70B models, or the really big ones like Llama 4 or DeepSeek) means splitting the model between VRAM and system RAM. But yes, anything 24B or lower you can run comfortably, with enough capacity left over for the context.
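For a rough sense of where the 24GB goes, here's a back-of-envelope sketch in Python. The bits-per-weight and architecture numbers (layers, KV heads, head dim) are illustrative assumptions, not exact figures for any particular model or runtime:

    # Back-of-envelope VRAM estimate: weights + KV cache.
    # All numbers are approximations; real usage varies by runtime and architecture.

    def weights_gb(params_b: float, bits_per_weight: float) -> float:
        # params_b billion parameters at bits_per_weight bits each
        return params_b * 1e9 * bits_per_weight / 8 / 1e9

    def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                    context_tokens: int, bytes_per_value: int = 2) -> float:
        # 2x for keys and values, stored at bytes_per_value bytes (fp16 = 2)
        return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9

    # Example: a 32B-class model at ~4.5 bits/weight (Q4/Q5-ish), with
    # hypothetical architecture numbers: 64 layers, 8 KV heads, head dim 128.
    w = weights_gb(32, 4.5)             # ~18 GB of weights
    kv = kv_cache_gb(64, 8, 128, 8192)  # ~2.1 GB for an 8k context
    print(f"weights ~{w:.1f} GB, KV cache ~{kv:.1f} GB, total ~{w + kv:.1f} GB")

That lands around 20 GB, i.e. it "just about" fits in 24GB with a little headroom for the runtime itself.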
If you have other models -- text-to-speech, speech recognition, etc. -- those take up VRAM too, both for the model weights and during processing/generation. That affects the size of LLM you can run.
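If you want everything resident at once it's basically a budgeting exercise. A trivial sketch with made-up per-model sizes (placeholders, not measurements):

    # Hypothetical VRAM budget when running several models side by side.
    budget_gb = 24.0
    models_gb = {
        "llm_32b_q4": 20.0,         # weights + KV cache from the estimate above
        "speech_recognition": 1.5,  # illustrative
        "text_to_speech": 1.0,      # illustrative
    }
    used = sum(models_gb.values())
    print(f"used ~{used:.1f} of {budget_gb:.0f} GB, headroom ~{budget_gb - used:.1f} GB")
    if used > budget_gb:
        print("over budget: use a smaller quant or offload something to CPU")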