>The ideal case would be something that can be run locally, or at least on a modest/inexpensive cluster.
It's obviously valuable, so it should be coming. I expect 2 trends:
- Local GPUs/NPUs will get LLM-focused variants with 50-100 GB of VRAM that can run MXFP4 and similar low-precision formats.
- Distillation will come to reasoning coding agents, probably with one distilled model per combination of tech stack (LAMP, Android app, AWS, etc.) × business domain (gaming, social, finance, etc.).
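Why is 50-100 GB with MXFP4 an interesting target? A back-of-the-envelope estimate helps. The sketch below assumes the OCP Microscaling (MX) layout for MXFP4: blocks of 32 FP4 (E2M1) values sharing one 8-bit scale, which works out to 4.25 bits per parameter. The function names are hypothetical. It counts weights only, so KV cache, activations, and any layers kept at higher precision would push real usage higher.

```python
# Rough weight-memory estimate for a model stored in MXFP4.
# Assumption: OCP MX layout, i.e. 32 FP4 (E2M1) elements share one
# 8-bit E8M0 scale. Weights only; KV cache and activations are ignored.

ELEMENT_BITS = 4   # bits per FP4 element
SCALE_BITS = 8     # bits per shared block scale
BLOCK_SIZE = 32    # elements that share one scale


def mxfp4_bits_per_param() -> float:
    """Effective storage cost per parameter, including the amortized scale."""
    return ELEMENT_BITS + SCALE_BITS / BLOCK_SIZE


def weight_gb(num_params: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) under MXFP4."""
    return num_params * mxfp4_bits_per_param() / 8 / 1e9


if __name__ == "__main__":
    for billions in (20, 70, 120, 180):
        print(f"{billions}B params -> ~{weight_gb(billions * 1e9):.1f} GB of weights")
```

Under these assumptions, a ~120B-parameter model needs roughly 64 GB for its weights. That fits within the 50-100 GB range, with some headroom left for the KV cache.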