Hacker News

It’s not that crazy; it’s just that the architecture you’d need to do that, with differently quantized models and so on, is impressive in its own right.


The models are the same; it's the surrounding processing, such as the number of "thinking" iterations, that is adjusted.


That only works for LRMs, no? Not for traditional LLM inference.
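The mechanism described above (same weights, an adjustable "thinking" budget around them) can be sketched roughly as follows. This is a hypothetical illustration, not any vendor's actual API: `toy_generate` stands in for a real reasoning model's token sampler, and `run_with_budget` is an invented wrapper showing that only the loop around the model changes, never the model itself.

```python
# Hedged sketch: one model, different test-time compute budgets.
# toy_generate is a dummy stand-in for a real LRM's token stream.

def toy_generate(prompt):
    # A real sampler would stop at an end-of-thinking token on its own;
    # this stub just emits an open-ended stream of "thinking" steps.
    for i in range(1000):
        yield f"<think step {i}>"

def run_with_budget(prompt, max_think_tokens):
    """Cap the reasoning phase, then hand off to the answer phase.
    The model weights are untouched; only the surrounding loop differs."""
    thinking = []
    for tok in toy_generate(prompt):
        if len(thinking) >= max_think_tokens:
            break  # budget exhausted: cut thinking short and answer now
        thinking.append(tok)
    return {"prompt": prompt, "think_tokens": len(thinking)}

low = run_with_budget("2+2?", max_think_tokens=8)
high = run_with_budget("2+2?", max_think_tokens=256)
print(low["think_tokens"], high["think_tokens"])
```

The same wrapper with a larger `max_think_tokens` spends more inference compute per query without touching quantization or weights, which is why this knob only exists for reasoning-style models: a traditional LLM has no thinking phase to budget.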



