
What did you set the context window to? That's been my main issue with models on my MacBook: you have to set the context window so short that they are way less useful than the hosted models. Is there something I'm missing there?


With LM Studio you can configure context window freely. Max is 131072 for gpt-oss-20b.


Yes, but if I set it above ~16K on my 32 GB laptop it just OOMs. Am I doing something wrong?
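One reason long contexts blow past 32 GB: the KV cache grows linearly with context length, on top of the model weights themselves. A rough back-of-the-envelope sketch, using illustrative grouped-query-attention dimensions (assumed for the example, not the actual gpt-oss-20b config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV cache size: 2 tensors (K and V) per layer,
    each holding n_kv_heads * head_dim values per token, at fp16 (2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Illustrative dimensions (assumed, not the real gpt-oss-20b numbers):
size = kv_cache_bytes(n_layers=24, n_kv_heads=8, head_dim=64, ctx_len=131072)
print(f"{size / 2**30:.1f} GiB")  # 6.0 GiB at the full 131072-token context
```

At ~16K tokens the same cache is roughly an eighth of that, which is why shrinking the context (or quantizing the KV cache, where the runtime supports it) is often what keeps a 20B model inside 32 GB of unified memory.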


Try enabling flash attention and offloading all layers to the GPU.


I punted it up to the maximum in LM Studio - seems to use about 16GB of RAM then, but I've not tried a long prompt yet.



