
Have you used a 100GB model on a Mac Studio? Tokens per second is in the single digits. I didn't find it usable at all and found myself going back to cloud APIs, where $3000 goes a much longer way.

I'm looking forward to trying Nvidia's little set-top box if it actually ships; it should have higher memory bandwidth. Still, I'll probably set up a system where I email a query with attachments and just let DeepSeek email me back once it's finished reasoning at 10 t/s.




It might blow your mind that you can run a quantized DeepSeek-R1 (671B) at over 15 t/s on an M2 Ultra 192GB and still get around 9,000 tokens of context.
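
Both numbers in this thread are roughly what a back-of-envelope bandwidth estimate would suggest. A minimal sketch, assuming decoding is memory-bandwidth bound, so the ceiling on tokens per second is bandwidth divided by the bytes of weights read per token. The figures are assumptions, not from the thread: 800 GB/s is the M2 Ultra's advertised bandwidth, 37B is DeepSeek-R1's published count of parameters active per token (it is a mixture-of-experts model), and 2 bits per weight is a hypothetical quantization level. Real throughput lands well below the ceiling.

```python
def weights_read_gb(active_params_billions: float, bits_per_weight: float) -> float:
    """Approximate GB of weights read from memory to generate one token."""
    return active_params_billions * bits_per_weight / 8


def decode_ceiling_tps(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on tokens/s if decoding is purely memory-bandwidth bound."""
    return bandwidth_gb_s / gb_read_per_token


M2_ULTRA_BANDWIDTH_GB_S = 800.0  # assumption: advertised peak, not measured

# Dense 100GB model: every weight is read for every token.
dense_ceiling = decode_ceiling_tps(M2_ULTRA_BANDWIDTH_GB_S, 100.0)

# MoE model: only the active experts are read for each token.
# 37B active parameters at a hypothetical 2 bits per weight.
moe_gb_per_token = weights_read_gb(37.0, 2.0)
moe_ceiling = decode_ceiling_tps(M2_ULTRA_BANDWIDTH_GB_S, moe_gb_per_token)

print(f"dense 100GB ceiling: {dense_ceiling:.1f} t/s")
print(f"MoE ceiling:         {moe_ceiling:.1f} t/s")
```

Under these assumptions the dense 100GB case cannot exceed single digits, while the MoE case has far more headroom, which is why a 671B model can decode faster than a much smaller dense one on the same machine.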



