jazzyjackson 4 months ago | on: Apple unveils new Mac Studio

Have you used a 100GB model on a Mac Studio? Tokens per second is in the single digits. I didn't find it usable at all and found myself going back to cloud APIs, where $3000 goes a much longer way.

I'm looking forward to trying Nvidia's little set-top box if it actually ships; it should have higher memory bandwidth. Still, I'll probably set up a system where I email a query with attachments and just let DeepSeek email me back once it's finished reasoning at 10 t/s.

clonky 4 months ago

It might blow your mind that you can run a quantized DeepSeek-R1 (671B) at over 15 t/s on an M2 Ultra 192GB and still get around 9000 tokens of context.
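
Rough back-of-envelope for why both numbers in this thread can be true, assuming decoding is limited by memory bandwidth (each generated token needs roughly one pass over the weights that are actually used). The 800 GB/s figure is Apple's published M2 Ultra bandwidth, and 37B active parameters per token is from the DeepSeek-R1 model card. The 2 bits per weight is purely an assumption for illustration, since the exact quantization isn't stated here, and the function names are made up for this sketch. These are ceilings, not predictions; real throughput lands well below them.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a given parameter count and bit width."""
    return params_billions * bits_per_weight / 8


def decode_tps_upper_bound(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Bandwidth-bound ceiling on tokens/s: one pass over the active weights per token."""
    return bandwidth_gb_s / gb_read_per_token


M2_ULTRA_BW_GB_S = 800.0   # Apple's published M2 Ultra memory bandwidth
ASSUMED_BITS = 2.0         # ASSUMPTION: average bits per weight, not stated in the thread

# Dense 100GB model: every weight is read for every token.
dense_tps = decode_tps_upper_bound(M2_ULTRA_BW_GB_S, 100.0)

# DeepSeek-R1 is a mixture-of-experts model: 671B parameters must be resident,
# but only about 37B are active (and therefore read) per token.
r1_resident_gb = weights_gb(671, ASSUMED_BITS)
r1_active_gb = weights_gb(37, ASSUMED_BITS)
r1_tps = decode_tps_upper_bound(M2_ULTRA_BW_GB_S, r1_active_gb)

print(f"dense 100GB ceiling:   {dense_tps:.1f} t/s")
print(f"R1 resident weights:   {r1_resident_gb:.2f} GB")
print(f"R1 read per token:     {r1_active_gb:.2f} GB")
print(f"R1 ceiling (MoE):      {r1_tps:.1f} t/s")
```

Under these assumptions the dense 100GB model tops out at 8 t/s, which matches the single-digit experience above, while the much larger MoE model reads far less per token and so has a much higher ceiling. The resident weights would also take most of the 192GB, which is consistent with only having room for a limited context.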