
This doesn’t sound correct.

You don’t know which expert you’ll need for each layer, so you either keep them all loaded in memory or stream them from disk.




In RAM, yes. But to compute an activation, you still need to load the weights from RAM into the GPU cores.
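The distinction above is between what must be *resident* (every expert, since routing isn't known ahead of time) and what is actually *read* per token (only the routed experts). Here is a rough sketch of that arithmetic. It is a simplified model that ignores the KV cache, activations, and per-layer routing details. It treats each expert's parameters as summed over all MoE layers. All parameter counts below are hypothetical.

```python
def bytes_read_per_token(shared_params, expert_params, top_k, bytes_per_param):
    """Weight bytes moved from memory to the compute units for one token."""
    # Shared weights (attention, embeddings, etc.) are read for every token;
    # of the routed experts, only the top_k selected ones are read.
    return (shared_params + top_k * expert_params) * bytes_per_param


def bytes_resident(shared_params, expert_params, num_experts, bytes_per_param):
    """Weight bytes that must stay loaded because any expert might be routed to."""
    return (shared_params + num_experts * expert_params) * bytes_per_param


# Hypothetical model: 2B shared params, 64 experts of 1B params each,
# top-2 routing, fp16 weights (2 bytes per parameter).
read = bytes_read_per_token(2_000_000_000, 1_000_000_000, 2, 2)
resident = bytes_resident(2_000_000_000, 1_000_000_000, 64, 2)
print(read / 1e9, resident / 1e9)  # 8.0 132.0 (GB)
```

Under these assumptions, memory capacity scales with the total expert count, while per-token memory bandwidth scales only with `top_k`.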


Got you, yeah, I misread your comment the first time around.


Note that 404 < 512



