
One Nvidia A100.

From the paper:

> We train using the AdamW [26] optimizer with a batch size of 5 and gradient accumulation over 20 steps on a single NVIDIA A100 GPU

So it's "consumer-grade" because it's available to anyone, not just businesses.
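For context, the quoted setup is plain AdamW with gradient accumulation, i.e. an effective batch of 5 × 20 = 100. A minimal PyTorch sketch of that loop, with a placeholder model and data rather than the paper's actual architecture or dataset:

    # Sketch only: toy model/data stand in for the paper's setup.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(128, 10).to(device)              # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    # batch size 5, as in the quote
    data = TensorDataset(torch.randn(1000, 128), torch.randint(0, 10, (1000,)))
    loader = DataLoader(data, batch_size=5, shuffle=True)

    accum_steps = 20                                    # gradient accumulation over 20 steps
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        loss = loss_fn(model(x), y) / accum_steps       # average loss over accumulated batches
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()                            # update once per 20 mini-batches
            optimizer.zero_grad()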




That is the training GPU; the inference GPU can be much smaller.


I stand corrected.

Found on Yi-Zhe Song's LinkedIn:

> Runs on a single NVIDIA 4090

https://www.linkedin.com/feed/update/urn:li:activity:7270141...


Thanks!



