From the paper:
> We train using the AdamW [26] optimizer with a batch size of 5 and gradient accumulation over 20 steps on a single NVIDIA A100 GPU
So the A100 is "consumer-grade" because it's available to anyone, not just businesses.
Found on Yi-Zhe Song's LinkedIn:
> Runs on a single NVIDIA 4090
https://www.linkedin.com/feed/update/urn:li:activity:7270141...
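For context, the quoted setup means one optimizer update per 5 × 20 = 100 samples; gradient accumulation is how you get a large effective batch on a single GPU without the memory for it. A rough PyTorch sketch of that loop (the linear model, random data, and learning rate are toy placeholders; only AdamW, the micro-batch size of 5, and the 20 accumulation steps come from the quote):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins; the paper's actual model and dataset are not specified here.
model = torch.nn.Linear(512, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr is a placeholder
loss_fn = torch.nn.CrossEntropyLoss()

data = TensorDataset(torch.randn(2000, 512), torch.randint(0, 10, (2000,)))
loader = DataLoader(data, batch_size=5)  # micro-batch size 5, as quoted

accum_steps = 20                         # accumulate grads over 20 micro-batches
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y)
    (loss / accum_steps).backward()      # scale so summed grads average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()                 # one update per 100 samples
        optimizer.zero_grad()
```

Dividing each micro-batch loss by accum_steps keeps the accumulated gradient equal to the average over the full 100-sample effective batch.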