LLaMA made a tradeoff: it reduced the parameter budget instead of the training compute budget. That's better for the inference compute budget.

The compute-optimal number of training tokens for 7B parameters is around 140B [0], and Meta trained it on a trillion tokens.
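
For a rough sense of the scale gap, here's a back-of-the-envelope sketch (assuming the ~20-tokens-per-parameter rule of thumb implied by the fits in [0], and the common C ≈ 6·N·D training-FLOPs approximation; neither is exact):

    # Rough sketch, not the paper's exact fitted scaling laws:
    # Chinchilla's results work out to roughly 20 training tokens per
    # parameter, and training compute is often approximated as 6 * N * D.
    def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
        return n_params * tokens_per_param

    n = 7e9                                # LLaMA-7B parameter count
    d_opt = chinchilla_optimal_tokens(n)   # ~1.4e11 tokens (~140B)
    d_llama = 1e12                         # tokens Meta actually trained on
    print(d_opt, 6 * n * d_opt, 6 * n * d_llama)
    # ~140B tokens; ~5.9e21 FLOPs at the "optimal" point
    # vs ~4.2e22 FLOPs actually spent, i.e. ~7x more training compute

So the model is heavily "overtrained" by Chinchilla's criterion, but every one of those extra training FLOPs buys a smaller model that's cheaper to run at inference time.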

[0]: https://arxiv.org/pdf/2203.15556.pdf


