Optimal number of tokens for 7B parameters is around 140B tokens[0], and meta trained it for trillion tokens.
[0]: https://arxiv.org/pdf/2203.15556.pdf
Optimal number of tokens for 7B parameters is around 140B tokens[0], and meta trained it for trillion tokens.
[0]: https://arxiv.org/pdf/2203.15556.pdf