It's not just a marketing number; it's a primary indicator of model size and memory usage. Some of what is happening now is trying to see how 'large' LLMs need to be to function at a certain level. For instance, LLaMA (65B) was claimed to match GPT-3 (175B) level performance, and at 65B parameters that's far less memory. It's a rough, high-level indicator of the computational requirements to run the model.
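To make the memory point concrete, here is a back-of-the-envelope sketch (not from the thread; the precisions and byte sizes are just illustrative) of what it takes merely to hold the weights in memory at different precisions:

    # Rough sketch: memory needed just to store a model's weights,
    # given parameter count and bytes per parameter for a precision.
    def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
        """GiB required for the raw weights alone (no activations, no KV cache)."""
        return num_params * bytes_per_param / (1024 ** 3)

    for name, params in [("LLaMA-65B", 65e9), ("GPT-3 175B", 175e9)]:
        for precision, nbytes in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
            gib = weight_memory_gib(params, nbytes)
            print(f"{name} @ {precision}: ~{gib:.0f} GiB")

At fp16 that works out to roughly 121 GiB for 65B parameters versus roughly 326 GiB for 175B, which is why the comparison matters for anyone trying to actually run these models.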


Without accounting for data and model architecture, it's not a very useful number. For all we know, they may be using sparse approximations, which would throw this off by a lot. For example, if you compare a fully connected model over images with N^2 pixels to a convolutional one, the former has O(N^4) parameters while the latter has O(K^2) parameters, for a window size K < N. The count is only useful if you know they essentially stacked additional layers on top of GPT-3.5, which we know is not the case since they added a vision head.
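A minimal sketch of that O(N^4) vs O(K^2) comparison, assuming a single layer and ignoring biases and channel counts (the shapes are illustrative, nothing specific to GPT-4):

    # Dense layer mapping an N x N image to an N x N output: every input
    # pixel connects to every output pixel, so (N^2) * (N^2) weights.
    # Conv layer: one K x K filter shared across all positions, so K^2
    # weights regardless of N.
    N, K = 224, 3  # illustrative image side and conv window size

    fc_params = (N * N) * (N * N)
    conv_params = K * K

    print(f"fully connected: {fc_params:,} params")   # ~2.5 billion for N=224
    print(f"convolutional:   {conv_params:,} params") # 9 for K=3

Two models with wildly different capabilities per parameter can therefore report the same headline count, which is the point: the number only tells you something once you know the architecture behind it.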



