
My understanding is that they are all still transformers. The tweaks are more about quantization, generalizing over the data more efficiently (so fewer parameters are required), and improvements to the training data and process itself.

Otherwise, I'd like to know specifically what's better or improved between the models themselves.


