
Wikipedia itself describes its size as ~25 GB without media [0]. And it's probably more accurate, with broader coverage across multiple languages, than the LLM the GP downloaded.

https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia



Really? I'd assume that an LLM would deduplicate Wikipedia into something much smaller than 25GB. That's its only job.


> That's its only job.

The vast, vast majority of LLM knowledge is not found in Wikipedia. It is definitely not its only job.


When trained on next-word prediction with the standard loss function, by definition that is its only job.
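For concreteness, the "standard loss function" here is cross-entropy: the negative log-probability the model assigns to the actual next token. A minimal sketch with a made-up vocabulary and probabilities (nothing here comes from a real model):

```python
import math

def cross_entropy(predicted_probs, target_token):
    """Negative log-likelihood of the true next token."""
    return -math.log(predicted_probs[target_token])

# Hypothetical next-token distributions after the prefix "the":
# a model that puts 70% of its mass on the correct continuation
# is penalized lightly...
good = cross_entropy({"the": 0.1, "cat": 0.7, "sat": 0.1, "mat": 0.1}, "cat")

# ...while one that puts only 5% on it is penalized heavily.
bad = cross_entropy({"the": 0.5, "cat": 0.05, "sat": 0.4, "mat": 0.05}, "cat")

print(round(good, 3))  # ~0.357
print(round(bad, 3))   # ~2.996
```

Minimizing this loss over a corpus is exactly "predict the next word well" and nothing else; any factual knowledge the model retains is a side effect of that objective.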



