Perhaps they are more easily compressible? Once a bunch of nearby weights have similar roles, one may not need all of them.



Yep. That is exactly the idea here. Our compression method is super duper naive. We literally keep every n-th weight column and discard the rest. Turns out that even after getting rid of 80% of the weight columns in this way, we were able to retain the same performance in a 125M GPT.
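A minimal NumPy sketch of the keep-every-n-th-column idea (not the actual implementation from the comment above; the layer shape and keep rate here are made up for illustration):

    import numpy as np

    def subsample_columns(W, keep_every_n=5):
        # Keep every n-th column of a weight matrix and discard the rest.
        # keep_every_n=5 keeps roughly 20% of the columns, i.e. drops ~80%.
        return W[:, ::keep_every_n]

    W = np.random.randn(768, 3072)                  # e.g. an MLP weight in a small GPT
    W_small = subsample_columns(W, keep_every_n=5)
    print(W.shape, "->", W_small.shape)             # (768, 3072) -> (768, 615)

In a real network the matching input rows of the following layer's weight matrix would have to be dropped as well, so the shapes still line up.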


If you have things organized neatly together, you can also use pre-existing compression algorithms, like JPEG, to compress your data. That's what we're doing in Self-Organizing Gaussians [0]. There we take an unorganised (noisy) set of primitives, each with 59 attributes, and sort them into 59 locally smooth 2D grids. Then we use off-the-shelf image formats to store the attributes. It's an incredibly effective compression scheme, and quite simple (a toy sketch of the reshape-and-compress step follows below the link).

[0]: https://fraunhoferhhi.github.io/Self-Organizing-Gaussians/
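Here is that toy sketch. It only illustrates the "lay an attribute out on a smooth 2D grid, then hand it to an image codec" step: the real method sorts all 59 attributes jointly with a proper 2D self-organizing sort, whereas this uses a plain 1D sort and made-up data.

    import numpy as np
    from PIL import Image

    # Toy stand-in for the self-organizing sort: one made-up attribute of
    # 256*256 primitives, ordered so that neighbouring grid cells hold
    # similar values, then handed to an off-the-shelf image codec.
    attr = np.random.rand(256 * 256).astype(np.float32)

    grid = np.sort(attr).reshape(256, 256)          # 1D sort instead of the real 2D sort

    # Quantize to 8 bit and let the image format do the compression.
    img = Image.fromarray((grid * 255).astype(np.uint8))
    img.save("attribute.png")                       # or "attribute.jpg" for lossy

The local smoothness of each grid is what makes standard image codecs compress the attributes so well.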



