
MySQL, Postgres, etc. all support transparent compression. I'd be curious how small the database would end up after compression, and what the impact would be on query time.

I'm skeptical it would be as good as the Parquet/SQLite option the author came up with (Postgres, I believe, compresses value by value; I can't remember how MySQL does it).




I can't speak for MySQL, but I suspect the Postgres compression you're referring to is TOAST (https://www.postgresql.org/docs/current/static/storage-toast...).

Its sweet spot is much larger rows. In fact, it only kicks in when a row's content exceeds roughly 2 KB (about a quarter of the 8 KB page size), so it doesn't trigger in this case, where the average row is about 120 bytes (and only 80 bytes of that is content).
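
If you want to see that threshold in action, something like this works (a rough sketch, assuming a local Postgres and the psycopg2 driver; the table name is made up):

  import psycopg2

  conn = psycopg2.connect("dbname=test")  # hypothetical connection string
  cur = conn.cursor()
  cur.execute("CREATE TEMP TABLE toast_demo (v text)")
  cur.execute("INSERT INTO toast_demo VALUES (repeat('x', 80)), (repeat('x', 10000))")
  # pg_column_size() reports the stored size of a value. The 80-byte value
  # is stored as-is (too small for TOAST to bother), while the 10 KB value
  # crosses the ~2 KB threshold and gets pglz-compressed, so its stored
  # size comes out far smaller than its length.
  cur.execute("SELECT length(v), pg_column_size(v) FROM toast_demo ORDER BY 1")
  for length, stored in cur.fetchall():
      print(length, stored)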

I bet you could build the DB, stop Postgres, move its data dir onto a squashfs filesystem, and then start Postgres in read-only mode for huge space savings at minimal query cost, though.

Hmm, in fact, it'd be easy to do that with the SQLite DB since it's just a single file. I might give that a shot.


Squashing the SQLite file works pretty well -- it's a bit bigger and slower than Parquet, but maybe a reasonable trade-off if you'd rather not deal with Parquet.

I added a section at https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html#s... to mention it. Thanks for the inspiration!
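
For anyone who wants to replicate it, the workflow is roughly this (a sketch only -- file names, mount point, and table name are made up, and the mount step needs root):

  import sqlite3
  import subprocess

  # Pack the single-file DB into a compressed squashfs image.
  subprocess.run(["mksquashfs", "flights.sqlite", "flights.squashfs", "-comp", "xz"],
                 check=True)

  # Mount it read-only (outside Python, as root):
  #   mount -o loop,ro flights.squashfs /mnt/flights
  # Then open the file read-only; immutable=1 tells SQLite the file cannot
  # change underneath it, so it skips locking on the read-only mount.
  conn = sqlite3.connect("file:/mnt/flights/flights.sqlite?mode=ro&immutable=1",
                         uri=True)
  print(conn.execute("SELECT count(*) FROM flights").fetchone())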


On the MySQL side, https://dev.mysql.com/doc/refman/8.0/en/innodb-compression-b....

It reads like row-level compression plus index compression? They claim indexes make up a fair chunk of the disk usage, so there may be real savings there.
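
If anyone wants to try it, enabling it is just a table option (a hedged example, assuming mysql-connector-python and a scratch database; the schema is invented):

  import mysql.connector

  conn = mysql.connector.connect(user="root", database="test")
  cur = conn.cursor()
  # ROW_FORMAT=COMPRESSED compresses InnoDB pages, which covers both the
  # clustered index (the row data) and secondary indexes. KEY_BLOCK_SIZE
  # is the compressed page size in KB.
  cur.execute("""
      CREATE TABLE flights_compressed (
          id      INT PRIMARY KEY,
          carrier VARCHAR(8),
          origin  VARCHAR(8),
          dest    VARCHAR(8)
      ) ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
  """)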


For analytics/OLAP you can use ZFS compression with a large block size (recordsize); zstd support is just around the corner, too. I would still use compression for an OLTP database, but with a much smaller block size, 16 KB at most.
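
Concretely, that's just a couple of dataset properties (a sketch that shells out to the zfs CLI; pool/dataset names are made up and this needs root on a ZFS box):

  import subprocess

  def zfs_set(dataset, **props):
      # `zfs set property=value dataset`, one property per call.
      for key, value in props.items():
          subprocess.run(["zfs", "set", f"{key}={value}", dataset], check=True)

  # Analytics/OLAP dataset: big records compress well and scans are sequential.
  # (Swap lz4 for zstd once it lands.)
  zfs_set("tank/olap", recordsize="1M", compression="lz4")
  # OLTP dataset: keep records small (16 KB max) to limit read-modify-write
  # amplification on random updates.
  zfs_set("tank/oltp", recordsize="16K", compression="lz4")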



