
Yeah, I think pg_attribute is likely to be the main issue here. For one, there's a row for every column of every table, and there are workloads with a lot of tables. But also importantly it's included in all tuple descriptors, which in turn get created during query execution in a fair number of places. It's currently ~140 bytes, with ~64 bytes of that being the column name. Just doubling that would increase the overhead noticeably, and we already have plenty of complaints about pg_attribute. I think it'd be fairly useless to just choose another fixed size; we really ought to make it variable length.



Is it ~140 bytes? pahole says it's 112 (without CATALOG_VARLEN).

The impact of doubling NameData size would be quite a bit worse, though, thanks to doubling of chunk-size in allocset. At the moment it fits into a 128B chunk (so just ~16B wasted), but by doubling NameData to 128B the struct would suddenly be 176B, which requires 256B chunk (so 80B wasted). Yuck.
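To make that arithmetic concrete, here's a rough sketch of the rounding. It assumes a simplified model in which small requests are rounded up to the next power of two; chunk headers and other allocator details are ignored, and `pow2_chunk_size` is a made-up helper, not a PostgreSQL function.

```python
def pow2_chunk_size(request: int, min_chunk: int = 8) -> int:
    """Round a request up to the next power of two (simplified model)."""
    size = min_chunk
    while size < request:
        size *= 2
    return size


NAMEDATALEN_NOW = 64   # current name field size, in bytes
STRUCT_NOW = 112       # struct size reported by pahole above

for namedatalen in (64, 128):
    # Only the name field changes; the rest of the struct stays the same.
    struct_size = STRUCT_NOW - NAMEDATALEN_NOW + namedatalen
    chunk = pow2_chunk_size(struct_size)
    print(f"NameData={namedatalen}B struct={struct_size}B "
          f"chunk={chunk}B wasted={chunk - struct_size}B")
```

Under those assumptions this reproduces the numbers above: 112B in a 128B chunk (16B wasted) versus 176B in a 256B chunk (80B wasted).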


> Is it ~140 bytes? pahole says it's 112 (without CATALOG_VARLEN).

Well, but on-disk varlena data is included. pg_column_size() averages 144 bytes for pg_attribute on my system.

> The impact of doubling NameData size would be quite a bit worse, though, thanks to doubling of chunk-size in allocset. At the moment it fits into a 128B chunk (so just ~16B wasted), but by doubling NameData to 128B the struct would suddenly be 176B, which requires 256B chunk (so 80B wasted). Yuck.

I'm not sure that actually matters that much. Most attributes are allocated as part of TupleDescData, but that allocates all attributes together.
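A rough sketch of why that changes the picture, under the same simplified power-of-two model. The descriptor header, chunk headers, and any special handling of large requests are ignored, and the helper names are hypothetical.

```python
def pow2_chunk_size(request: int, min_chunk: int = 8) -> int:
    """Round a request up to the next power of two (simplified model)."""
    size = min_chunk
    while size < request:
        size *= 2
    return size


def waste_separate(natts: int, attr_size: int) -> int:
    """Bytes wasted if every attribute struct got its own chunk."""
    return natts * (pow2_chunk_size(attr_size) - attr_size)


def waste_together(natts: int, attr_size: int) -> int:
    """Bytes wasted if all attribute structs share one allocation."""
    total = natts * attr_size
    return pow2_chunk_size(total) - total


for natts in (1, 4, 10, 32):
    for attr_size in (112, 176):
        print(f"natts={natts:2d} attr={attr_size}B "
              f"separate={waste_separate(natts, attr_size)}B "
              f"together={waste_together(natts, attr_size)}B")
```

The point is only that the rounding applies to the whole array, so the waste depends on the total size for a given column count and the per-struct 80B figure doesn't carry over directly.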


> Well, but on-disk varlena data is included. pg_column_size() averages 144 bytes for pg_attribute on my system.

Sure, but I thought we were talking about in-memory stuff, since you've been talking about tuple descriptors. I don't think the on-disk size matters all that much, TBH, it's likely just a tiny fraction of data stored in the cluster.

> I'm not sure that actually matters that much. Most attributes are allocated as part of TupleDescData, but that allocates all attributes together.

Ah. Good point.


> I don't think the on-disk size matters all that much, TBH, it's likely just a tiny fraction of data stored in the cluster.

I've seen pg_attribute take up very significant fractions of the database numerous times, so I do think the on-disk size can matter. And there are plenty of places, e.g. catcache, where we store the full on-disk tuple (rather than just the fixed-length prefix), so the on-disk size is actually quite relevant for the in-memory bit too.



