When you say cold cache, did you clear the os page cache as well as the postgres buffercaches? After setup.sql, the cache will be warmish - I get 4ms on the first run. I'm using postgres 17.5
I did not explicitly evict the Postgres buffer cache, but using pg_buffercache to evict all buffers for the table, yields a runtime of 23ms for me (still going for the BitmapOr).
Have you looked into speaking with the various SHPOs in each US State/Territory?
I've worked with several of them a fair bit and they have a ton of old maps hidden internally. Especially for small, specific areas of the state, like historical districts.
I think having a way to build statistics on the join itself would be helpful for this. Similar to how extended statistics^1 can help when column distributions aren't independent of each other.
But this may require some basic materialized views, which postgres doesn't really have.
As somebody who implemented manual incremental materialized tables using triggers, yeah it's pretty dang hard to make sure you get all the edge cases in which the data can mutate.
This website taught me a ton, even after I thought I knew more than enough about performance. Just seeing how different databases generate and execute their SQL is a huge boon (and sometimes extremely surprising when looking at one DBMS to another).
For sure, there's definitely a lot of cool techniques (and I'm not aware of all of them)! And the first example is very much contrived to show a small example.
I'm not super familiar with the term index merge - this seems to be the term for a BitmapOr/BitmapAnd?
> I'm not super familiar with the term index merge - this seems to be the term for a BitmapOr/BitmapAnd?
Different databases will use similar terms for different operations, but I would guess that the comment refers to something similar to MySQL's index merge (which is essentially reading the row IDs of all the relevant ranges, then deduplicating them, then doing the final scan; it's similar to but less flexible than Postgres' BitmapOr).
Absolutely. Though I don't recall seeing multiple sequential scans without a self-join or subquery. A basic filter within a sequential scan/loop is the most naive/simplest way of performing queries like these, so postgres falls back to that.
Also, fwiw, BitmapOr is only used with indexes: https://pganalyze.com/docs/explain/other-nodes/bitmap-or.
That was the extreme case - the multi-scan would be gotten if a casual reader tried your (neat by all means) AND-only query on a non-indexed table (or partially indexed for that matter).
Gotcha, I misunderstood your comment. The multiple counts is a definitely very contrived example to demonstrate the overhead of BitmapOr and general risk of sequential scans.
See https://github.com/ethan-seal/ors_expensive/blob/main/benchm... where I use dd to clear the os page cache.
This article by pganalyze talks about it: https://pganalyze.com/blog/5mins-postgres-17-pg-buffercache-...