ethanseal's comments

ethanseal · 2025-09-30T14:05:01 1759241101

When you say cold cache, did you clear the os page cache as well as the postgres buffercaches? After setup.sql, the cache will be warmish - I get 4ms on the first run. I'm using postgres 17.5

See https://github.com/ethan-seal/ors_expensive/blob/main/benchm... where I use dd to clear the os page cache.

This article by pganalyze talks about it: https://pganalyze.com/blog/5mins-postgres-17-pg-buffercache-...

hans_castorp · 2025-09-30T14:41:44 1759243304

I did not explicitly evict the Postgres buffer cache, but using pg_buffercache to evict all buffers for the table, yields a runtime of 23ms for me (still going for the BitmapOr).

https://notebin.de/?ac3fcf55e6850f47#ERXndRrqp3X4zEWX5EC3dZU...

Which plan does Postgres choose in your case that results 100ms?

ethanseal · 2025-09-30T14:56:54 1759244214

Exactly the same one from what I see: https://github.com/ethan-seal/ors_expensive/blob/main/explai...

Given the buffer reads seem close to yours, I believe it's page cache.

ethanseal · 2025-09-30T03:19:30 1759202370

GIS is underrated. This is awesome!

Have you looked into speaking with the various SHPOs in each US State/Territory?

I've worked with several of them a fair bit and they have a ton of old maps hidden internally. Especially for small, specific areas of the state, like historical districts.

ethanseal · 2025-09-29T22:29:14 1759184954

I think having a way to build statistics on the join itself would be helpful for this. Similar to how extended statistics^1 can help when column distributions aren't independent of each other.

But this may require some basic materialized views, which postgres doesn't really have.

[1]: https://www.postgresql.org/docs/current/planner-stats.html#P...

johnthescott · 2025-09-30T02:14:51 1759198491

could you elaborate on pg not really having matviews?

ethanseal · 2025-09-30T03:34:43 1759203283

Materialized views in Postgres don't update incrementally as the data in the relevant tables updates.^1

In order to keep it up to date, the developer has to tell postgres to refresh the data and postgres will do all the work from scratch.

Incremental Materialized views are _hard_. This^2 article goes through how Materialize does it.

MSSQL does it really well from what I understand. They only have a few restrictions, though I've never used a MSSQL materialized view in production.^3

[1]: https://www.postgresql.org/docs/current/rules-materializedvi... [2]: https://www.scattered-thoughts.net/writing/materialize-decor... [3]: https://learn.microsoft.com/en-us/sql/t-sql/statements/creat...

timbowhite · 2025-10-02T14:30:57 1759415457

The pg_ivm plugin adds incremental updates to Postgresql materialized views:

https://github.com/sraoss/pg_ivm

Though I don't know how well it works on a write-heavy production db.

Tostino · 2025-09-30T21:27:16 1759267636

As somebody who implemented manual incremental materialized tables using triggers, yeah it's pretty dang hard to make sure you get all the edge cases in which the data can mutate.

ethanseal · 2025-09-29T22:24:17 1759184657

Highly recommend https://use-the-index-luke.com/

It's very readable - I always ask new hires and interns to read it.

progmetaldev · 2025-09-30T03:05:30 1759201530

This website taught me a ton, even after I thought I knew more than enough about performance. Just seeing how different databases generate and execute their SQL is a huge boon (and sometimes extremely surprising when looking at one DBMS to another).

ethanseal · 2025-09-29T22:01:15 1759183275

For sure, there's definitely a lot of cool techniques (and I'm not aware of all of them)! And the first example is very much contrived to show a small example.

I'm not super familiar with the term index merge - this seems to be the term for a BitmapOr/BitmapAnd?

Is there another optimization I'm missing?

The article links to my code for my timings here: https://github.com/ethan-seal/ors_expensive

There is an optimization that went in the new release of PostgreSQL I'm excited about that may affect this - I'm not sure. See https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

Sesse__ · 2025-09-29T22:10:03 1759183803

> I'm not super familiar with the term index merge - this seems to be the term for a BitmapOr/BitmapAnd?

Different databases will use similar terms for different operations, but I would guess that the comment refers to something similar to MySQL's index merge (which is essentially reading the row IDs of all the relevant ranges, then deduplicating them, then doing the final scan; it's similar to but less flexible than Postgres' BitmapOr).

ethanseal · 2025-09-29T22:21:47 1759184507

Cool. I'll have to read up on that.

ethanseal · 2025-09-29T21:15:57 1759180557

Absolutely. Though I don't recall seeing multiple sequential scans without a self-join or subquery. A basic filter within a sequential scan/loop is the most naive/simplest way of performing queries like these, so postgres falls back to that. Also, fwiw, BitmapOr is only used with indexes: https://pganalyze.com/docs/explain/other-nodes/bitmap-or.

jalk · 2025-09-29T21:32:32 1759181552

That was the extreme case - the multi-scan would be gotten if a casual reader tried your (neat by all means) AND-only query on a non-indexed table (or partially indexed for that matter).

ethanseal · 2025-09-29T21:41:27 1759182087

Gotcha, I misunderstood your comment. The multiple counts is a definitely very contrived example to demonstrate the overhead of BitmapOr and general risk of sequential scans.