Hacker Newsnew | past | comments | ask | show | jobs | submit | ethanseal's commentslogin

When you say cold cache, did you clear the os page cache as well as the postgres buffercaches? After setup.sql, the cache will be warmish - I get 4ms on the first run. I'm using postgres 17.5

See https://github.com/ethan-seal/ors_expensive/blob/main/benchm... where I use dd to clear the os page cache.

This article by pganalyze talks about it: https://pganalyze.com/blog/5mins-postgres-17-pg-buffercache-...


I did not explicitly evict the Postgres buffer cache, but using pg_buffercache to evict all buffers for the table, yields a runtime of 23ms for me (still going for the BitmapOr).

https://notebin.de/?ac3fcf55e6850f47#ERXndRrqp3X4zEWX5EC3dZU...

Which plan does Postgres choose in your case that results 100ms?


Exactly the same one from what I see: https://github.com/ethan-seal/ors_expensive/blob/main/explai...

Given the buffer reads seem close to yours, I believe it's page cache.


GIS is underrated. This is awesome!

Have you looked into speaking with the various SHPOs in each US State/Territory?

I've worked with several of them a fair bit and they have a ton of old maps hidden internally. Especially for small, specific areas of the state, like historical districts.


I think having a way to build statistics on the join itself would be helpful for this. Similar to how extended statistics^1 can help when column distributions aren't independent of each other.

But this may require some basic materialized views, which postgres doesn't really have.

[1]: https://www.postgresql.org/docs/current/planner-stats.html#P...


could you elaborate on pg not really having matviews?


Materialized views in Postgres don't update incrementally as the data in the relevant tables updates.^1

In order to keep it up to date, the developer has to tell postgres to refresh the data and postgres will do all the work from scratch.

Incremental Materialized views are _hard_. This^2 article goes through how Materialize does it.

MSSQL does it really well from what I understand. They only have a few restrictions, though I've never used a MSSQL materialized view in production.^3

[1]: https://www.postgresql.org/docs/current/rules-materializedvi... [2]: https://www.scattered-thoughts.net/writing/materialize-decor... [3]: https://learn.microsoft.com/en-us/sql/t-sql/statements/creat...


The pg_ivm plugin adds incremental updates to Postgresql materialized views:

https://github.com/sraoss/pg_ivm

Though I don't know how well it works on a write-heavy production db.


As somebody who implemented manual incremental materialized tables using triggers, yeah it's pretty dang hard to make sure you get all the edge cases in which the data can mutate.


Highly recommend https://use-the-index-luke.com/

It's very readable - I always ask new hires and interns to read it.


This website taught me a ton, even after I thought I knew more than enough about performance. Just seeing how different databases generate and execute their SQL is a huge boon (and sometimes extremely surprising when looking at one DBMS to another).


For sure, there's definitely a lot of cool techniques (and I'm not aware of all of them)! And the first example is very much contrived to show a small example.

I'm not super familiar with the term index merge - this seems to be the term for a BitmapOr/BitmapAnd?

Is there another optimization I'm missing?

The article links to my code for my timings here: https://github.com/ethan-seal/ors_expensive

There is an optimization that went in the new release of PostgreSQL I'm excited about that may affect this - I'm not sure. See https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...


> I'm not super familiar with the term index merge - this seems to be the term for a BitmapOr/BitmapAnd?

Different databases will use similar terms for different operations, but I would guess that the comment refers to something similar to MySQL's index merge (which is essentially reading the row IDs of all the relevant ranges, then deduplicating them, then doing the final scan; it's similar to but less flexible than Postgres' BitmapOr).


Cool. I'll have to read up on that.


Absolutely. Though I don't recall seeing multiple sequential scans without a self-join or subquery. A basic filter within a sequential scan/loop is the most naive/simplest way of performing queries like these, so postgres falls back to that. Also, fwiw, BitmapOr is only used with indexes: https://pganalyze.com/docs/explain/other-nodes/bitmap-or.


That was the extreme case - the multi-scan would be gotten if a casual reader tried your (neat by all means) AND-only query on a non-indexed table (or partially indexed for that matter).


Gotcha, I misunderstood your comment. The multiple counts is a definitely very contrived example to demonstrate the overhead of BitmapOr and general risk of sequential scans.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: