I do not get why it's a good idea for a connection pooler to go for a process model. The whole "isolation and stability" argument is doubtful for postgres itself, but it's imo pretty much nonsensical for a pooler.
Could be simplicity and ease of implementation. I recall Pgpool was also separate processes, and relied on resetting open connections between sessions.
Not the author, but blaming PG's connection scalability on the per-process model is somewhat of a misunderstanding. You can easily have many hundreds of thousands of processes, they are mostly equivalent to threads on Linux anyway (with some minor differences ofc).
Instead, the poor scalability stems from the amount of resources allocated to each connection. If Postgres were to switch to threads without changing this architecture, scalability likely wouldn't improve much.
By the same token, a connection pooler that is fork-based but allocates only a very small amount of resources per connection is still likely to be a net win, as it can do statement-based pooling over a smaller number of Postgres connections.
Summary: the per-process resource issue is isolated to just the transaction isolation mechanism and can likely be improved, but you are never getting around the context-switching and memory-locality costs of the per-process model.
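The statement-pooling idea above can be illustrated with a toy sketch (no real sockets or Postgres involved; `StatementPool` and the connection names are made up for illustration): many client sessions share a handful of server connections, each borrowed only for the duration of a single statement.

```python
import queue
import threading

class StatementPool:
    """Toy statement-level pooler: N clients share M << N server connections."""

    def __init__(self, server_conns):
        self._free = queue.Queue()
        for conn in server_conns:
            self._free.put(conn)

    def execute(self, statement):
        conn = self._free.get()               # borrow a server connection
        try:
            return f"{conn}: {statement}"     # stand-in for sending the query
        finally:
            self._free.put(conn)              # return it right after the statement

pool = StatementPool(["server-conn-1", "server-conn-2"])

results = []
def client(i):
    results.append(pool.execute(f"SELECT {i}"))

threads = [threading.Thread(target=client, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 100 client "sessions" were served over just 2 server connections.
print(len(results))
```

Because a connection is only held per statement rather than per session, the pooler's own per-client state can stay tiny even if each backing Postgres connection remains expensive.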
> You can easily have many hundreds of thousands of processes, they are mostly equivalent to threads on Linux anyway (with some minor differences ofc).
Not really. The page-table overhead of 100k processes alone is going to eat up a lot of memory - and you don't have that with threads. And with something like postgres the page tables aren't going to be tiny, even with explicit use of huge pages for applicable allocations. The decrease in TLB hit ratio is also very painful. In a lot of workloads postgres spends way too much time in TLB misses, and that's to a good degree caused by the process model (it's also made worse by Linux making it way too hard to map executables with huge pages - iTLB misses are a significant issue).
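The page-table cost is easy to estimate back-of-the-envelope. The numbers below are illustrative assumptions (32 GiB of shared memory mapped into every backend, 8-byte leaf PTEs, upper-level page-table levels ignored), not measurements of any particular system:

```python
# Back-of-the-envelope page-table cost per process.
KiB, MiB, GiB = 2**10, 2**20, 2**30

mapped = 32 * GiB          # assumed shared memory mapped by each backend
pte_size = 8               # bytes per leaf page-table entry on x86-64

per_proc_4k = mapped // (4 * KiB) * pte_size   # with 4 KiB pages: 64 MiB of PTEs
per_proc_2m = mapped // (2 * MiB) * pte_size   # with 2 MiB huge pages: 128 KiB

procs = 100_000
print(per_proc_4k // MiB, "MiB/process ->", procs * per_proc_4k // GiB, "GiB total")
print(per_proc_2m // KiB, "KiB/process ->", procs * per_proc_2m // GiB, "GiB total")
# Threads share a single page table, so the same mapping is paid for once.
```

Even with huge pages the per-process copies add up across 100k backends, while a threaded design pays the mapping cost once.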
That's not to say that a 1 connection: 1 thread approach is the right thing for a pooler - threads also don't scale great. I think there are good simplicity arguments to be made for a 1 connection: 1 thread model for a database server itself, but for poolers the benefits seem much smaller.
(Although the sibling post pointing to my postgres connection scalability post is right - processes are not the major bottleneck at the moment, and certainly not with < PG 14)
Yeah I agree wholeheartedly. What I was trying to get across was that simply s/processes/threads isn't going to make a massive difference to postgres if that was the only change you were to make.
Your analysis seems pretty on-point with what I expected, i.e. snapshot scalability and other Postgres-specific architecture being the major bottlenecks at higher connection counts.
FWIW, a good chunk of the snapshot scalability changes are in PG 14, so the situation has gone a bit further towards s/processes/threads/ being the primary bottleneck. It depends quite a bit on whether you're write-mostly (bottlenecks almost completely unrelated to processes vs threads) or read-mostly. And in the latter case whether it's a context switch heavy workload (processes vs threads may be a significant factor), or not (processes vs threads can be a significant factor, but less likely).
The popularity and mindshare of Postgres really puzzles me. It’s so arcane and full of weird failure modes and gotchas. It feels like something from a 1980s mainframe that would come with a bound “field manual.”
We can have file systems like zfs that for all but maybe the largest and most exotic use cases are bulletproof and relatively easy to administrate. Why can’t we have SQL like this?
I guess we do in the form of CockroachDB or Yugabyte, but Postgres has been so aggressively evangelized that it's hard to sell someone on those. I know of one or two companies that use them but it's a tough sell.
There are tons of overpriced Postgres managed cloud offerings and consultancies, so my guess is that there is a heavy financially motivated astroturf campaign.
As a long-time happy user of Postgres, I am happy to answer your questions:
> The popularity and mindshare of Postgres really puzzles me.
Why? Postgres really is "The World's Most Advanced Open Source Relational Database", just like it says on the tin.
> It’s so arcane and full of weird failure modes and gotchas.
Not really. It actually is the best-documented open source database that behaves just like they say in the manual. The Postgres manual is really quite wonderful, can be read back-to-back, like a good book.
> Why can’t we have SQL like this?
This has more to do with SQL itself than with Postgres. SQL as a standard has evolved over so many decades that the latest version of the spec has literally thousands of pages [citation needed, couldn't quickly find a link]. Implementing all of this is an enormous task, and no relational DB implements all of it.
> I guess we do in the form of CockroachDB or Yugabyte, but Postgres has been so aggressively evangelized try selling someone on those. I know of one or two companies that use them but it’s a tough sell.
Postgres covers ~95% of all use cases of your average software company. Specialized DBs are really only useful in that last ~5% of cases.
> There are tons of overpriced Postgres managed cloud offerings and consultancies, so my guess is that there is a heavy financially motivated astroturf campaign.
Postgres is quite easy to run yourself, completely free of charge. There really isn't any "financially motivated astroturf campaign" that I can discern.
This forum is heavily anti-other-dbs and pro-postgres so you won't get objective answers here. FWIW, at least three DBAs that know both well have told me that postgres is an operational nightmare compared to mysql.
Don't know the exact details, but they mentioned it is full of operational footguns: using it slightly wrong can utterly tank your performance or availability. Many operations are unpredictable, so it's hard to know what to expect.
Another reason they mentioned is InnoDB, which in their view is the most battle-tested and well-engineered storage engine, with enormous amounts of engineering effort put into hardening it for performance, stability, and operability at scale by companies like Google and Facebook.
Postgres works really well in single node configurations as long as you read up a bit on the weirdness around vacuuming. Usually you don't have to worry unless you are running write-heavy loads at large scale.
HA Postgres is where things get frustrating and hairy. It's 2021. I should be able to set up a master-master multi-node database by running a second node and telling it where the first node is, and be done. A master with a read-only backup should be similarly easy. If I shut down the lead node, a backup should become leader automatically. I should be able to set this up in a day and not worry that it will fail to fail over properly in a real scenario, because there are algorithms now, like Raft consensus, that are provably correct (provided you meet their requirements).
IMHO this kind of easy clustering is a table stakes feature for anything billing itself as an "enterprise capable" database.
The fact that HA Postgres requires me to think in 2021 is what makes me hate the thing. As near as I can tell this problem exists to make me buy managed HA Postgres at 1000% markup over compute/storage.
I can't complain too much because it's free and it is very good for the things it does well. It's just puzzling to me that there is no canonical turn-key solution for this after so many years of so many people requiring it.
Implementing Raft is far far easier than a lot of the other stuff that has been implemented in Postgres.
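For a sense of scale, the majority-vote core of Raft's leader election fits in a few lines. This is a deliberately stripped-down sketch, not Raft: it omits logs, RPCs, election timeouts, and term changes (the genuinely hard parts) and shows only the "one vote per node per term, strict majority wins" rule:

```python
def request_votes(candidate, term, nodes, voted_for):
    """Count votes for `candidate` in `term`; each node votes at most once per term."""
    votes = 0
    for node in nodes:
        # setdefault records the vote only if this node hasn't voted in this term
        if voted_for.setdefault((node, term), candidate) == candidate:
            votes += 1
    return votes

nodes = ["n1", "n2", "n3", "n4", "n5"]
voted_for = {}

votes = request_votes("n1", term=1, nodes=nodes, voted_for=voted_for)
is_leader = votes > len(nodes) // 2     # strict majority: at least 3 of 5
print(votes, is_leader)                 # 5 True

# A rival candidacy in the same term cannot also win: every vote is taken.
rival = request_votes("n2", term=1, nodes=nodes, voted_for=voted_for)
print(rival, rival > len(nodes) // 2)   # 0 False
```

The single-vote-per-term rule is what guarantees at most one leader per term; everything else in a real implementation is about persistence, timeouts, and log safety.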
CockroachDB is pretty close. I wouldn't say I'd go fully into production in one day but the failover behavior is Raft based and easy to understand. I could prototype an enterprise cluster in a few hours without descending into the arcane bowels of secret pg_incantations and "I think this is set up right but not sure if it will actually recover" clunky shit.
Some people feel like Postgres with its decades of history is the safer bet. It's a known commodity. Or at least more known than CockroachDB and Yugabyte.
Why? Just why! Please name names rather than saving face for the other poolers!