My findings were a bit controversial at the time as I found that uWSGI + sync Python is about the best option, largely because replacing interpreted Python code with native code is a very significant factor in the performance of a Python program.
In the following discussion (and in private emails) I was stunned by the number of async enthusiasts who proposed raising the number of database connections in the connection pool by two or three orders of magnitude (I had 20 connections in a pool on a 4-CPU machine), for reasons along the lines of "the async app is starved of connections".
In my opinion that suggestion betrays a misunderstanding of what is likely going on in the database when you have 100s or 1000s of connections all contending for the same data: in SQL even reads create locks. Async applications yield a lot, by design, and under hundreds or thousands of requests per second there is a considerable buildup of "work in progress" in the form of async tasks that were yielded from and have not yet been returned to and completed. Many hundreds of database connections will create an enormous volume of bookkeeping on the database side and are very likely to slow things down to an absolute crawl.
Even idle connections are known to have a detrimental effect in Postgres. Someone at Microsoft is apparently working on this and released a great blog post quantifying the problem:
> Even idle connections are known to have a detrimental effect in Postgres.

I have a personal backlog task to evaluate whether connection multiplexing helps with the idle-connection cost issue. I wish it could get more priority, as I'm super curious, but the load/volume for my work is so low that it just doesn't matter yet.
Idle connections did show up as taking substantial DB CPU time on one MySQL (Rails/Ruby) cluster I administered. We had thousands of connections, each making a 'ping' request every second to check the validity of the connection. That was enough context-switching load to have a noticeable impact.
MySQM? MySQL presumably? Anyway, MVCC is implemented via locks (among other things, like copying, transaction IDs, visibility maps, etc.).
Regardless of that, my main point is that even reading requires transactional bookkeeping in the database, which some people don't realise. If you read a row, do nothing, and then roll back a few ms later (common in webapps), there is still bookkeeping to be done.
I wasn't talking about writes but as an aside: I wouldn't bet on that.
SQL databases are complicated and writes can happen for many reasons - if you manage to bloat the working set by reading more stuff concurrently (old rows, etc) it's not hard to imagine something having to be paged out as a consequence.
InnoDB does not create locks for reads, unless in a SERIALIZABLE transaction.
> SELECT ... FROM is a consistent read, reading a snapshot of the database and setting no locks unless the transaction isolation level is set to SERIALIZABLE. For SERIALIZABLE level, the search sets shared next-key locks on the index records it encounters. However, only an index record lock is required for statements that lock rows using a unique index to search for a unique row.
I’ve worked with many libraries and frameworks and I haven’t seen transactions enabled by default. Given how easy it is to deadlock yourself this way (for instance by naively making batch reads in random order), I doubt many libraries would make transactions the default.

For instance, in Rails and Django you need to explicitly specify a transaction. Can you give an example of a framework that turns off autocommit by default and instead commits at the end of HTTP requests?
MSSQL will lock on reads by default, and can (and will) escalate locks to entire pages for reads.

Making MVCC mode opt-in requires enabling Snapshot Isolation, and making it the default requires another change. It's also worth noting that the way MSSQL does MVCC is far more TempDB/memory hungry than Oracle/Postgres (due to the nature of using snapshots). I've been in more than one shop that did not want to enable it due to the memory requirements.
When I was working at a large online travel website a few decades ago, we had a few hundred front-end boxes each with a connection pool set to (I think) around 10 each - so around 2000 connections on the DB backend. We did some profiling and discovered that, under the heaviest load, we never had more than one connection active per web server (and usually 0), so a single shared DB connection would have been just as effective as the pool was.
Hm - it's been almost 20 years, so my memory is a bit hazy, but I seem to recall that there were (relatively infrequent) writes. But definitely no transactions: just grab a connection, insert/update some data, return it to the pool.
I think latency also matters a lot for client-side DB connection pooling. If the DB is a 10ms round-trip away, and queries take 10ms, the process on the DB side is only really busy half the time.
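A back-of-envelope sketch of that arithmetic (the numbers are the illustrative ones above, nothing measured):

```python
# Illustrative numbers, not measurements.
rtt_ms = 10    # network round trip between app and DB
query_ms = 10  # time the DB spends actually executing the query

# A client holds the connection for rtt_ms + query_ms per query,
# but the DB-side process is only busy for query_ms of that.
utilization = query_ms / (rtt_ms + query_ms)
print(f"DB-side busy fraction per connection: {utilization:.0%}")  # 50%

# So you need roughly 1/utilization client connections to keep
# one DB-side worker fully busy.
print(f"client connections per busy DB worker: {1 / utilization:.0f}")  # 2
```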
Doesn't all of this become mindbogglingly complex once you factor in a) changing replica counts (connection count changes in steps of up to the max pool size) and b) multiple applications using the same DB hardware?
Ish, but not really. Once you realize your DB is all about resource constraints and connections aren't free, the optimization becomes pretty simple: use as few connections as possible to get whatever work you need done.

Once you view connections as a net cost to the system, the math becomes simple. The network isn't infinite, DB hardware isn't infinite. At the end of the day, more connections == more load on the system as a whole due to management overhead.

This will also lead you towards good DB management. If you want to be able to scale and respond to things, the right thing to do is to keep your datasets as small as possible and stop multiple apps from using the same DB hardware. Use a fleet of DBs rather than one big DB and you'll have a much better time scaling.

I say this as someone who currently works at a company where we have everything on a small set of big DB hardware. We are running into all the problems of hardware and network constraints, and they are causing us major issues.
This article helped me reduce the connection pool from 80 to 10 per box, which helped us serve traffic spikes going from a couple of requests per second to thousands per second.
I think that's just arithmetic. The fundamental idea is that you have enough connections that you can guarantee that at least one thread has all the connections it needs - because if it does, it can proceed, finish, release its connections, and then another thread can pick them up and gets to proceed, etc.
If a thread needs Cm connections, then Cm - 1 connections is enough to be one thread short of what you need. If you have Tn threads, then Tn * (Cm - 1) connections lets every thread, in the worst case, be one short of what it needs (if the connections are not evenly distributed, then at least one thread already has as many as it needs). So Tn * (Cm - 1) + 1 connections means that at least one thread definitely has enough connections.
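A tiny sketch of that arithmetic (pure illustration, the function name is made up):

```python
def min_pool_size(tn: int, cm: int) -> int:
    """Deadlock-free minimum from the formula above: Tn threads,
    each holding at most Cm connections simultaneously."""
    return tn * (cm - 1) + 1

# Worst case with 8 threads needing 3 connections each: all 8 hold
# 2 connections (16 in use) and stall. The 17th connection lets one
# thread reach 3, finish, and release, unblocking the rest.
print(min_pool_size(8, 3))  # 17
```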
Ah, ok. Seems clear that this is for a thread requiring C connections simultaneously to get its work done. That would make it a necessary calculation to avoid deadlocks. I don't see this being necessary in a Rails-type app where there's usually one connection per request, but if one were using Go, setting a max connections parameter, and then simultaneously checking out more than one connection to do work, there's an easy chance of deadlocks.
Will add a section talking about that and this formula.
> Seems clear that this is for a thread requiring C connections simultaneously to get its work done.
Exactly. As that document says (my emphasis):
> The prospect of "pool-locking" has been raised with respect to single actors that acquire many connections. This is largely an application-level issue. Yes, increasing the pool size can alleviate lockups in these scenarios, but we would urge you to examine first what can be done at the application level before enlarging the pool.
> Where Tn is the maximum number of threads, and Cm is the maximum number of simultaneous connections held by a single thread.
To be honest, I think applications that do this are broken ("largely an application-level issue"), so I'm not sure how important it is.
One thing I never understood about connection pools is how they handle server-sent close connection (FIN) events. I can guarantee there are tons of CLOSE_WAIT connections on my near-idle Python process for various DB pools, whether MySQL, Redis or PG.

For SQLAlchemy it just complains "MySQL server has gone away (error(32, 'Broken pipe'))" and never knows how to recover. For Redis it just pegs the CPU at 100% and keeps epolling the dead TCP connection.
I'm not familiar with specific implementations, but from what I've seen with Redis and Postgres clients there's always a reconnect hook of some sort, which I assume pooling proxies give you as well. We've had plenty of broken pipe errors and we just reconnect.
> Generally your client should close the connection when it's done. Server-side close should mostly be related to idle timeouts.
To achieve that, the client must have an extra thread listening for the close event, or a periodic check to purge dead TCP connections.

But there's a gotcha: some TCP connections without KEEPALIVE just time out naturally, and the FIN packets may never reach the client; you only know the connection is dead when you initiate a real request. This is where most connection pools fall apart.
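On Postgres at least you can bound how long a half-dead connection lingers, since libpq exposes the TCP keepalive knobs directly. A sketch with psycopg2 (the host and values are made up):

```python
import psycopg2

# These are libpq connection parameters, passed straight through as kwargs.
conn = psycopg2.connect(
    host="db.example.com", dbname="app", user="app", password="secret",
    keepalives=1,            # enable TCP keepalive probes
    keepalives_idle=60,      # start probing after 60s of idle
    keepalives_interval=10,  # then probe every 10s
    keepalives_count=3,      # declare the peer dead after 3 missed probes
)
```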
Generally a connection pool also checks the health of its connections. In SQL connection pools this can be done by executing a SELECT 1; if this fails, a new connection can be attempted.
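SQLAlchemy, mentioned upthread, can do this on checkout; a minimal sketch (the DSN is made up):

```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "mysql+pymysql://user:pass@db.example.com/app",  # hypothetical DSN
    pool_pre_ping=True,  # test each connection on checkout and
                         # transparently replace it if the ping fails
    pool_recycle=3600,   # also retire connections older than an hour,
                         # sidestepping server-side idle timeouts
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```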
Just 6 days ago you published what was a pretty great writeup on message queues, now this thorough dive on connections & pools; what's your writing schedule like? At some 12k words of fairly accurate and precise technical content I'm kind of blown away by your pace.
I had a few paragraphs written earlier (maybe half) over the course of the pandemic.
I spent a lot of the pandemic procrastinating by trying to build my own blogging software, a more efficient version of Ghost, but it’s too much duplicated effort given what I learned at AWS re:Invent. So I decided to just set up and start publishing.

I write for about two hours a day, maybe four days of writing and two of editing for each article. I’d like to publish once a week, but I don’t want to set that kind of goal, because then I’ll write even when I have nothing interesting to say. But maybe that’s the best process anyway?
Will tell you what’s working if you ask again in a couple of months :-/
I was trying to make it multi-tenant, move it to a serverless database like DynamoDB, and rewrite the server in Go.
I’m trying to make a Ghost hosting service that’s free / $5 a month for personal use. I’ve figured out a way to make the economics work with the existing 500MB disk + 1GB RAM requirements, but it would have been a lot easier with Go and Dynamo.
Thanks a ton to you and the team for the work you’re doing. I think you’re making a fantastic system, and want to make it available to more people as a personal blogging platform. I just wish it could run multi tenant or with lower memory needs.
I don’t suppose making it easier for others to compete with the official Ghost revenue model is a priority, so I doubt this is anything you’ll want to take up. But I’m happy to share revenues on this - I just think Ghost should be a lot more accessible to individuals around the world.
Not clear how those changes would make it more accessible or what problem you're actually trying to solve?
There are lots of things we're working on improving but I promise you changing the programming language of any of those components wouldn't make any difference to the end experience of the software.
A lot of blogging platforms (including Ghost) start out at $5/mo targeting personal use -- but the ones that survive never stay there. Probably some worthwhile takeaways from that :)
Yeah, the Go decision was more because of familiarity and some tooling I have available for Dynamo. The real problem I was tackling is zero-marginal-cost multi-tenancy. Right now a new Ghost installation costs 500MB on disk and 1GB of RAM, and I would like it to cost one row in an existing database instead.

And yeah, I don’t think $5/month is sustainable for a company, but for a solo founder it might work. Of course, that might be a classic case of "this time it’s different".
Am I reading this correctly that with NodeJS and PostgreSQL, using multiple connection poolers _combined_ is a not-so-good idea?

For example, using Knex as the connection pooler for your node app, which is connected to your PgBouncer deployment?

I have not considered this a fault; in a way I found it compelling to have a layer of "protection" by using Knex in addition to PgBouncer, but now I'm not so sure anymore.
It depends on how knex is used. If it gives you a method to run a DB query, and internally checks out a connection, runs the query and releases the connection, then it’s already doing exactly what would happen with a proxy.
If it or you are setting up middleware to start a transaction before every request, or put a checked out connection in the request context, that’s a lot of possible inefficiency. If you really want the convenience this offers, and have request flows that do a lot of other non-DB work, you’ll need a pooling proxy.
If your request handling is very database-heavy and does nothing else, then automatic request-level connection checkout is still efficient, even without a proxy. Even if you have a proxy, one real connection is going to be working for the entire duration of the request anyway.
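Not knex, but the two patterns sketched in Python with SQLAlchemy, since the shape is the same (the DSN and the slow call are made up):

```python
import time
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@db.example.com/app")

def call_slow_external_api():
    time.sleep(0.2)  # stand-in for 200ms of non-DB work

# Per-query checkout: the connection goes back to the pool between
# queries, so nothing is pinned during the external call.
def per_query():
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
    call_slow_external_api()  # no connection held here
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))

# Per-request checkout: one connection held for the whole request,
# idle during the external call. This is the pattern that makes a
# pooling proxy like PgBouncer worthwhile.
def per_request():
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
        call_slow_external_api()  # connection pinned but idle
        conn.execute(text("SELECT 1"))
```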
We have a Rails app that uses Puma with MySQL, but so far single threaded (the Puma config has `threads 1,1`), because we migrated from Unicorn and are still worrying about thread-safety.
Currently our database.yml pool size is set to 25, because we have 25 puma workers on the server.
Reading your article, this seems to be overkill, since that setting is _per worker_, and since each worker is running only with 1 thread, the default 5 should be more than enough?
Also, I just checked and our MySQL is configured with 151 max connections; if every worker of ours can use up to 25 connections, that can lead up to 625 connections to the database, and that's without considering Sidekiq (concurrency 25).
Re MySQL counts: based on your notes you’re probably using 50 conns at max, whatever your settings are: 25 for Sidekiq (assuming you do run a hot queue) and at most 25 for the servers. Irrespective of DB pool size, I think Rails creates connections lazily, so each single-threaded worker is never going to go over 1.
I changed database.yml to use 10 connections. I didn't change the Sidekiq configuration, which still uses concurrency: 25.
Strangely, tonight I've seen the first exceptions like `ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use` in many, many years.
All of these exceptions came from within Sidekiq workers, which is strange because, like I pointed out, I didn't change the Sidekiq concurrency, only the database.yml pool, which shouldn't affect workers.
Isn’t Sidekiq working off Redis? You might be adjusting the Redis concurrency instead. And if I remember correctly, even with Sidekiq you’re still using normal Rails and ActiveRecord, so a process that’s running 25 Sidekiq jobs simultaneously is going to need 25 ActiveRecord DB conns from the database.yml pool. I doubt the Sidekiq config is overriding it in some way.

But basically, if your Sidekiq workers and app servers are using the same database.yml values, you’ll need to set the greater of the two on both. I assumed you had different settings on app servers and workers.
Well, even the default 5 would probably be overkill, unless you’re spawning threads within a request that each use their own DB connections. But yes, I think you can go with the default. Once you’re confident about being threadsafe, the Heroku deployment guide for Puma is pretty good for all the little knobs to turn.
> the popular servers for Python, PHP and other languages that use a global lock or assume single-threaded single-process execution
This is not true for Python. The article you linked mentions Gunicorn, uWSGI and CherryPy; they are threaded and can handle multiple requests per process.

And even though Python does have this GIL thing, it does not lock network operations for a TCP/UDP-based database protocol (though some C-based modules do tend to hold the GIL).
Will edit that, yeah. I am struggling to explain the behaviour without having to talk about the GIL, because if I open that can of worms it becomes a much longer article and goes off track.

All of the GIL and event-loop based stacks go multi-threaded at the lower library levels, really. Will think about how to explain that. And will move Python off that general category.
I'm confused by this statement about pgBouncer et al.
> These systems allow you to make as many database connections as you want and manage them inefficiently, because the connections they give you are fake & cheap simulated connections that they handle with low overhead.
What makes them inefficient then, if they're cheap and have low management overhead?
I think he meant that "you" can manage them inefficiently (or not manage them at all) because the proxy pool will take care of that for you. It reads awkwardly.
I've added a link to the project in the context of another comment here about deadlocks... Is there anything in particular you'd like mentioned? Will look into the pool in detail to see if there are any special tips and tricks they use.
Yeah, Rails doesn't currently do any strong analysis of what it needs to do. Based on my experience with it, I remember that it either 1) checks out a connection at the beginning of the request into a thread-local / request context and releases it at the end of the request, or 2) checks out a connection when the first ActiveRecord work is attempted, and releases it at the end of the request.

Either way, if you have a request flow of 1) DB read, 50ms; 2) network request, 200ms; 3) DB write, 50ms, you can see that the connection is idle for 2/3 of your request-handling time. Rails does this so that if you start a transaction in step 1 you can commit it in step 3. All your writes are also automatically wrapped in a transaction, I think.

In Go, on the other hand, step 1 will check out and release the connection immediately, and step 3 will check out and release again. So the connection isn't idle and locked for the 200ms in step 2. But if you don't keep this in mind and try in step 3 to finish something (a transaction, advisory lock, or prepared statement) that you started in step 1, you're screwed.
So Rails saved you from doom by holding onto a connection for an inefficient extra 200ms, and Go shoots you in the foot in the name of efficiency.
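A minimal sketch of that foot-gun, reproduced in Python with SQLAlchemy-style per-query checkout (the jobs table is imaginary):

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@db.example.com/app")

# Step 1: write something, but the block ends before a commit. On
# release, the pool's reset-on-return rolls back the uncommitted insert.
with engine.connect() as conn:
    conn.execute(text("INSERT INTO jobs (name) VALUES ('x')"))

# Step 2: 200ms of non-DB work happens here, with no connection held.

# Step 3: a fresh checkout cannot finish step 1's transaction;
# the insert is already gone.
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))  # autobegins a new, empty transaction
    conn.commit()  # commits that new transaction, not step 1's insert
```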
Have edited to "These systems allow you to make as many database connections as you want without worrying about management, because the connections..."
Not that many people may be using Oracle and PHP together, but last summer I decided to take a closer look at some performance aspects, since I want to move certain PHP things to the cloud while our Oracle database remains local, which would introduce some extra latency.

In some basic testing on my local machine at work, just connecting to our local Oracle database introduced a good 60-100 ms per request (before running any sort of queries), and due to PHP's nature this recurs on every new request.

I had searched for a long time over the past months/years for options that would give PHP some sort of connection pool the way languages like Java/Go can, but had come up short each time.

I don't even recall how I came across it, but last July I finally learned about Oracle's Database Resident Connection Pooling (DRCP) feature (and then reminded my coworker in November that I wanted to try it out). I was able to run some tests, and it does a good job of eliminating most of that unnecessary connection wait time for PHP applications, so I was pretty happy with the results. It didn't seem to increase the load much at all on the database server side.
Even though it might not be as ideal as a truly separate connection pool managed at the application-server level or outside the database, a connection pool available at the database level still has benefits, so it would be kind of neat to see more of this sort of approach from other vendors (if it isn't available already).
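For what it's worth, DRCP is reachable from Python too; a sketch with python-oracledb (the user/DSN are made up), where the ":pooled" suffix requests a pooled server:

```python
import oracledb

conn = oracledb.connect(
    user="app",
    password="secret",
    dsn="dbhost.example.com/orclpdb1:pooled",  # DRCP pooled server
    cclass="MYAPP",               # connection class used to group sessions
    purity=oracledb.PURITY_SELF,  # reuse a session from this class if possible
)
with conn.cursor() as cur:
    cur.execute("SELECT 1 FROM dual")
    print(cur.fetchone())
```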
EDIT: Forgot to add that, as a result of the DRCP testing, even though moving the PHP side to the cloud introduces extra latency, the time savings from DRCP reduce latency enough that overall it's still a net win compared to the conventional connection approach with the PHP code running locally. For me that was a win (hopefully I'll get to use it in production in the near future, but I'm waiting on my colleague to sort out a few things with enabling it there).
Great article, Suhir. Very well written and informative :DD
I would like to help point out some typos and missing words, so I made this PR[1] so you can easily see the diff. I hope this is useful, but feel free to ignore these suggestions.