I don't see anything of substance in those tweets.
It was slower with fewer threads, and giving it more threads made it even slower. Celebrating this as awesome efficiency is... "interesting".
And saying that "being fast is not the goal" doesn't debunk the result or make its methodology flawed. Quite the contrary, it raises a good discussion about what those goals may be and clears up misconceptions, because apparently many people do believe it is faster (and for many async-style APIs those claims are either made or at least strongly hinted at and not disavowed. Looking at you, GCD/libdispatch).
> It was slower with the fewer threads, and giving it more threads made it even slower. Celebrating this as awesome efficiency is..."interesting".
With 1 to 4 processes on 4 cores there's basically no granularity: at 3 there's not enough concurrency, and at 4 there's too much, choking the PostgreSQL process.
Whereas going from 1 to 16 blocking processes allows a more granular ramp-up and leaves (almost) exactly enough room for the PostgreSQL process to do its thing before performance degrades.
That would be solved by having the DB on a separate host to isolate this effect, which is what happens in literally any non-dev environment.
And that's leaving aside the fact that 16 processes use roughly 4 times as much memory as 4, meaning async is cheaper to scale horizontally overall.
The error(?) in this benchmark is putting the DB on the same machine as the server.
Async solves the problem of not wanting to block other requests while you go out to expensive external resources. If you benchmark stuff while it's all on the same machine, it masks that cost.
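A minimal sketch of that idea, with asyncio.sleep standing in for the expensive external resource: one event loop overlaps the waits of many requests instead of blocking on each one in turn.

```python
import asyncio
import time

async def handle_request(n):
    # Stand-in for a slow external call (e.g. a remote DB query).
    await asyncio.sleep(0.1)
    return n

async def main():
    start = time.monotonic()
    # 20 requests, each "waiting" 100ms on I/O, served concurrently.
    results = await asyncio.gather(*(handle_request(i) for i in range(20)))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
# The waits overlap, so the total is close to one request's latency,
# not twenty requests' worth.
print(f"{len(results)} requests in {elapsed:.2f}s")
```

When the "external resource" sits on the same box and answers in microseconds, as in the benchmark, there is almost no wait to overlap, which is exactly the masking effect described above.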
I couldn't find the size of the DB. Is it 100GB of data, or more like 10MB? A single point query on 10MB might barely touch disk at all.
If you were using srcreigh's hosted DB service and queries took 1s to run, the sync server's throughput would suffer greatly, while the async servers would still perform pretty well.
If you're at the point where you've got a single application server with 100 waiting connections on a database, you've already blundered. Firstly, you're either overpaying for your database (connections aren't free) or you're underpaying for application servers. Fixing the problem in either dimension reduces the utility of your async server.
There's also this myth that Python code is free if the I/O wait is long enough. That's not true. 1) 100 queued connections on an async Python server suck a lot of momentum out of your application, especially in a resource-constrained cloud environment. 2) Any server awaiting a database is going to transform the data into some response object, which is going to wreck your overloaded async server. 3) As far as I know, all the async WSGI servers are written in Python and will use WAY more CPU than the C-based sync ones.
It does not matter if you describe a service as a database or anything else. The implications are the same. A service must make architectural decisions. Those decisions impose limits. And those limits are billed.
Regardless, HTTP servers are completely uninteresting on their own. No one makes a request to a web server to see how the server is feeling. The server's only function is to serve content. Content it must retrieve (i/o), store (memory), and transform (cpu). That store and transform step nukes your RPS and is completely unavoidable. And, in a high worker environment, will crash your system.
I'm kinda surprised, given the huge amount of effort the person put into this study, that they seemed to overlook the key point of async/await patterns: latency.
Also: how did we come to Python as a web server when there are languages and applications better suited? Do we really want one language to be everything to everyone?
Because people like Python and it does a decent job at it. Might not be your cup of tea for a variety of reasons, but a lot of people think it works well on the server. Is speed the concern? Spinning up a new server is pretty easy nowadays. Is it the lack of static typing? Aside from type hint support over the last few versions, I'd claim based on experience that PHP feels looser with its types than Python (mixed return results feel more common in PHP).
Python does a lot well for a lot of people. All that said, I think a lot of languages can do a lot of things well without serious performance impact. If you need the performance, choose the language based on that criteria.
I'm not denying Python is an amazing language. I have to use it for machine learning, and I've grown to really like it since the first time I used it in the 90's.
Just seems odd. Like, if you had a catering business and you drove a Ford Escort, and decided to add a bunch of trailers to it because it had too little capacity in the back seat. And then tried to optimize those trailers because it was too inefficient, rather than just buying a delivery truck suited for delivering food. OK, that was an awkward analogy.
I'm not, I was being lighthearted. Sorry, bad joke I guess.
But to the other poster's point, some people use what comes naturally: Perl is so absolutely burned into my brain after 10 years of writing CAD flows with it that when I need to munge a file I've already written the code in my head before I've even opened my text editor.
> If you were using srcreigh's hosted DB service and queries took 1s to run, your sync server latency would be ~1s :) Or longer if they get queued up.
Yep, and your 16-worker gunicorn is going to serve 16 rps. What's shown here (nginx, pg, Python application running on the same low CPU VPS, one extremely fast db query per request) is not a "realistic benchmark".
Every request is expected to be completed in 10ms-20ms because of the nature of this benchmark test. In these cases it is unlikely you're going to beat the scheduling overhead to make async worth it.
However, if 10% of your requests take 10s, your 16 workers are very soon going to become 16 stuck workers and you won't be able to fulfill any new requests. This is the problem that async solves.
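The difference is easy to sketch with asyncio.sleep standing in for I/O: a few long-running requests don't stop a single event loop from finishing a flood of fast ones, whereas they would each pin a sync worker for their full duration.

```python
import asyncio
import time

async def request(duration):
    await asyncio.sleep(duration)  # I/O wait; yields the event loop
    return duration

async def main():
    start = time.monotonic()
    # 4 "stuck" requests (long I/O) plus 100 fast ones, one event loop.
    slow = [asyncio.create_task(request(1.0)) for _ in range(4)]
    fast = [asyncio.create_task(request(0.01)) for _ in range(100)]
    await asyncio.gather(*fast)
    fast_done = time.monotonic() - start
    await asyncio.gather(*slow)  # the slow ones still finish eventually
    return fast_done

fast_done = asyncio.run(main())
print(f"fast requests finished in {fast_done:.2f}s despite slow ones pending")
```

With 16 sync workers and enough 10s requests in flight, the fast requests would instead queue behind them.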
You shouldn't use a fire hose to water your plants but you also shouldn't use a sprinkler system to put out a fire.
Very much this. I run prod workloads with async python and indeed when you have some external I/O that can take a long time to complete - async fixes that.
One of my systems does around 8 API calls to service a request. It serves 80,000 daily users generating close to 2,000,000 expensive API calls, and typically that's served by a single node running on a 2-core CPU. Gunicorn/starlette/fastapi.
The real problem with async in python is how easy it is to break it by introducing code or dependency that hogs the cpu every now and then. This usually means debugging weird timeouts that only happen every few days and are super hard to trace. Not sure I'd like to do async python again but it sure is efficient for I/O heavy workloads.
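One common mitigation for that footgun, assuming the CPU-hogging code can be isolated, is pushing it onto an executor thread so the event loop keeps servicing everything else. A rough sketch (the hog and the heartbeat are stand-ins):

```python
import asyncio
import time

def cpu_hog():
    # A blocking dependency that would freeze the event loop if run inline.
    t = time.monotonic()
    while time.monotonic() - t < 0.2:
        pass
    return "done"

async def heartbeat(ticks):
    # Simulates other requests that must keep being serviced meanwhile.
    for _ in range(10):
        await asyncio.sleep(0.02)
        ticks.append(time.monotonic())

async def main():
    ticks = []
    loop = asyncio.get_running_loop()
    # Push the blocking call onto a worker thread instead of the loop.
    result, _ = await asyncio.gather(
        loop.run_in_executor(None, cpu_hog),
        heartbeat(ticks),
    )
    return result, ticks

result, ticks = asyncio.run(main())
print(result, len(ticks))
```

The hard part, as the parent says, is that nothing forces a dependency to be written this way, so the hog often goes unnoticed until the weird timeouts start.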
This should be obvious to most people agreeing with the article, and I came here to say what you just said. I use async in a webserver environment only, where it does shine, and I use Celery + Rabbit for synchronous tasks (executed asynchronously via webserver).
As long as I keep my async code on the webserver end, and my sync code elsewhere, the project seems to stay organized well. Otherwise, I tend to lose sight of proper naming, etc.
Overall I've been extremely unimpressed with Python's async story. They've spent a decade baking it into the core of the language, and the result is no better than just using gevent or Twisted or any of those other "old school" frameworks. Except now everyone is forced to support it, because there are colored functions everywhere and you have to write library code that deals with all the myriad ways your stuff can be used depending on whether you're in an event loop or not. It's baffling.
Async/await at the language level is a total scam. Give me Nginx+Lua and let me just write my regular procedural code and the underlying runtime will handle yielding/resuming for me.
Async/await at the language level isn't a scam. It's an attempt to build an alternative to state machines (large switch statements) and nested callbacks.
It just so happened that we were trying to break 100k connections on a single machine at the same time, which requires you to avoid thread switching. Something which an async executor is also doing.
Those two goals got combined into "the great next thing that's better in every way", when in reality 99% will never hit that bottleneck, and the people who write good async code could write it in either of the other two forms as well.
It has certainly been depressing, as a greenlet user over a decade ago, to watch people spend weeks rewriting codebases to add "await" in all the right places.
Most code seems to take sides: it's either synchronous or asynchronous (which I suppose in itself isn't great). For many good standard libraries/packages, there is now an async variant.
I used asyncio for the first time seriously recently and was pleasantly surprised. But, crucially, I didn’t want throughput, I wanted a low-cpu use app with easy-to-write concurrency.
I really hope GIL goes away and we can just use threads.
Async Python has proven faster in my uses for IO and non-CPU-related stuff. But I think Python, either as a community or within the language, needs to solve the anti-pattern of maintaining separate sync and async versions of a library. I'm thinking specifically of aioredis and redis-py, both of which I've worked on.
Some people are looking at ways to solve this. I know urllib3, elasticsearch-py, and a few others use unasync (https://github.com/python-trio/unasync) to transform async code into sync code, leaving one codebase supporting both uses in different namespaces. This leaves you with some conditional logic (is_async_mode() -- https://github.com/python-trio/hip/blob/master/src/ahip/util...). I'm seriously considering this approach.
So now we have sync code, that we've made into async code, that we now have to turn back into sync code! It's just workarounds upon workarounds upon workarounds. This is beyond the pale.
I really hope GIL goes away so that async tasks can be dispatched in many threads, like done in Go or Rust. Something like one thread per core, plus work stealing to distribute tasks in threads.
Dealing with threads directly is significantly worse. It's harder to do task cancellation, harder to do timeouts, you often need to reach for locks for the simplest things.
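For example, with asyncio a timeout plus cancellation is a single call, asyncio.wait_for, where the threaded equivalent typically needs stop flags, joins, or locks. A minimal sketch:

```python
import asyncio

async def slow_query():
    await asyncio.sleep(10)  # stands in for a hung I/O call
    return "never reached"

async def main():
    try:
        # One line gives you a timeout, and cancellation of the underlying
        # task propagates automatically at the next await point.
        return await asyncio.wait_for(slow_query(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(main())
print(result)
```

There is no portable way to interrupt a thread blocked in a system call this cleanly, which is a large part of the ergonomic gap being described.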
Honestly I have yet to see a language really nail async. For instance much attention has been given to async rust, but in my experience it is 100x more difficult to work with async than standard synchronous rust. So many gotchas, and the errors are extremely cryptic which is counter to what you expect in normal rust.
I found working with GCD was actually pretty good: of course it doesn't solve the safety concerns which rust aims to, but it's a sensible set of primitives which can be composed to achieve virtually whatever async problem you want.
I think there are two possibilities:
1. We haven't really landed on the right abstraction for async yet. Current approaches try to hide the wrong bits of complexity and this makes async difficult to reason about, or at least makes it a bit of a minefield of special cases and considerations.
2. Or, async programming is just complex, and there's never going to be a way to tie it up as a neat little consistent language feature.
> We haven't really landed on the right abstraction for async yet
I think there are two 'correct' abstractions for async. One is where you build it as a high level abstraction on top of other concurrency primitives. Elixir's Task.async* functions do this spectacularly, and even do things that you don't get natively from Erlang that make it very nice to work with (by giving you an opinionated way of keying into global state partitions).
The other "correct" async abstraction, IMO, is a very thin, low-level abstraction over function frames and context switches. The Zig programming language gets this right. I was able to write a feature that wraps Zig's low-level async primitive in an event loop around Erlang's cooperative FFI scheduler; the Rust equivalent hasn't figured that out yet. And since Zig is colorless, you can use the same function in a non-async context, where the yield points are just no-ops (which Rust will never let you do, for reasons beyond just safety).
Honestly, I think Go's approach of syntactically light green threads is just better than async-as-keyword. There's a lot of other unergonomic stuff (like the verbosity of channel definition) that can make it a pain to use, but that's more Go-the-language's fault.
Everything a modern computer does has potential latency - whether that's waiting for the database to respond or waiting for L2 cache to respond. The programmer and the application needs decide when it's worth waiting around for the result, and when it's important to do other work in the meantime.
Go has concurrency, but I would hesitate to call goroutines an async abstraction (it is a `spawn` abstraction) because it lacks a built-in way of returning the retval of the called function (no await); you must implement that yourself with channels. Last I checked, which to be fair was about 5 years ago, Go did not give you a generic way to do this with the standard library either, probably because it... requires generics.
The equivalent of "async/await" in Go is a simple function call, not channels. As far as I understand, the main use of async/await in a web server is to work around blocking I/O calls. Node.js originally solved it with an explicit event loop and callbacks; C# added async/await, which implements a state machine by bifurcating ("coloring") the entire API into sync and async functions (which, I think, is a mistake). In Go, I/O is asynchronous under the hood: the scheduler simply parks a goroutine in a blocking call and returns to it once it finishes, so your code looks like simple synchronous code but never actually blocks. So there's simply no need for a special "await" keyword, because all you need is a simple function call. If you want more complex scenarios (parallelizing computation by spawning multiple goroutines, for example), that's when you need channels, but in my experience that's rarely needed in the context of a typical web server. So async/await is akin to cooperative multitasking, where you have to explicitly yield, while Go is like preemptive multitasking: the scheduler does it automatically and you don't even think about it. The downside is that it's not tunable like in C#; you have to trust the system's defaults.
As for generics, they're going to be added in the next release of Go, scheduled for February 2022.
> async/await at the language level is a total scam
This view will magically become mainstream as soon as "having worked with async" stops being a resume inflating selling point. I'm very glad Java is doing the right thing and hiding everything under the same "Thread" abstraction.
I think the forcing of async into Python is largely a waste of effort. There are better languages and tooling for those jobs; effective devs use the right tool for the right job. I had to work on an async Python codebase the other day and was surprised at how janky and cumbersome the syntax is (though I shouldn't be surprised). The whole dance of setting up the event_loop, create_task, run_until_complete, etc. just feels pretty gross. I'd rather just implement something I knew to be performant in Go, C#, etc., a language with first-class coroutine-style/message-passing support...
AFAIU async enables greater scalability by allowing computation to continue in other contexts while awaiting read/write operations - at the cost of slightly lower single context performance.
Yeah, I thought of this as a given. With the real benefit being that your fleet is able to handle a higher and not-hard-limited number of requests in flight, and the trade-off being the increased code complexity
Not sure about Python's async story, but in Node.js it is essential if your program's performance is I/O bound. There's no reason to let a thread stay idle while waiting for a database operation to finish.
I don't think you disagreed at all with rjbwork at all, it's just you're using words differently.
The article, and rjbwork, use slower to mean "slower when the CPU isn't regularly idling". In this context, Node.js async is slower as well, that's why the library better-sqlite3 for NodeJS is synchronous (because you're rarely going to be using sqlite in a context where the CPU/memory isn't very busy).
That doesn't make your code faster though. For a particular bit of single threaded computation the async approach is going to be slower due to context switching overhead. Same in C#, Rust, etc. If your performance is I/O bound, then, as I said, it enables greater scalability by allowing computation to continue on other contexts while waiting for that I/O operation.
I don't think anyone would argue that async code makes CPU-bound computations faster, the opposite is the case because you have naturally more overhead.
Synchronous: The thread will block until the I/O operation is done. No work is performed in this time.
Asynchronous: While waiting for the end of the I/O operation, the thread can continue to work, e.g. by handling other requests.
So, it's not a problem when the thread sleeps while waiting for the end of the I/O operation, but it's not optimal.
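The contrast can be sketched with time.sleep and asyncio.sleep standing in for blocking and non-blocking I/O respectively:

```python
import asyncio
import time

def sync_fetch():
    time.sleep(0.1)  # the thread blocks; nothing else runs on it

async def async_fetch():
    await asyncio.sleep(0.1)  # yields; the loop can do other work

# Synchronous: 5 operations, the worker is idle-but-blocked for each.
start = time.monotonic()
for _ in range(5):
    sync_fetch()
sync_elapsed = time.monotonic() - start

# Asynchronous: the same 5 waits overlap on one thread.
async def main():
    await asyncio.gather(*(async_fetch() for _ in range(5)))

start = time.monotonic()
asyncio.run(main())
async_elapsed = time.monotonic() - start
print(f"sync {sync_elapsed:.2f}s vs async {async_elapsed:.2f}s")
```

Nothing here is faster per operation; the async version just spends the waiting time doing the other four waits.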
Sure, that one thread is blocked but it doesn’t take any CPU time and the rest of the threads on the system continue to run and handle requests? You can pretty easily run thousands of worker threads on a modern system so I don’t see the issue when one of your threads gets blocked on I/O?
The linux kernel has a pretty good handle on multitasking, blocking, and scheduling so I don’t see the desire to rebuild the wheel in userspace with coroutines to achieve what the kernel already does.
Careful, this is not Erlang. These are real threads, you won't be able to have thousands of them, more like 20 [1]. If most of them are idle, then you can maybe increase it to ~100. They come with a considerable overhead, so it's better to have one thread handle several I/O operations concurrently and in parallel, than to have one thread dedicated to only one I/O operation.
I understand what the article is saying but there’s still a something I’m missing. All I want (for my admittedly simple service) is to be able to send multiple api calls or database calls in parallel rather than one at a time to reduce overall endpoint latency.
I guess I should be using a normal synchronous framework that just calls asyncio.run within each api call?
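That pattern can work for the simple case described. A minimal sketch (call_api and the handler are hypothetical placeholders): a synchronous handler that uses asyncio.run plus asyncio.gather to fan the calls out in parallel and block until they all finish.

```python
import asyncio

async def call_api(name):
    # Placeholder for an HTTP or DB call; the name is made up.
    await asyncio.sleep(0.05)
    return f"{name}-result"

def handle_endpoint():
    # Inside an otherwise synchronous view, fan the calls out in
    # parallel and block the worker only for the longest one.
    async def gather_all():
        return await asyncio.gather(call_api("users"), call_api("orders"))
    return asyncio.run(gather_all())

print(handle_endpoint())
```

The caveat is that asyncio.run spins up and tears down an event loop per call, so it only makes sense when the calls themselves dominate; concurrent.futures.ThreadPoolExecutor is a common alternative for the same fan-out.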
At an instant messaging company we did this in 2010 with greenlet. In current_year, everything assumes you will use async, so you should probably use async.
I think the point of async I/O is not to go faster, but to be able to serve more requests in a single process. This is only possible when request time is mostly spent waiting for I/O.
So if you have some server and you want to handle hundreds of thousands of concurrent requests, and each request takes a few hundred milliseconds, most of it spent waiting for I/O (e.g., waiting for other microservices to reply), then using async I/O you could do this in one single box. You probably won't be able to do that with a typical balanced process pool such as uwsgi or gunicorn.
tl;dr: Async (1) is not faster, (2) does not scale, and (3) is not stable under load.
Yes, and no matter how many times it's said, people will still opt for the async option even though it's slower in the general case. I went so far as to benchmark this myself in 2019 using EC2, DigitalOcean, and my local machine. I nearly published a blog post, but I don't appreciate the attention that brings.
Gunicorn is not stable under heavy load. uWSGI delivers better throughput. If we talk about cloud environments exclusively, uWSGI is better full stop.
If you have endpoints with long awaits consider a strategy other than holding open the connection. If you need to run a socket server then run it separately from your api server.
The strangest part of this dataset to me was tests being run using Daphne with Starlette, instead of Django. Unclear who would reach for that combo in production, since Daphne is meant for Django.
I don't have much experience with modern async python, but I used Twisted coroutines (with `yield` instead of `await`) a lot. I found the greatest benefit was responsiveness. You could write networking code for a GUI in imperative style, and the GUI would still remain responsive. You do have to make sure to not do any big computations, but only IO on the main thread, as async only helps with the latter.
Actually, I found that in some places the code wasn't perfectly "async", or there were some hidden CPU-intensive parts, so every now and then the GUI still had responsiveness issues. I then moved all the network code to a separate thread. That left me with two event loops: one from the toolkit (Gtk) on the main thread, and one from Twisted on the background thread. For communication, I had functions to "submit" a function to run on either thread (a form of message passing). Personally, I found this to be the best way to architect a complex GUI app. For such an app, the most important things are that it is responsive and correct, so this works pretty well.
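The "submit a function to run on the other thread" idea can be sketched with plain queues standing in for the two event loops (real code would use something like GLib.idle_add on the Gtk side and Twisted's reactor.callFromThread on the network side):

```python
import queue
import threading

# Each "event loop" is reduced to a queue of callables drained by a thread.
ui_queue = queue.Queue()
net_queue = queue.Queue()
results = []

def run_loop(q):
    while True:
        fn = q.get()
        if fn is None:  # sentinel shuts the loop down
            break
        fn()

def submit(q, fn):
    # Thread-safe hand-off: queue.Queue does the locking for us.
    q.put(fn)

net_thread = threading.Thread(target=run_loop, args=(net_queue,))
net_thread.start()

def network_work():
    data = "payload"  # pretend this came off the wire
    # Hand the result back to the "UI" loop instead of touching UI
    # state from the network thread.
    submit(ui_queue, lambda: results.append(data))

submit(net_queue, network_work)
submit(net_queue, None)  # stop the network loop
net_thread.join()

# Drain the UI queue on the main thread, as the toolkit loop would.
ui_queue.put(None)
run_loop(ui_queue)
print(results)
```

Each side only ever mutates its own state, and all cross-thread traffic goes through the two submit points, which is what keeps this architecture tractable.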
Every time this comes up (like every 3 months on here) people just kind of digest it as fact without critically reviewing it. There are multiple posts in the wild that dispute Cal's claims and raise various issues with his benchmarks. As always, consider your workload: does it need that level of optimization? Yes? Then compare the various performance options.
There are many misconceptions about async Python I keep seeing again and again. I'd like to offer an alternative point of view:
- async Python is faster, but for a very specific niche of workloads. Most tasks you do don't fit that workload at all, which means most of the time you should NOT use async Python. And that's OK. Don't make your code complicated when you don't need to. Choose a tech for the need, not the hype.
- async Python is not just about performance. It helps make some specific kinds of concurrency easy to reason about, because the context switch is explicit and the chain of events can appear linearly in the code, thanks to await.
- it's ok to have some sync processes and some async processes. It's not one or the other.
- async is not a replacement for threads or processes. In fact, there are good use cases for having several processes, each with several threads, each with one event loop. Which also means that if you benchmark your WSGI code with 16 workers, you should do the same with your ASGI code.
The corollary to this is that some web site loads are very well suited for async (e.g: an SPA with a lot of connectivity which delegates long running code to other services), but a lot are not. If each request makes a long SQL query, seeks the hard drive, dynamically performs i18n and adds some calculation on top, it may very well block the event loop for too long, killing any benefit.
So when would you use async python ?
- For performance, when you need to maintain numerous long-lived connections. E.g.: doing websockets? Use asyncio. Serving static files without nginx? Use asyncio. Want to write a web crawler and your memory budget doesn't allow opening 10,000 threads? Use asyncio.
- For the interface, use the async/await keywords everywhere you need inversion of control for I/O. It's not just about perf here; it's also a mechanism for delegating arbitrary parts of your code through a common standard interface around a fancy state machine + scheduler. You can use that to abstract all your I/O and switch backends at will, while still getting callback inlining.
But again, you have awesome threading and multiprocessing pools with python. Not to mention tools like zeromq, scrapy or celery, which do a lot for you. Don't run to async just because "it's faster".
Having async by default in frameworks like FastAPI opens up a ton of possibilities though. Live settings, pub/sub between processes, websockets...
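On the "numerous long-lived connections on a small memory budget" point above: a minimal crawler sketch where one event loop tracks hundreds of pending fetches and an asyncio.Semaphore caps how many are in flight (the URLs and the sleep-based fetch are stand-ins for real HTTP requests):

```python
import asyncio

async def fetch(url, sem):
    # The semaphore caps concurrent "requests"; the event loop tracks
    # all pending fetches in one thread, far cheaper than a thread each.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for the network round-trip
        return f"body of {url}"

async def crawl(urls, max_in_flight=100):
    sem = asyncio.Semaphore(max_in_flight)
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

urls = [f"https://example.com/page/{i}" for i in range(500)]
pages = asyncio.run(crawl(urls))
print(len(pages))
```

Each pending coroutine costs a frame object rather than a thread stack, which is where the memory-budget argument for asyncio comes from.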
> It helps with making some specific kind of concurrency easy to reason about, because the context switch is explicit
Assuming you're disagreeing with this: yeah, I've seen "easier concurrency" be a massive footgun, because you end up with code that's only safe as long as an await doesn't get added in the wrong place.
I would say it's pretty hard to add an await in the wrong place, although not impossible. I think it's easier to:
- forget an await
- forget an asyncio.wait or an asyncio.gather
The latter is very common, and is being addressed by the structured-concurrency movement we've seen with Trio and the like. This should end up in mainline Python eventually, because it is, indeed, a problem.
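The "forgot an await" case is easy to demonstrate: calling a coroutine function only creates a coroutine object, and nothing in its body runs until it is awaited (or scheduled as a task).

```python
import asyncio

async def save_record(store, value):
    store.append(value)

async def main():
    store = []
    coro = save_record(store, 1)  # forgot the await: nothing has run yet
    ran_without_await = len(store)
    await coro  # only now does the body execute
    return ran_without_await, store

ran_without_await, store = asyncio.run(main())
print(ran_without_await, store)
```

In real code the un-awaited coroutine is usually just dropped, and the only symptom is a "coroutine was never awaited" RuntimeWarning, if anyone is watching the logs.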
However, it's still way easier to shoot yourself in the foot with threads and concurrency than with asyncio.
However, it is harder to get started with asyncio than with threads, because there is more to learn just to boot.
I did a project for a major financial company everyone on HN has heard of. I used async python with just a crazy amount of data coming in mostly all at once (think end of market) and then had to pipe it to different places with a huge emphasis on prioritization.
It worked really, really, really well.
I'm not saying I'd rather use Async over web workers, but when you have the right task for this tool and you architect the solution around it intelligently it really does shine.
I suspect that these benchmarks largely prove that the Linux kernel is better at multi-processing and scheduling than any of the Async Python offerings so far.
> A bigger worry is that async frameworks go a bit wobbly under load.
Async should generally be better for throughput because you can complete work items without interruption, whereas context switching will give all threads a chance to make progress. Async will start to have issues once you do any sort of non-trivial processing in the event loop. The irony is most cases employing async are latency-sensitive, not throughput-sensitive.
Erlang - which does async very well for a lot of tasks - is not particularly fast. What it is good at is juggling a lot of things at the same time in a fairly predictable amount of time.
And a rebuttal from Python core dev @ambv at the time pointing out how the methodology's flawed: https://twitter.com/llanga/status/1271719778324025349