Really, the problem isn't tokio. The problem is this:
> An inconvenient truth about async Rust is that libraries still need to be written against individual runtimes.
That's really the heart of it. If it was really just a runtime, it wouldn't matter what implementation you plugged in.
...but that's not true for Rust's async runtimes; I mean, it's understandable: how can you have one runtime that is multi-threaded and one that is not, and expect to be able to seamlessly interchange them?
I understand it's hard and a lot of work went into this, but let's face it: this article is right.
Practically speaking, tokio has become 'the' rust async runtime; but it's an opinionated runtime, that has a life cycle and direction outside of the core rust team.
That wasn't where we intended to end up, and it's not a good place for things to be. I, at least, agree: avoid async. Avoid teaching rust using async. When you need to use it, partition off the async components as best you can. I <3 rust and I use it a lot, but the async story stinks.
We should have an official runtime, officially managed, and guided by the same thoughts that guide the rest of the language.
What we have now is a circus. After 4 years of async being in stable.
I use at least 3 separate runtimes: tokio and 2 no_std runtimes (rtic and embassy). The latter two would probably not be possible at all if there were an "official" runtime, because the official runtime would inevitably require allocation, and if it existed they wouldn't bother writing async in a flexible enough way that you could use it without an allocator.
The way async is implemented in rust is actually technically quite impressive, and would almost certainly not exist if there were some official green thread solution.
You could solve async/non-async polymorphism via the introduction of HKTs (and monads) - perhaps eventually they will be forced to do that.
In the mean time, if they can make a few changes like stabilizing TAITs and async traits, that would go a long way to improving the ergos of async.
Not sure if this is an apt comparison, but I like to think that the allocator is a good precedent.
Similar to the async runtime, most software needs an allocator, and most developers don't care much which one they use; they're happy with the default allocator. Another similarity is that both are not just some ordinary old library but are required by language features. We also usually don't use multiple ones in a single application.
Still, we allow the developer to choose an allocator or bring their own.
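And swapping one in is already ergonomic, precisely because std defines the interface. A minimal sketch (the wrapper allocator here is made up for illustration):

    use std::alloc::{GlobalAlloc, Layout, System};

    // A do-nothing wrapper around the system allocator, standing in
    // for whatever custom allocator you might bring.
    struct MyAlloc;

    unsafe impl GlobalAlloc for MyAlloc {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            System.alloc(layout)
        }
        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
            System.dealloc(ptr, layout)
        }
    }

    // One attribute opts the whole program into it.
    #[global_allocator]
    static GLOBAL: MyAlloc = MyAlloc;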
I haven't thought about it super hard, but I suspect the ergos of that would be quite poor, as you would need to pass around the type of the trait object, even though all you really care about is the associated type constructor.
>> An inconvenient truth about async Rust is that libraries still need to be written against individual runtimes.
>That's really the heart of it. If it was really just a runtime, it wouldn't matter what implementation you plugged in.
It is absolutely possible to make a runtime-agnostic library that can work over multiple runtimes. With the trust-dns libraries, we’ve managed to provide a resolver which is capable of working on async-std, Tokio (default), and even Fuchsia. It’s harder and takes planning; also, to be fair and fully transparent, we haven’t achieved this for all features, like DNS-over-QUIC.
> We should have an official runtime, officially managed, and guided by the same thoughts that guide the rest of the language.
I disagree. Rust is a systems level language capable of being used to build Operating Systems or other embedded tools, having a single runtime would make async Rust something you could not use in that context.
The Rust situation reminds me of the US military aphorism:
“amateurs talk strategy and professionals talk logistics”
The Rust community is endlessly talking about and obsessing over strategy, whereas the average Rust user suffers from a lack of logistics: concerns about libraries, runtime usage, etc.
Maybe you could express your concern differently? There are definitely a lot of ins-and-outs about many aspects of Rust. It operates differently from many other languages, sometimes in surprising ways.
I agree that in some areas there could be better guidance. Is Tokio the runtime most people choose? Yes. Would most people be fine choosing that for their daily work? Yes. Might you want to choose a different one? It depends on what you’re doing, others have different goals and tradeoffs. Are there interface choices regarding things like Send + Sync or IO interfaces/traits you pick that will have impacts on how you structure your code to make it portable across runtimes? Absolutely.
And finally, can Rust be better in regards to async development? Yes, everyone agrees that it should be. My big thing is that we really need async traits in the language. We have an excellent work around with the async-trait macro until we get support for it in the language, but you need to discover that, and then recognize some of its idiosyncrasies in certain situations.
Well, one big thing so many have mentioned here and elsewhere: they simply want to write plain sync code and maybe make some HTTP / database calls etc., but the library ecosystem at large has made it close to impossible to write without async.
But I guess we can go like this:
1) Will the community welcome a sync crate ecosystem? Yes.
2) Should people write sync code at all? Depends...
3) Can someone write an RFC for the Rust team if they need some feature in Rust? Certainly.
4) Should someone write libraries missing in ecosystem? Yes, community will love it.
I’m guessing that the reasoning behind this is that it would make things simpler if there were synchronous/blocking interfaces into libraries?
I’ve regretted that every time I’ve done it in my career, especially in network programming. All the different error conditions and potential blocking conditions that tcp connections can end up in are just easier to deal with on async interfaces.
I guess a different question I would ask is, what can we do to make async programming easy enough in Rust such that people don’t feel a need to reach for synchronous/blocking interfaces?
> how can you have one runtime that is multi-threaded and one runtime that is not, and expect to be able to seamlessly interchange them?
I feel like I do this in C++ right now without issue? I routinely mix/match coroutines from multiple runtimes, including one I built myself (which theoretically might be multithreaded but very much right now is not, and I know I rely on that still), one from cppcoro (which is a bit broken--I filed a bug with a detailed analysis, but it was never fixed--so I can only use a few parts that happen to add a lot of value), and one from boost asio (which is very much multithreaded and was a somewhat-impressive retrofit onto a more abstract design purely involving callbacks); I also effectively have a fourth, as another library I am using--libwebrtc--maintains its own I/O thread paradigm, and I have chosen to reinterpret a number of its delegation callbacks into coroutines. It involved some trivial adapters in a few places, but I developed those years ago and have long since forgotten them, as it all works so easily to willy-nilly co_await anything I want from wherever I am... is this really so difficult?
In Rust it is, yes, because Rust statically guarantees that you don't have data races (in safe Rust at least). So you have the marker traits `Send` and `Sync`, which indicate that a type can be sent between threads (`Send`) or shared between threads (`Sync`) safely. A multi-threaded executor, which can schedule tasks on different threads when they resume, has to make sure futures are `Send`, whereas a single-threaded executor does not have that constraint.
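A small sketch of how that surfaces in practice (assuming the tokio crate with default features): holding a `!Send` type like `Rc` across an `.await` makes the whole future `!Send`, so the multi-threaded `spawn` rejects it at compile time.

    use std::rc::Rc;

    async fn not_send() {
        let data = Rc::new(1);
        tokio::task::yield_now().await; // suspension point; `data` lives across it
        println!("{data}");
    }

    fn main() {
        let rt = tokio::runtime::Runtime::new().unwrap();
        // rt.spawn(not_send()); // error: future cannot be sent between threads safely
        rt.block_on(not_send()); // fine: block_on drives it on the current thread
    }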
Aren’t the same data races possible in async without threads? As soon as you suspend one task and start another, you have the problem that the currently running task can break the invariants of the suspended one, regardless of whether you’re doing a single-threaded event loop or threads running in parallel.
In a single-threaded system, you only need to worry about concurrency when there’s an await keyword. Everywhere else, it’s as if you have an exclusive lock. Any functions that aren’t async can be treated as atomic. This makes it much easier to reason about concurrency.
With a multithreaded system, async or not, you have to worry about the concurrency issues that come up when sharing data between multiple threads, because that’s what you’re doing.
It’s odd how Rust ended up with the worst of both worlds by default. I think people got overconfident because Rust otherwise handles multithreading so well.
> In a single-threaded system, you only need to worry about concurrency when there’s an await keyword. Everywhere else, it’s as if you have an exclusive lock
Except that async code written this way in JavaScript/Typescript often ends up being subtly broken by evolution in where the awaits occur as the software is maintained. IMO, it’s generally better to design async code with a shared-nothing mentality anyways.
In my experience a missing await is the most common bug. Inadvertently run something in the background for an instant race condition, and hard to find.
I think there should be no default for how to call an async function from another async function. Both waiting for a response and not waiting (starting a "background task") should be acknowledged in the code. Perhaps not allowing a promise return value to be silently dropped would be enough.
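For what it's worth, Rust's design softens this particular footgun: futures are lazy and marked `#[must_use]`, so a forgotten `.await` doesn't silently race in the background; it does nothing at all, and the compiler warns. A sketch:

    async fn save() { /* write something */ }

    async fn caller() {
        save();        // warning: unused future; nothing runs at all
        save().await;  // explicit: run to completion here
        // Running in the background must also be explicit, via the
        // runtime's spawn function (e.g. tokio::spawn(save())).
    }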
Rust isn’t functional, so if you have state that’s shared in some way you can’t expect it to be immutable unless you manage it immutably. However you are assured that you won’t need to worry about thread safety and reentrant code in rust because you are guaranteed the same memory won’t be modified or modified/read at the same time by two threads. Obviously in single threaded asynchronous code this doesn’t happen anyways.
That said, if you don’t use shared state that allows multiple borrows, you won’t see state changing between futures even in the single threaded cases due to the ownership model of rust.
This honestly doesn't sound like a problem, as such types fall into one of two categories: ones which need to execute on one thread--in which case resuming them should always resume on their native runtime, as they are thread-locked (I already have to deal with this as I adapt between the runtimes in C++, and it simply isn't a concern)--and ones whose storage in virtual memory is somehow fundamentally locked to a specific CPU core, and I honestly have never myself coded one of these despite having done some extremely low-level development.
Like, here: if I am in my single-threaded runtime and I await something on a different runtime with a billion threads, MY continuation does NOT need to be able to resume on any of those threads as it CAN'T. To achieve "seamless interoperability" I just need to be able to await the other routine and resume when it completes, not somehow make the two runtimes merge into one unified one and violate their constraints. The ONLY data from my coroutines which should end up on a different thread is what I explicitly pass to the routine, not my continuation.
> whose storage in virtual memory are somehow fundamentally locked to a specific CPU core
There are some pretty common reasons why a future is not Send:
1. It relies on some thread-local state, in which case you can't move it to another thread.
2. It uses something which relies on being single-threaded for soundness. An example would be `Rc`, the standard reference-counted pointer in std. It uses a plain `usize` for the refcount, so it is not safe to have two `Rc`s for the same data on different threads. If you need a reference-counted pointer that is thread safe, you need `Arc`, which uses an `AtomicUsize` for the refcount and so is Send.
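A sketch of point 2; the compiler enforces the distinction, rejecting `Rc` at the thread boundary while accepting `Arc`:

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        let rc = Rc::new(42);
        // thread::spawn(move || println!("{rc}")); // error: `Rc<i32>` cannot
        //                                          // be sent between threads safely
        println!("{rc}"); // fine on the owning thread

        let arc = Arc::new(42);
        thread::spawn(move || println!("{arc}")).join().unwrap();
    }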
> I just need to be able to await the other routine and resume when it completes, not somehow make the two runtimes merge into one unified one and violate their constraints. The ONLY data from my coroutines which should end up on a different thread is what I explicitly pass to the routine, not my continuation.
Sure, and you could do this in Rust now perfectly fine. Spawn a future on a separate runtime (or a CPU intensive task on a regular thread) and await the result on the current runtime. But by default what happens whenever you hit an `await` is that the coroutine is suspended and goes onto the runtime's run queue until it is woken back up and gets rescheduled. In Tokio's multi-threaded runtime it can be rescheduled on next wake on any worker thread so it must be `Send`. If you use the single threaded tokio runtime there is only one thread so it doesn't need to be `Send`. And even in the multi-threaded tokio runtime you can still spawn tasks that are pinned to the current worker thread using LocalSet.
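A sketch of that last escape hatch (assuming tokio with the multi-threaded runtime): the `!Send` task stays pinned to the thread driving the `LocalSet`, while the rest of the runtime remains multi-threaded.

    use std::rc::Rc;
    use tokio::task;

    fn main() {
        let rt = tokio::runtime::Runtime::new().unwrap();
        let local = task::LocalSet::new();
        // Everything driven through the LocalSet stays on this thread...
        local.block_on(&rt, async {
            let rc = Rc::new(1); // ...so !Send types are fine here
            let handle = task::spawn_local(async move {
                task::yield_now().await;
                println!("{rc}");
            });
            handle.await.unwrap();
        });
    }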
In writing application code this is (to me at least) mostly a non-issue. Most futures will be Send anyway so the Send bound is not a big deal. But if you do have something that is not Send then you can always use LocalSet to spawn it. The issue I think is really in writing library code where you start to have to add Send bounds everywhere so it jives with multi-threaded runtimes. Like say you have a trait with a method that returns a `Stream` but the concrete type of the `Stream` is not important as long as it produces the required output. So you have
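a trait along these lines (a reconstruction, since the snippet didn't survive; `Thing` is a placeholder and `Stream` is the futures crate trait):

    use std::pin::Pin;
    use futures::stream::Stream;

    struct Thing;

    trait ThingSource {
        // The concrete stream type is hidden behind a trait object.
        fn things(&self) -> Pin<Box<dyn Stream<Item = Thing>>>;
    }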
Well now all the compiler knows is that the output implements `Stream<Item = Thing>`. But this may not be Send so you'll get compiler errors if you try to use this in a multi-threaded runtime. So you add Send/Sync bounds:
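roughly like so (again a sketch):

    trait ThingSource: Send + Sync {
        fn things(&self) -> Pin<Box<dyn Stream<Item = Thing> + Send>>;
    }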
Great, now it plays nicely with multi-threaded runtimes but even if it's being used in a single-threaded runtime you still require the Send/Sync bounds.
A lot of these complaints like "you need to have Send + Sync + 'static" and "oh no you need an arc or mutex" are identical problems in C++, except in C++ it's totally unsafe if you forego those.
I am under the maybe-totally-wrong impression that people are saying that the runtimes are incompatible, not that you merely need to think harder about the scoping rules and type traits; do I misunderstand what is going on?
I have managed to use async rust for over 4 years and never once use tokio. Primarily this is possible because I just avoid 3rd party libraries with async if they are tied to a particular async runtime. It is limiting, but I think it's important to be lean on 3rd party deps, so it's almost a good thing
I'm very interested in this approach. Sorry to be a pest, but could you point to the base traits/interfaces for using async without, for example, tokio? This might help me a lot personally to get over some of my issues with Rust.
My understanding is you always need a runtime to play the async game -- something needs to drive the async flow. But there are others on the market, just without the... market domination... of tokio.
I'm new to Rust so please interpret this as curiosity and not criticism, but why not just use tokio? I understand that it's nice to build applications against a generic interface so that you can swap out libraries if one stops working well, but at this point tokio seems fairly well-vetted, and there are plenty of other parts of a typical stack that require some degree of lock-in: which database you choose, which web framework you build on, which cloud provider you interface with, etc., so I don't see choosing a specific async runtime as a deal-breaker. Could you elaborate on why you do?
Is it possible to avoid async with Rust when you use the most common 3rd party libraries? Such as ones to make API requests, database connectors, date/time, logging, dealing with special kinds of files, etc.? Or are we talking "the burden is on the user to set feature flags and carefully choose which crates they import into their projects"?
Is it possible to set up the Rust toolchain to not allow async in a project at all?
> Is it possible to set up the Rust toolchain to not allow async in a project at all?
It's getting hard.
> Tokio's roots run deep within the ecosystem and it feels like for better or worse we're stuck with it.
Tokio has become a tentacle monster that is suffocating Rust.
Async is fine for webcrap, not good for embedded, and all wrong for multi-threaded game dev.
The trouble is, the web back end industry is bigger than the other applications, and is driving Rust towards async. Partly because that's what the Javascript crowd knows.
Personally, I wish the web crowd would use Go. The libraries are better, the goroutine model, which is sort of like async but can block, is better for that, and garbage collection simplifies things. Rust is for hard problems that need to be engineered, where you need more safety than C++ but with that level of control.
>Partly because that's what the Javascript crowd knows.
The "web crowd" leans towards async because most problems at scale where you would reach for Rust are almost always in a situation where they need to concurrently do a million tasks on 8 cpus. It's not because 'thats what the Javascript crowd knows', it's because, since the days of nginx (written in C), its been shown async i/o has better performance.
I don't see a lot of CRUD APIs in Rust - it's almost always database-like systems where the goroutine model and garbage collection cause a headache in terms of either memory usage or latency. I'm not sure if I agree that databases aren't "hard problems that need to be engineered".
That said, the reason Rust focuses so much on the web crowd, is because the majority of people paying the bills are the web companies. The Rust foundations biggest sponsors today are AWS, Google, Huawei, Meta and Microsoft (none of which I would describe as the "Javascript crowd"). AWS isn't hiring Rust engineers to work on game engines.
What I see more of is that other industries just don't care that much about Rust.
async/await is really just a syntax for building state machines in a way that resembles regular code. It's compiled down to the same code that you would write by hand anyway (early on it had some bloat in state size but I think it's all fixed now).
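To make that concrete, here's a rough sketch of the transformation for a trivial future with no awaits (the real compiler output is an anonymous type; this is just the shape):

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // You write this...
    async fn add_one(x: u32) -> u32 {
        x + 1
    }

    // ...and the compiler generates, in spirit, this state machine:
    enum AddOne {
        Start(u32),
        Done,
    }

    impl Future for AddOne {
        type Output = u32;
        fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
            let this = self.get_mut(); // no self-references here, so AddOne is Unpin
            match std::mem::replace(this, AddOne::Done) {
                AddOne::Start(x) => Poll::Ready(x + 1), // no awaits: done on first poll
                AddOne::Done => panic!("polled after completion"),
            }
        }
    }

Each `.await` in a real async fn adds another state holding the live locals at that suspension point.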
That was helpful to understand the problems with Tokio's dominance. As someone using Rust for web ... ehhhh ... stuff :-) I was always / still am happy with Tokio. But now I see the shadows Tokio casts.
And note that it's a good thing that crates are async, because async-in-sync using block_on has only a potentially small CPU time overhead, while sync-in-async requires having a thread for each concurrent usage and has potentially catastrophic memory overhead, since the user and kernel mode stacks and thread data structures could in some cases be 100-1000x bigger than the future; hence, an async-only crate is much better than a sync-only crate (although of course a crate that supports both is ideal from the user's point of view).
Sure. But those async functions can't do any IO. If you need to use IO functions (e.g. from tokio), then you would still need to import that framework.
It is true that if you need to use Tokio, you'll end up using Tokio. That is not what was being suggested, though: it was just that tokio is not required for a simple block_on implementation. If you're already using tokio, using its block_on of course makes sense. But in that case, you're not adding "few hundred dependencies," you're using the ones that you're already using.
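For reference, a minimal `block_on` fits in a few lines of std-only code; a sketch using `std::task::Wake` and thread parking:

    use std::future::Future;
    use std::pin::pin;
    use std::sync::Arc;
    use std::task::{Context, Poll, Wake, Waker};
    use std::thread::{self, Thread};

    // Waking just unparks the thread that is running block_on.
    struct ThreadWaker(Thread);

    impl Wake for ThreadWaker {
        fn wake(self: Arc<Self>) {
            self.0.unpark();
        }
    }

    fn block_on<F: Future>(fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
        let mut cx = Context::from_waker(&waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(out) => return out,
                Poll::Pending => thread::park(), // sleep until the waker fires
            }
        }
    }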
And like, to be clear, "the few hundred dependencies from tokio" is also misleading. A `cargo add tokio --features full` adds 43 dependencies to your Cargo.lock at the time of writing.
The log crate with the env_logger crate has nothing to do with async. Pushing to something like Elasticsearch instead of letting Filebeat scrape your stdout is a different story.
> deal with special kind of files etc.?
std::fs came first; the async stuff on top that recreates it in an async fashion came later. I'm pretty sure if you are dealing with a big file you can do std::fs with a "stream reader", basically.
Hmm. As a developer, how would one learn that if you want no async, "instead of reqwest you use the ureq crate"? Unless they happen to search on HN first?
Is there a way to tell the Rust toolchain that async stuff is to be disabled, and importantly, is there a way to search the crate library with a filter for no async?
> As a developer, how would one learn if you want no async, "instead of reqwest you use ureq crate"?
I just googled "synchronous rust http client" - the top result was a Stack Overflow question where the top (accepted) answer listed ureq as the first option.
(I'd still just use reqwest and Tokio though - practically speaking most of the concerns are non-issues in day-to-day work)
> (I'd still just use reqwest and Tokio though - practically speaking most of the concerns are non-issues in day-to-day work)
I actually did the opposite recently and replaced reqwest with ureq and managed to drop async and tokio altogether and greatly simplify my library. As a newcomer to Rust I kept getting pointed towards reqwest and tokio when ureq is far simpler.
Without knowing what your library does, it's hard to tell if the simplification benefit is worthwhile when traded off against the lack of usability from apps which should not be blocking worker threads. Could well be, but overall I'd just use Tokio and Reqwest (which has a module exposing a blocking API, even!)
I write a shitload of Rust and I think the situation is pretty sane. There are a few warts but the way people talk about it is insane - it's frankly not that bad at all and, mostly, quite good and easy to get started with.
I had already replied to this article over on lobste.rs
The tl;dr is that I think this entire async concern stuff is ridiculously overblown. I suspect the vast majority of Rust devs, like myself, not only accept the current state as "fine" (couple things to work on) but are very happy with the decisions tokio made.
I use Rust as a case study of what happens when you don't manage a need for users because of indecision and inflexibility. I've generally been disappointed by the Rust community because of inflexibility and the unfortunate infighting that spills out. Just harmful to success.
Not managing needs for lots of users because of indecision and/or inflexibility, has always been par for the course in Golang, as they wouldn't and won't introduce greatly requested features without years of careful study and design. And none of that has resulted in an adoption problem (but it indeed did result in a lot of whining). Actually, despite this extremely slow pace to introduce popular features, Go seems to be in very good shape.
One big difference is Go is primarily driven by Google devs, and all the heavy-duty work, once agreed upon, is implemented to the last detail by the Google team. Rust is driven by volunteers for the most part, so any carefully deliberated and designed things won't amount to much if implementers are busy, uninterested, or just want to work on other fun stuff and leave some things halfway done.
Isn't the async situation in Rust because the designers wanted Rust not to be opinionated and to be flexible? I.e., you can choose not to have a runtime in your app, or use a runtime that fits your particular needs.
What need does Rust not serve? You don't have to use async if you don't want to, and for most use cases Tokio suffices. The number of people who hit edge cases with using tokio with other libraries is small.
Is it harmful to success? It certainly seems like Rust has been wildly successful. Maybe the async fragmentation will change that, but I don't see any evidence of that so far.
> We should have an official runtime, officially managed, and guided by the same thoughts that guide the rest of the language.
Agreed, it feels like we're in a worst-of-both-worlds situation. On one hand, tokio is relied upon by thousands of crates, and is very opinionated, meaning it's hard to innovate in the async space. On the other hand, tokio isn't a real standard, so we still get ecosystem fragmentation.
You don't. Anything that can be sent between threads needs to be `Send` and anything shared between threads needs to be `Sync`. This is a really important invariant that the Rust compiler provides.
The problem is tokio only insofar as it blocks attempts to develop an unified, common denominator API between multiple runtimes (for example: how can the Rust ecosystem not yet have a standard async reader trait, after years and years?), and instead encourage all sorts of libraries to depend on tokio directly rather than a facade that works on multiple executors.
Right now cross-runtime libraries are mostly written special-cased: one feature flag for tokio, another for async-std, maybe one for smol if they feel fancy. Almost none for glommio or other runtimes. That introduces a huge burden for libraries, that would rather depend on a single API.
Rust shouldn't have an official runtime; it should have APIs that make possible to write libraries that don't dictate which runtime you must pick.
The main issue is that there is no consensus yet on what the API should look like. Considering Rust's backward and forward compatibility promises, committing to an API is an extremely serious step, all the more difficult when it's not clear what the API should be.
Instead, Rust is waiting for patterns to emerge in the ecosystem, so that the universal API is compatible with all of them. It's a much safer route, but it also means waiting for the ecosystem to fragment. This is where we are now, but it's necessary and will get better over time.
Edit: async readers/writers are a good example, because it seemed trivial to set in stone, except now we have io_uring that might require the API to move to the kernel instead of just holding &mut.
> We should have an official runtime, officially managed, and guided by the same thoughts that guide the rest of the language.
If it becomes an official runtime, then backwards compatibility will kill it eventually. You'll have a situation where in the year 2078 someone will ask why we still have tokio when everyone is using the telepathy lib?
> What we have now is a circus. After 4 years of async being in stable.
It's caused by strong backwards compatibility guarantees and long RFC process + unexpected problems.
Without strong backwards compatibility, no one would be using Rust.
RFC exists to hash out unexpected problems but so far we can't peer in the future.
Here is an example:
Say you want to make Range Copy instead of non-Copy, i.e. make a new type and rename the old to the new. That will be one year for the RFC and two editions to stabilize, circa Rust 2028.
By that measure async fixes have been blazingly fast.
What if Arc and Rc weren't in the standard library, and you had to import them via a crate, and multiple different (incompatible) implementations existed such that you couldn't use them at the same time?
Would that be ok?
How about Option and Result?
What if you could only use crates that used the same error library that you wanted to use?
What about boxing and custom allocators? Can you imagine if different crates could opt into different allocators and you couldn't safely drop an object without passing it back to the crate it came from because 'who knows' what might happen if you try to deallocate it using your allocator?
Should we not ship a default allocator and make that an optional thing too?
...
That isn't a language I want to use.
I'll take 'it comes with a default allocator' and that's good enough for me. If one day I get a stable 'you can pick, seamlessly at the top level, which allocator to use for your entire program', that's awesome!
...but it does not in any way mean, that I want a rust with no default allocator.
The default allocator is great. It works perfectly for most things most people need, and it 'just works', out of the box, the first time you use rust.
Async should work out of the box. It doesn't. That sucks.
This is somewhat the case already, with many different Result/Error crates doing their own thing. So being in the standard lib isn't a guarantee things won't fracture.
> What if Arc and Rc weren't in the standard library, and you had to import them via a crate, and multiple different (incompatible) implementations existed such that you couldn't use them at the same time?
First that's not currently what is happening in Rust. Second, I'd probably use the most popular and active one. Same as in JavaScript or Python.
I think my criteria for what is in the standard lib is following: How often does the domain change? And should it come out of the box?
E.g. are we inventing new ways to parse JSON? Yes? Out of the standard lib you go. Is Arc/Rc being reinvented? No? Into the standard lib it goes.
>> If it becomes an official runtime, then backwards compatibility will kill it eventually. You'll have a situation where in the year 2078 someone will ask why we still have tokio when everyone is using the telepathy lib?
This kind of situation happens and leads to a second, newer official runtime getting adopted and the older, legacy runtime being supported as long as is needed.
This happened with Java's official GUI toolkits, which started off with the Abstract Windowing Toolkit (AWT), then moved to Swing, and almost moved to JavaFX.
Having multiple officially supported core components is not necessarily bad--it can be a sign of good backward compatibility balanced against the need to improve core components.
I think it is better than the alternative: multiple unofficial, de facto standard components that are incompatible. Who knows which direction each unofficial component will go, and newcomers do not know which one to choose.
Yes. Oracle's goal for Java is to monetize it by reducing maintenance by divesting components to the community and focusing on the components that big organizations use (so they will pay for them).
JavaFX was one of the many components that was spun out to the community and now lives as OpenJFX (https://openjfx.io/).
JavaFX would have replaced Swing had it not been for Oracle's change of direction.
>> "OpenJFX is a project under the charter of the OpenJDK." Many committers are Oracle employees.
Yes, but while Oracle employees may contribute to OpenJFX development, JavaFX is not an officially supported Oracle product (beyond legacy support of old JDK versions):
"Do I need a separate support contract for JavaFX?
No. JavaFX is part of the technologies covered under Oracle Java SE Subscription. As of JDK 7u6 JavaFX is included with the standard JDK and JRE bundles.
Note that for JDK 11 and later JavaFX is no longer included in the JDK but remains available as a third party library from other vendors."
> You'll have a situation where in the year 2078 someone will ask why we still have tokio when everyone is using the telepathy lib?
Agree. This is why I refuse to use solar panels on roof. Science clearly tells sun is gonna burn down all its fuel and implode and at that point my solar roof will be useless deadweight.
I don't understand why modern languages use "async" to do cooperative multitasking. Maybe someone can enlighten me.
My (probably incorrect) understanding is that "async" arose from Javascript. It arose because pure event driven code is error prone and hard to get right compared to linear (stack based) code. The usual solution is threads, be they real or lightweight (aka cooperative multitasking) - but Javascript doesn't have threads and never will. Compared to pure event driven code using zillions of objects to save state, the pseudo stack based async solution is indeed a blessing.
async is in effect a poor man's emulation of lightweight threads. It comes at the cost of needing language syntax to support it ("async" and "await") and it creates different colours of code, i.e. code that can't be mixed. The end result is parallel implementations of lots of libraries, leading to the situation the article and the above comment both moan about.
Lightweight threads / green threads achieve the same outcome as async, but without the downsides. No language extensions, no coloured code, all existing API's remain backward compatible. Javascript didn't have a choice, but why any language that does have a choice would use the async solution has me completely baffled. It's not like we didn't have numerous examples such as Elixir or Go, yet Rust went with async anyway.
Thanks. A fascinating history. Do keep making them.
One thing that had me scratching my head was the "green threads made C calls slower" comment. I don't understand why C would care where you call it from.
> This is not the case. Both have pros and cons.
Your talk highlighted one green thread con I hadn't thought of. It hadn't occurred to me that green threads introduce coloured code, just like async does. That was made plain by it needing a std::io implementation.
But that isn't an additional con green threads have over async - it's a con they both share. While I get that green threads didn't interact with native threads very well, I'm making a bet that was because they tried to hide the colours (the different IO library) they needed, so the programmer didn't have to care. Async would have had the same problem had it tried to hide the colouring it introduces, but they solved that by not hiding it.
Async warts over green threads of introducing a new syntax and a slightly different programming style remain.
> I don't understand why C would care where you call it from.
The details here differ based on what kind of green threads you are implementing, but the core of it is, they're cheaper than regular threads because they do not use a normal stack. C expects a normal stack. Bridging this gap has a cost. You also have to manage the interaction between the GC and C, which can have a cost. If you're curious about specifics, one example of this is cgo: https://go.dev/src/runtime/cgocall.go Go has changed strategies here several times throughout its history (as did Rust when Rust had green threads), so you may find other information that's older as well.
I should reveal at this point I've created protected mode x86 OS's from scratch, written BIOS's and what not, all done in C, so I do know a bit about C and stacks.
As I expected, there is nothing in cgo that suggests C cares about a stack. That's not surprising, as with the exception of esoteric things like setjmp and backtraces, C doesn't care. You can happily malloc a block of memory, point the SP there, push the args and call a C function, and it will do its thing and return. It's vaguely possible the OS may get pissed off that the stack isn't where it thought it should be - but the user space C function won't notice.
What cgocall() (the function that handles Go calls to C) spends most of its time doing is telling the green thread scheduler what is happening. I'm guessing the reason for that is the C code is effectively code of a different colour - i.e. it's code that could be using blocking I/O calls. If the C function does block, it won't stop just the green thread calling it; it will block all of them. I imagine that's not considered acceptable in Go. A workaround would be to move the green thread to a different native thread while the C function is running. Maybe that's what all that bookkeeping accomplishes. As you say, and as I can see in cgocall(), the overhead of the bookkeeping involved is literally orders of magnitude bigger than the overhead of the C call itself.
And as you also say, that overhead isn't acceptable for Rust. The solution Rust has implemented for async is effectively to ignore the problem, so if an async function calls a C method and that C method blocks, then every async task stops until that C function returns. It would have been a perfectly acceptable solution for green threads too. But I'm guessing the original Rust green thread implementation went for the Go "make the library hide the problem from the programmer" approach, and found itself stuck with a whole pile of overheads that ended up being unacceptable for a systems programming language.
If so, the solution wasn't to throw out green threads and adopt the async solution. That was akin to throwing the baby out with the bathwater. The simple solution was to take the async approach and make the issue of blocking C calls the programmer's problem, as opposed to hiding it in the runtime libraries.
If they had gone that route, even handling blocking C calls could have been made relatively straightforward - just provide a library function that calls the function it's passed in its own thread. (Maybe async already provides a similar function now?) Effectively that lets the programmer choose when to take the C call overhead Go imposes on every call, and when to avoid it.
Right now, it looks to me like my opening comment still stands - green threads (although not Rust's initial implementation) would have been a much better solution than async to the multitasking problem. At the 1000ft view, green threads and async are very similar. Both get their speed by using event-driven I/O rather than blocking I/O, and thus avoid the overheads of OS task switching. The key difference is that where green threads store state on a separate stack (a technique so wonderfully efficient we use it everywhere), async stores it in a manually allocated block that must then have data copied into it, and later be freed. That manually allocated block creates a lot of overheads, both in code and at runtime, that green threads don't have.
> As I expected there is nothing in cgo that suggests C that cares about a stack.
Okay well again, I'm trying to be very broad and vague here, because the details do actually matter but differ between systems. C in a general sense doesn't care, as you elaborate, sure, but because these stacks are so small, and C code doesn't know how to expand the stack (since there's no API to do so), you run the risk of overflowing the stack. So in practice, that stack usage does matter, and the way that you protect against this is to set up a regular sized stack, swap to it, and make the call. At least, in this specific implementation. http://manticore.cs.uchicago.edu/papers/pldi20-stacks-n-cont... talks about tradeoffs of six different ways of implementing this kind of thing, for example. (both Go and Rust tried the "segmented" strategy here and threw it out, for example.)
> (Maybe async already provides a similar function now?)
Many implementations provide a threadpool for you to throw blocking stuff onto, yes. That's up to the given runtime. But again, that's purely for the blocking semantics, it isn't about calling into C vs calling into Rust.
Anyway if you truly want to understand this space I would encourage you to continue looking into it, but when it comes to demonstrated performance in the real world, the green thread strategy loses out. There are other great reasons to choose that model, but for Rust's systems language goals, as well as its performance goals, async/await is the only design that's made sense.
Ahh, all those speculative words from me, and it turns out there is a Rust green thread implementation out there now. May: https://crates.io/crates/may
And it's included in a set of independent benchmarks of http servers written in a variety of languages: https://www.techempower.com/benchmarks/#section=data-r21&tes... May (and Rust) put in a very good showing there, with may-minihttp taking 2nd spot. Another Rust library, xitca-web, takes 3rd spot. Neither may-minihttp nor xitca-web uses async, but there are other Rust async implementations that come close to them. I'd call it a wash.
From that I'd say may's green thread implementation is on a par with async speed wise.
May is an unsound library; you can access TLS and it will cause UB, in purely safe code. I’m not familiar with the other one though, I’ll have to check it out, thanks!
That would be an issue for green threads. And other things, as I discovered when I took a brief look at the may code to see how it handled stack allocation. Turns out may doesn't handle it directly - the generator library (which requires nightly) has a way of creating stacks for co-routines (generator::Gn). May's green threads are just co-routines, and that library provides the stack.
That means if it is the issue I linked to, it's a bit unfair to blame it on may. The same bug will manifest itself in any such generator that calls TLS.
Probing further, generator::Gn creates stacks using stack::Stack, and stack::Stack allocates them using malloc. And yes, that guarantees stack overflow will cause UB of the worst sort, because it just overwrites the next malloced block. Someone should look up "man 2 mmap" on Linux and BSD. Both have ways to create stacks that behave very nicely, including causing a hard fail on overflow rather than UB. I presume Windows has a similar function.
To repeat the point I keep making: all these issues with green threads aren't intrinsic issues to the concept. They arise because the initial Rust implementation wasn't well designed, and not implemented particularly well either.
Looks like they made the same design decision as Rust's early green thread implementation. Quoting that link:
> The key benefit of green threads is that it makes function colors disappear and simplifies the programming model.
As a point of order, no, green threads don't make colours disappear. They can't, as the whole point is to run multiple tasks, so no green task can be allowed to make a blocking I/O call like native code does; you have to re-do every I/O library using non-blocking I/O. And thus green threads must use the non-blocking version of the library, aka differently coloured code.
Where green threads differ from async is that the language library can make the colouring disappear for green threads. It does that by, on every I/O call, checking whether a green thread is making the call and switching between blocking and non-blocking I/O accordingly. That incurs a speed penalty, of course. And it doesn't just hit green thread code; it slows down native threads too.
Looks like .NET decided that overhead is too high to bear. Fair enough - but that's a consequence of the decision to hide coloured code, not of green threads per se.
While you could do the same trick to hide blocking vs non-blocking for async code too, of course, it wouldn't hide colouring. That's because async colours code in other ways too - for example, it introduces a whole new call / return syntax. Unlike "not needing colours", not needing a new syntax is a real advantage of green threads over async. Another one is saving state on the stack rather than in a malloced block. (If writing function locals to a malloc'ed block were faster than pushing them onto a stack, we would do it everywhere.)
Odd they didn't compare the most common strategy used in practice, which is the one the Linux kernel uses. The technique is described in mmap(2), under the MAP_GROWSDOWN flag. Even if you allow a 64KB stack for each green thread, a 32-bit machine has enough virtual address space for thousands of stacks. If you need more, add an option to trim down the stack size.
> But again, that's purely for the blocking semantics, it isn't about calling into C vs calling into Rust.
Yes, it's blocking semantics. But the reason given for abandoning green threads was that calls from Rust to C were too slow under green threads, and the only reason I can see for that is the library attempting to hide those blocking semantics by intercepting every C call. If it didn't, there would be no speed disadvantage.
Yes, intercepting slows down the call by an order of magnitude. But there is another solution - don't intercept the calls; let the programmer handle it instead. That's the solution async adopts. If you are going to claim green threads are slower than async, then it's only fair to compare apples with apples, and that means comparing implementations that do it the same way.
Mind you, it's purely a guess on my part that the old green threads implementation slowed C calls by intercepting them, so it's purely a guess that we aren't comparing apples with apples. The guess is based on the fact that there is no other reason green thread C calls should be slower, as C doesn't care one way or the other.
> There are other great reasons to choose that model, but for Rust's systems language goals, as well as its performance goals
I can't see what systems language goals would be broken by green threads - but then I'm not familiar with them. Apart from the C call thing, green threads should be faster, as they store data on the stack rather than copying it into a manually allocated block. Since the C call thing looks to be a problem with the design choices of that early Rust green thread model, I don't trust the claim that an implementation of green threads making the same tradeoffs async currently does would be slower. And green threads provide a much cleaner API.
But I guess the response to my whinging at this point is "patches are welcome", or rather an appropriate green thread implementation.
Not being far enough into rust to know: is there a reason they can't settle on shared APIs?
For single-threaded vs multi-threaded obviously you'll need to split the APIs, but why can't all 1/N-threaded runtimes work with all 1/N-threaded coroutines (and perhaps another split for no_std)? Ignoring historical differences of course - backwards compatibility means early ones probably can't ever work together. But the ecosystem isn't forever bound to those early implementations.
The first piece is the compiler support for the keywords, to do the state machine transformation for you. This can only be done by the compiler, so it's agnostic of the runtime. This support also requires some ancillary standard library support for what an async task is, to know what the result of the state machine transformation looks like. In pseudocode, this is:
    do_async_task(wake) -> Poll<result>:
        if !ready_to_read:
            schedule a call to wake when ready to read
            return Poll::not_ready
        return compute(read())
The second piece of the puzzle is the top-level code to drive the polling. This is what's commonly thought of as the runtime; you generally have a top-level event loop, and asking to wake means injecting a new event in the event loop that will call the async task again.
The third piece of the puzzle is the "schedule a call" part, essentially this is the code that works with low-level system calls like epoll or io_uring or IOCP or boring old select. Except, as you immediately see if you have experience with such system calls, writing that code requires that the top-level event loop essentially be consistently triggering the call to check for new work. So this piece of puzzle, especially for I/O, generally needs to be intimately connected to the top-level executor to work well.
What could the standard library do (or have done) to make things work better? The obvious thing is to standardize basic async concepts like AsyncRead or an async variant of iterators, except the difficulty with that for I/O in particular is that the library would be standardizing interfaces without providing implementations. Less obvious is baking in a standard I/O event loop interface--a standard way to add new events to the runtime's main epoll or whatever interface. However, it sort of turns out that many OSes don't actually provide nice interfaces for "wait for I/O or timer or child process status change or GUI event or ...", which is what you really want to have.
Couldn't that "schedule a call to wake when ready to read" be "it's just another future that does whatever it needs"? Whether that's "have the shared epoll-poller check every millisecond via a timer and resolve the relevant futures that it is observing" or "wait on a blocked thread" doesn't seem like it matters. It needs checking either constantly (hot loop), after a period, or it'll be externally resolved and that resolver will let the event loop know something's ready, and... those seem rather straightforward to label and support.
I can definitely see why an enforced-shared tightly-integrated epoll-er has implicit performance benefits, but that hardly seems necessary either. And stuff chasing the last bits of performance basically always give up some interoperability.
If you're deferring to another future, that's the first piece of the puzzle (which is primarily "solved" by having the compiler do some magic to make the deferring easy, although things like the futures crate also provides a lot of useful interfaces for async I/O without actual implementations). At the end of the day, there is some fundamental future that has to implement the "schedule a call" piece, and that requires some degree of coordination with the top-level executor.
Coordination as an API, sure. We have ways to make that extensible though, why do they not work here? "External wake or check later" seems entirely feasible and not at all "intimately connected".
I'm not sure there is any reason in principle, but it doesn't work now because the APIs for various things are not really standardized. So, for instance, timers. If anywhere in my code I do `tokio::time::sleep(Duration::from_millis(100)).await` (which is a quite common thing to do), then my code will no longer work with a non-tokio runtime.
It does. Under the hood it may just be creating a kernel timer but something needs to actually wake the task back up when the timer elapses, which is what the runtime does.
The solution would probably just be to create standard interfaces in std for this, so you could do `std::async::sleep(Duration::from_millis(100)).await` and delegate the implementation details to the runtime (or something like that).
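Nothing like `std::async::sleep` exists today, but the shape of such an interface might look like this sketch (the `Timer` trait is hypothetical; the impl assumes the tokio crate with its "time" feature):

    use std::future::Future;
    use std::pin::Pin;
    use std::time::Duration;

    // Hypothetical runtime-agnostic interface.
    trait Timer {
        fn sleep(&self, dur: Duration) -> Pin<Box<dyn Future<Output = ()> + Send>>;
    }

    // One possible per-runtime backing implementation.
    struct TokioTimer;

    impl Timer for TokioTimer {
        fn sleep(&self, dur: Duration) -> Pin<Box<dyn Future<Output = ()> + Send>> {
            Box::pin(tokio::time::sleep(dur))
        }
    }

Libraries would then code against `Timer` and let applications pick the implementation.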
"You can wake up the sleeping loop" clearly depends on the runtime in that you have to contact the runtime to wake it up, but beyond that I don't really see it.
Like, if I start a thread that calls `sleep(100); timer.resolve(); runtimeInstance.wake()` how is that related to the implementation details?
Sure, you could implement your own sleep functionality that way and it would be independent of the async runtime but the async runtimes already provide that out of the box in a way that doesn't require spawning a new thread so that is typically what gets used.
Yeah, I get the ergonomic benefits. Same imports you already have, fewer arguments, etc - there are a lot of reasons why people would prefer the specific versions.
It's more that a shared API keeps getting presented as an impossibility without stdlib support, and I don't see why that would be true. Stdlib isn't special like that, nor should it be, and nothing seems to be asking for compiler magic (necessary for await in the first place, but not really beyond that).
If anything, the failure of the ecosystem to settle on a shared API seems to imply there should not be a stdlib version - let the competition continue, don't choose any until it's clearly the best choice forever.
> how can you have one runtime that is multi-threaded and one runtime that is not, and expect to be able to seamlessly interchange them?
That works fine for C#. When writing UI applications, all async code started from the UI thread will continue to run single-threaded on the UI thread. Everything else runs multi-threaded in a thread pool. No need to change any of the async code for it to work either way.
IIRC C# has management under the hood with different SynchronizationContext [0] to manage this, and it can lead to bad habits like sprinkling "ConfigureAwait(false)" all over code.
It's also devilishly hard to understand, I've read a blog [1] on the subject several times and don't always fully grasp the consequences of different options.
No it doesn't, hence why there are best practices guidelines written by the .NET architects, and why there was a research project to add Go/Java-style coroutines as well.
Yes it does. Those best practices are very easy to follow and are enforced by analyzers. I have never encountered those issues on a recent big project I worked on, although they were common in the past when async was new. Also, the green threads research concluded that it's not worth adding to .NET:
Ah the usual argument that good programmers never make mistakes.
The green threads research (which is those GitHub issues I already linked to) concluded that it's not worth adding to .NET, because basically it is now too late to retrofit green threads into .NET without introducing yet another way to color code, for little gain overall.
I also suggest watching the BUILD 2023 ASP.NET panel on the matter.
> Big if, not everyone is using .NET latest on VS, with latest version of every library using async/await.
The analyzers are part of the build, they work everywhere (command line, VS code). Either way, I am glad we finally agree that async await "works fine" for modern .NET.
> so I don't know, maybe I actually already read them?
I did; I even responded to your links with the official conclusion from the .NET team on those issues. Now maybe it's your turn to read?
The Java implementation required modifying the IO routines, and this was greatly helped by Java interop being far from easy. .NET was always more friendly to interop; lots of projects would be affected.
Fixing this would either be a serious break in the ecosystem, or you'd have a new capability with way too many asterisks to be useful.
The .NET world already spent the budget for big migrations with .NET Framework -> Core (just like Java did with Java 8->9, or Python with 2->3) - as much as we might like green threads, they aren't useful enough (compared to async/await) to justify another break.
You keep referencing these articles on async; I think it is best that you stop. Some of the advice has been known to cause controversy, nor is it necessary to think about in standard line-of-business code.
Compared to the .NET, Python and JavaScript async/await implementations, they both suck in the amount of boilerplate needed to implement and debug async runtimes.
As there is nothing shipped in the box, both suffer from "go hunting" for runtimes, and from the interoperability issues between them.
On Windows, there are bonus complexity points, as they also get to interoperate with COM apartments and OS async APIs.
> What we have now is a circus. After 4 years of async being in stable.
Hmmm, I wouldn't put it that way. There's Tokio as the de facto default runtime with a large ecosystem for "standard" usecases. Axum (Hyper, Tower middlewares) or Actix, SQLx/Diesel/Rust-Postgres, Reqwest ... make a wonderful and, for my limited usecases, rich ecosystem.
> What we have now is a circus. After 4 years of async being in stable.
I stopped really paying attention to Rust about 5 years ago, and am asking purely out of ignorance/curiosity, but has the community/leadership approach changed much since then? I remember async being the Shiny New Future that was talked about a lot back then, but it certainly seems like what's been added has not really done well?
A very informative article, that brings up important pain points and problems.
I didn't agree with this sentiment though:
> In a recent benchmark, async Rust was 2x faster than threads, but the absolute difference was only 10ms per request. To put this into perspective, this is about as long as PHP takes to start. In other words, the difference is negligible for most applications.
This statement is an ugly thorn that sticks out of the otherwise well written and reasoned article. It hurts me deep on the inside when I read stuff like this.
I agree, saying that 10ms PER REQUEST is negligible is insane. If he actually read the benchmark that was referenced, he probably didn't mean what he wrote there: the benchmark measured a 10ms difference in processing an unspecified fixed number of requests from ~100 connected clients (the benchmark article isn't actually very good, and I don't care enough to dive into the github and find out what was measured).
Author here. The benchmark part could be clearer; I acknowledge that.
Interestingly, when working with a limited number of threads, the thread approach is actually faster in that benchmark. So in practical applications, the differences are marginal and likely lean towards threads.
But even if this weren't the case, context matters. A 10ms discrepancy in a web request might be acceptable. However, in a high-performance networking application - which, let's be honest, isn't a common project for most - it could be significant.
If you measure pure latency on a single request (HTTP, RPC, whatever), the latency difference between any async or non-async implementation should be microseconds at most and never milliseconds. If it's more, then something in the implementation is off. And as you mentioned, threads might even be faster, because there is no need to switch between threads (like in a multithreaded async runtime), nor are additional syscalls needed (just read, not epoll_wait plus read).
async runtimes can just perform better at scale or reduce the amount of resources at scale. Where "at scale" means a concurrency level of >= 10k in the last benchmarks I did on this.
Concurrent is rarely faster than parallel, across almost any language that supports it. If you know that you don't need obscene scalability (1000 connections is pushing the edge of what's reasonable with parallelism) then stick with parallelism. If you overuse parallelism then expect your entire system (OS and all) to grind to a halt through context switching.
Lol, fair enough, but the C10k problem these days does have a "just use OS threads" solution. It wasn't free; it took a lot of work across the industry. Computers have gotten faster, both in single-core speed and in core count. And kernels have spent the last couple of decades really working hard on their schedulers and kernel/user sync primitives to handle those high thread counts.
The native model falls apart under C10M, but to be fair so does the traditional epoll/kqueue/iocp model of dispatching coroutines that solves C10K. That's where you start having to keep the network stack and application data plane colocated in the same context as much as possible. That can be done with something like DPDK to keep both in user space, or - as Netflix is known for with their FreeBSD work making KTLS and sendfile kiss - by keeping the data plane completely in the kernel.
Just totally depends. I’ve worked on systems that had to be wire-to-wire in ~5 mikes at the p99.9, and that’s crazy slow compared to the HFT assassins who are rumored to be under 100ns these days.
If you’re in single-digit mikes at the tail you’re not fucking around with someone’s green threads, and at 100ns you’re in an FPGA or even ASIC.
To serve a web page? 10ms, eh, I’d rather not spill on purpose but it’s a very, very rare professional CS:GO player who can tell. If my code is simpler and cheaper to maintain and more fun? Maybe I pay it.
What I don’t want is to ‘static bound shit to burn millis. Burning millis should buy me a beautiful sunset and a drink with an umbrella in it.
10ms is one hundred requests per second. Get ten users using your site at the same time and you are in trouble, because a single page load involves more than one request.
That's assuming no concurrency, which isn't applicable. Every public CDN will have around 10ms latency per request (because it takes time to load data from disk, fetch from upstream servers, apply WAF rules, etc). But they still handle 5 digits of requests per second.
Not sure why you're downvoted. This is indeed nonsense. I love async Rust, but I don't understand why we have to revisit this topic as if there's some dire hangup or something every few months.
Well, one clock cycle is measured in picoseconds. That means, even if we assume 1 cycle is one nanosecond, that's 10 million CPU cycles you can do something with.
Your scale is off. At the moment it is still better to think of clock cycles in nanoseconds; a cycle is still a few tenths of a nanosecond. 1 cycle per nanosecond is 1GHz, so a modern processor runs 2-4 cycles per nanosecond, in which it can do a lot, but nowhere near 10 million cycles.
The Rust and tokio folks are working on difficult and complex problems, I appreciate and thank them for the work they're doing to improve server and desktop app performance everywhere. We all have multicore machines and it would be great if we could use more than 1/8 or 1/12 (or whatever high thread count of your beefy servers) of our hardware.
Rust's multithreaded memory management (Arc and so on) makes me uncomfortable, because a key lesson I've learned is that you cannot scale a program by adding threads and expect mutating access to the SAME memory location to get faster. Single-threaded memory-mutation performance is a fixed, known quantity. Adding threads that contend for the same memory location makes throughput and latency to that location worse than single-threaded speed, because you need mutexes or a lock-free algorithm to communicate safely.
To accelerate data fanout or storage (writing to memory from multiple threads) you need a shared-nothing architecture or sharding.
This means that when you reach for threads - and I'm guessing you reach for threads for acceleration and performance - you need to design your data structures to not share memory locations. You need to shard your data.
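To make the sharding point concrete, here is a toy Rust sketch (illustrative only; the thread and iteration counts are made up) contrasting a counter behind one shared Arc<Mutex> with per-thread counters that are merged once at the end:

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

const THREADS: usize = 8;
const PER_THREAD: u64 = 1_000_000;

fn main() {
    // Contended: every increment fights over one lock and one cache line.
    let shared = Arc::new(Mutex::new(0u64));
    let t = Instant::now();
    let handles: Vec<_> = (0..THREADS)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..PER_THREAD {
                    *shared.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("shared:  {:?} (total {})", t.elapsed(), *shared.lock().unwrap());

    // Sharded: each thread owns its own counter; the merge happens once.
    let t = Instant::now();
    let handles: Vec<_> = (0..THREADS)
        .map(|_| {
            thread::spawn(move || {
                let mut local = 0u64;
                for _ in 0..PER_THREAD {
                    // black_box keeps the compiler from folding the loop away
                    local = std::hint::black_box(local + 1);
                }
                local
            })
        })
        .collect();
    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("sharded: {:?} (total {total})", t.elapsed());
}
```

On typical hardware the sharded version wins by a wide margin, which is the point above: the speedup comes from the data layout, not from the threads themselves.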
> We all have multicore machines and it would be great if we could use more than 1/8 or 1/12 (or whatever high thread count of your beefy servers) of our hardware.
It is probably important here to realize that async solves concurrency, not parallelism. You can use async with a single threaded runtime for I/O concurrency and mix that with threads for computational parallelism for long running jobs.
That said, there may be some benefit a multi-threaded runtime would have for the typical I/O bound app (after working around lifetime limitations by adding Send/Sync to data structures). This is because I/O bound programs and those requiring computation are not mutually exclusive and there is always some amount of computation going on, so there may still be some benefit. I doubt a synthetic benchmark would answer this as those typically don't measure any actual work performed, but just "requests/sec".
> It is probably important here to realize that async solves concurrency, not parallelism. You can use async with a single threaded runtime for I/O concurrency and mix that with threads for computational parallelism for long running jobs.
In my experience, it's impossible to mix threads and async tasks. They can't communicate or share state: threads need locks, while async tasks require a wakeup mechanism. If you just stick to unbounded channels that don't block on send, you can get far, but in 99% of cases you will need to decide upfront on a specific approach.
This has not been my experience at all. Delegating compute-intensive tasks to rayon inside a tokio runtime is not particularly hard (assuming you can pipeline things to separate IO and compute effectively).
A pattern that has worked quite well for me is to use
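One common version of that pattern (an assumption on my part; the parent doesn't say which one they mean, and `expensive` is a hypothetical stand-in for real work) is rayon::spawn paired with a tokio oneshot channel, so the async task awaits the result without ever blocking a runtime thread:

```rust
use tokio::sync::oneshot;

// Hypothetical CPU-bound function standing in for real work.
fn expensive(input: Vec<u8>) -> u64 {
    input.iter().map(|&b| b as u64).sum()
}

async fn handle(input: Vec<u8>) -> u64 {
    let (tx, rx) = oneshot::channel();
    // Hand the CPU-bound work to rayon's thread pool; the closure
    // sends the result back when it finishes.
    rayon::spawn(move || {
        let _ = tx.send(expensive(input));
    });
    // The async side only awaits the channel, so the runtime thread
    // stays free to serve other tasks in the meantime.
    rx.await.expect("rayon task dropped the sender")
}

#[tokio::main]
async fn main() {
    println!("{}", handle(vec![1, 2, 3]).await);
}
```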
Shared mutable memory isn't necessarily a problem, because how often that memory is accessed is a separate question.
E.g. compare one thread that spends perhaps 1% of its time on state mutation vs. 10 threads each spending 3% of their time on state mutation.
You get lower efficiency per thread, but still higher efficiency overall.
If you're just using Arc<T> without any other parallelism primitives, then it's immutable and the cores can all read without blocking. The only thing it does is reference counting to know when to Drop.
Blindly using Arc<Mutex<T>> without considering access patterns is a software architecture problem, not a problem with Arc or Mutex.
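A minimal sketch of that read-only case (illustrative names only): cloning the Arc touches nothing but the reference count, and every thread reads the shared data without blocking:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // Immutable data shared across threads: no Mutex needed.
    let tables = Arc::new(vec!["users", "orders", "payments"]);

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let tables = Arc::clone(&tables); // just a refcount bump
            thread::spawn(move || {
                // Concurrent reads never block each other.
                println!("worker {i} sees {} tables", tables.len());
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```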
Send + Sync alone is maybe not enough, but wouldn't it be possible to say: as long as the memory location is read-only, you can parallelize access to it. Send + Sync helps pass the read-only data to all threads without synchronization, while the rest of the Send and exclusive-mutability system flags the tricky points for you.
I can see that Send/Sync by itself does not tell you whether the data is read-only or just internally synchronizing mutation.
I still don't understand why async is faster. Sharding data can be as simple as a buffer per thread in a thread pool to catch incoming data. With async, don't you have to allocate that input buffer each time? That seems hideously expensive.
Which performance metric are you looking at for "faster"? Async is cooperative multitasking applied at a different level of abstraction. Much like OS level multitasking it adds overhead, and reduces performance in terms of latency. On the other hand it improves throughput by allowing better resource use.
> don't you have to allocate that input buffer each time?
Have to? No, you could pre-allocate and reuse buffers. It is less straightforward than the buffer-per-thread strategy, but possible.
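As a sketch of the pre-allocation idea (a toy pool, not what any particular runtime does; production code would more likely reach for something like the bytes crate):

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// A toy buffer pool: tasks check buffers out instead of allocating.
struct BufferPool {
    buffers: Mutex<VecDeque<Vec<u8>>>,
    buf_size: usize,
}

impl BufferPool {
    fn new(count: usize, buf_size: usize) -> Self {
        let buffers = (0..count).map(|_| vec![0u8; buf_size]).collect();
        Self { buffers: Mutex::new(buffers), buf_size }
    }

    fn acquire(&self) -> Vec<u8> {
        self.buffers
            .lock()
            .unwrap()
            .pop_front()
            // Pool exhausted: fall back to a fresh allocation.
            .unwrap_or_else(|| vec![0u8; self.buf_size])
    }

    fn release(&self, mut buf: Vec<u8>) {
        // Reset the buffer before returning it to the pool.
        buf.clear();
        buf.resize(self.buf_size, 0);
        self.buffers.lock().unwrap().push_back(buf);
    }
}

fn main() {
    let pool = BufferPool::new(4, 8192);
    let mut buf = pool.acquire();
    buf[..5].copy_from_slice(b"hello"); // pretend this was a socket read
    pool.release(buf);
}
```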
I think the idea is that while you're blocked waiting for IO in one task, you can serve a different task, potentially from a different user. Coroutines, green threads, communicating sequential processes as in Go or Occam.
Pure Python is a very slow language compared to Rust, with significant differences in orders of magnitudes of expenses. I would not expect information about Python performance to be particularly relevant to Rust without further evidence directly from Rust.
I think in retrospect, it makes sense to me that if you are IO-bound rather than CPU-bound (like my stuff usually is), async could let you wait on more things at a time.
I think the whole "IO bound" thing has taken on a life of its own and attained a legendary status that is not always an accurate reflection of reality. People often seem to model things as if "waiting on the DB" is all their system does and the code they wrote executes in exactly 0 nanoseconds, but that's not how it works. It isn't actually that hard to talk to a relatively local database with some well-optimized query and be doing CPU work either comparable to the wait you spent on the DB, or even greatly exceeding it, at which point your language's performance in fact does matter, potentially even dominates.
> By doing so, one would set up a multi-threaded runtime which mandates that types are Send and 'static and makes it necessary to use synchronization primitives such as Arc and Mutex for all but the most trivial applications.
That is a very weird argument to make. Tokio has very convenient APIs (LocalSet + spawn_local) for spawning non Send futures that you temporarily await (which really is the only useful thing non Send futures can do).
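For readers who haven't seen those APIs, a minimal sketch of LocalSet + spawn_local on a current-thread runtime (the Rc is just a stand-in for any !Send value):

```rust
use std::rc::Rc;
use tokio::task::LocalSet;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let local = LocalSet::new();
    local
        .run_until(async {
            // Rc is !Send, so tokio::spawn would reject this future...
            let data = Rc::new(vec![1, 2, 3]);
            // ...but spawn_local is fine: the task never leaves this thread.
            let handle = tokio::task::spawn_local(async move {
                data.iter().sum::<i32>()
            });
            assert_eq!(handle.await.unwrap(), 6);
        })
        .await;
}
```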
If anything tokio significantly improved the user experience of async in Rust in general because it promoted Send futures.
Where it has tripped me up in the past, it has been because of the way the compiler rewrites async code and you implicitly capture scope at await points. So your code doesn't compile because "future is not Send" and it's not immediately obvious why. And then it turns out that 100 lines previously you had a parking_lot::Mutex local variable which got captured in scope. So you need to refactor a bit to make sure that local variable is out of scope at your await point.
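A condensed sketch of that failure mode (some_io is a hypothetical stand-in for whatever you were awaiting): the guard being alive across the await point is what makes the future !Send, and scoping it is the fix:

```rust
use std::sync::Arc;
use parking_lot::Mutex;

async fn some_io() {} // hypothetical stand-in for any .await

// Rejected by tokio::spawn: the guard (which is !Send) is still
// alive across the await, so the whole future is !Send.
#[allow(dead_code)]
async fn broken(state: Arc<Mutex<Vec<u32>>>) {
    let mut guard = state.lock();
    some_io().await;
    guard.push(1);
}

// Accepted: the guard is dropped before the await point.
async fn fixed(state: Arc<Mutex<Vec<u32>>>) {
    {
        let mut guard = state.lock();
        guard.push(1);
    } // guard dropped here
    some_io().await;
}

#[tokio::main]
async fn main() {
    let state = Arc::new(Mutex::new(Vec::new()));
    // tokio::spawn(broken(state.clone())); // error: future cannot be
    //                                      // sent between threads safely
    tokio::spawn(fixed(state)).await.unwrap();
}
```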
There's always a trade-off. By promoting Send futures, Tokio prioritizes safety and parallelism. However, this does add complexity for developers, especially newcomers. They need to be aware of the Send and 'static requirements and might have to use synchronization primitives more often.
Because of this, I think promoting Send futures as the default is the wrong way to go.
LocalSet + spawn_local are great, and I wish more developers would know about them, but the Tokio tutorial [1] doesn't mention them and focuses on the multi-threaded runtime instead. AFAIK LocalSet is only mentioned in the docs [2].
> LocalSet + spawn_local are great, and I wish more developers would know about them, but the Tokio tutorial doesn't mention that and focuses on the multi-threaded runtime instead.
As someone who has been writing async Rust since the early days (when tokio did not enforce Send bounds), people build themselves into horrible patterns (myself included). Once you go deep on non-sendable futures, you can quickly end up creating something you shouldn't have. So I think it's more than sensible to tell people to do the right thing and then follow up on the exceptional case via API docs or a follow-up guide.
Indeed, while the Send bound can safeguard against potential concurrency issues, it also dictates a specific architectural direction for applications. Consider a web server: with the Send bound, you might be encouraged to design it such that each incoming request is handled by potentially any thread in a thread pool. Without that bound, you might lean towards a more lightweight, single-threaded model similar to Node.js, which doesn't require Send bounds and still excels at handling I/O-bound tasks.
"Doing the right thing" can vary based on the context. For instance, in embedded systems where threads aren't available, requiring futures to be Send is unnecessary. Thankfully, the standard library does not enforce this and neither does Tokio with spawn_local, but embassy exists because there's a genuine need for async frameworks tailored to the unique constraints and requirements of embedded systems.
> "An inconvenient truth about async Rust is that libraries still need to be written against individual runtimes."
In general Rust has tried hard to improve on the developer experience of C++ by providing more safety in the language and better defaults in the standard library. So it's interesting that both languages have now ended up in a similar place for async.
(C++20 coroutines finally enable sensible async libraries, but code written against a higher-level library isn't easily portable to another one even though both are using the low-level language coro primitives.)
> "Freely after Stroustup: Inside Rust, there is a smaller, simpler language that is waiting to get out. It is this language that most Rust code should be written in."
Maybe there's a Meta-Stroustrup's Law in effect:
"Every successful language eventually becomes one which contains a smaller, simpler language struggling to get out."
It happened to C++ and Java and JavaScript, now Rust seems to be reaching that point.
In practice it's not hard to make your app support async and sync simultaneously. Quick XML does it via macros, which looks very similar to keyword generics.
Edit:
> Maybe there's a Meta-Stroustrup's Law in effect:
> "Every successful language eventually becomes one which contains a smaller, simpler language struggling to get out."
Corollary: every simple subset of language contains missing features dearly needed by someone else.
> Corollary: every simple subset of language contains missing features dearly needed by someone else.
This could even be said about larger languages, though. If Rust committed to pleasing everyone, it would have an optional garbage collector, lifetime annotations would be optional, and there would be an interpreted runtime available as an alternative to the compiler. Those are features that some people do dearly need for some tasks. There's a point where you have to stop and put limits on what the language is actually for and what it's not. It seems to me that much of the push to add async/await in the first place was from people who really should have been using Go, Java, or Node, but wanted Rust to be their "everything language." It's okay to say, "This language is for writing performant systems applications. It's not a fullstack language for writing a webapp."
Regarding the one runtime point, I want to counter that it is also advantageous to not hardcode one runtime in std. This allows one to use different runtimes on webassembly. This has bitten the official async go implementation for example: https://news.ycombinator.com/item?id=37501552
From my occasional skimming of WebAssembly meeting minutes, I'd say that Wasm will likely grow the features required for Go to perform well. There's plenty of interest in stack switching, coroutines, etc.
I work a lot more in Rust than I do in go, but I think each language made the trade-off that made most sense for that language.
> I'd say that Wasm will likely grow the features required for Go
Wasm has been really great at shipping the MVP, but they are pretty slow about shipping the many features that build on it. In general, this makes sense as the system can't be changed once it's stable. But it also means that a lot of things are still in limbo and will probably be for the foreseeable future.
> I think each language made the trade-off that made most sense for that language.
Definitely! Go is meant for backend application logic, where you can provision tons of RAM and ignore the issues of GC. Rust targets a larger domain: less application logic in particular, but the whole range of systems programming - applications too, but also low-level libraries, places without an OS, etc. I think if Rust really wants to be low-low level, then not shipping an async runtime is a must, even if the std crate is present. Providing features for libraries to support multiple runtimes? Sure. But don't apply solutions that (mostly) work for Go to Rust's problem domain.
> Wasm has been really great at shipping the MVP, but they are pretty slow about shipping the many features that build on it. In general, this makes sense as the system can't be changed once it's stable. But it also means that a lot of things are still in limbo and will probably be for the foreseeable future.
I concede that post-MVP development has been slow, but I am also optimistic for the near future based on recent progress. Are there particular things in limbo that you're especially interested in or concerned about?
The linked message is just saying that implementing (green) threads on top of a non-threaded WASM spec has overhead. It doesn't really have anything to do with async, or the multiplicity of async runtimes, as such.
I agree with much of this post. I managed to avoid async Rust for three years of writing it. I do think it's the least beautiful part of Rust. My journey has been one of reaching for Arc and Mutexes and then running into problems with that approach. Relying more on channels and spawned tasks that own state i.e. Actors[0] has been a good improvement.
I do think the post is a bit unfair in this sense: it rightly identifies the problems of Send and 'static, but it also presents Arc and Mutex as *the* solutions for shared state in async while suggesting channels for the threads example.
Function colouring and the Send/'static bounds are the significant hurdles with async Rust; shared state is something that needs to be resolved whether you're using threads or async.
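For reference, the actor shape mentioned above looks roughly like this in tokio (a minimal sketch; the message types and state are made up):

```rust
use std::collections::HashMap;
use tokio::sync::{mpsc, oneshot};

// Messages the actor understands; queries carry a reply channel.
enum Msg {
    Insert(String, u64),
    Get(String, oneshot::Sender<Option<u64>>),
}

// The spawned task owns its state outright: no Arc, no Mutex.
async fn run_actor(mut rx: mpsc::Receiver<Msg>) {
    let mut state: HashMap<String, u64> = HashMap::new();
    while let Some(msg) = rx.recv().await {
        match msg {
            Msg::Insert(k, v) => {
                state.insert(k, v);
            }
            Msg::Get(k, reply) => {
                let _ = reply.send(state.get(&k).copied());
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(32);
    tokio::spawn(run_actor(rx));

    tx.send(Msg::Insert("answer".into(), 42)).await.unwrap();

    let (reply_tx, reply_rx) = oneshot::channel();
    tx.send(Msg::Get("answer".into(), reply_tx)).await.unwrap();
    assert_eq!(reply_rx.await.unwrap(), Some(42));
}
```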
The few projects I wrote using async Rust eventually became unmaintainable. And when things go wrong, stack traces involving Futures are impossible to understand.
This is where Go really shines. Goroutines may not be "right" or "good", but they are very intuitive, and maintainable. Performance isn't bad either.
In Rust, there's the May project that is very similar and should really get more attention.
Here's the problem with languages like Rust. From the very beginning, Rust's goal was to give you all the control while offering safety and performance; want some libraries to manage some of those capabilities, no problem. But when everyone has to agree on one thing or feature while the language gives the programmer full control, you get fractures in the ecosystem, 3rd-party vs Rust core team issues, and so on. A single unmaintained library can deal a massive blow to the ecosystem, unlike Go, where "the language makes the decision for you". It gets worse because Rust isn't a domain-specific language (it's even more general-purpose than Python or Java): even though it's a systems language, it pulls in ideologies from many different domains, which in turn spawns massive 3rd-party libraries to serve those ideologies, which in turn deals a massive blow to the ecosystem as more unmaintained libraries pile up.
This circle will repeat itself over and over again.
Fantastic article. My experience with Rust as an enthusiast was that tutorials tend to introduce Tokio very early on and it kinda makes Rust feel more difficult than it is. Rust's async shouldn't be taught, it should rather be discovered.
The author mentions
> If async is truly indispensable, consider isolating your async code from the rest of your application
I think ALL async code should be generally isolated.
Are there languages that give foundational priority to asynchronous code yet support good intermingling of sync and async in the same codebase? I may be missing the point about isolation, but the mix of sync and async code gets bad really quickly.
Go doesn't have native async support per se, but its approach to concurrency with goroutines and channels simplifies the process considerably. Synchronous code resembles asynchronous code, eliminating the need to isolate goroutines.
Rust, on the other hand, took a different route. Green threads don't integrate smoothly with code interfacing through FFI. Moreover, Rust's async model doesn't require a garbage collector.
With HTTP requests, you will come across reqwest and Tokio. This comment [0] introduced me to ureq, and the commenter helped me to explore Rust better.
I understand that making async HTTP request is a fundamental concept. However, I question why we should recommend a more complex solution when there are simpler alternatives that still leverage Rust's capabilities.
All I want is a basic tiny single threaded async runtime in std. No need for Send & 'static on everything. A modern single core is more than plenty for my workloads. Need more horsepower? Sure grab Tokio. I'll be fine with single async thread for IO and Rayon threadpools for heavy compute. No need to over complicate stuff.
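For scale, the single-threaded core really is tiny; here is a minimal block_on built on std alone (a sketch: it parks the thread between polls and has no I/O reactor or timers, which is where the real work in a runtime lives):

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Parks the calling thread until a waker fires.
struct Parker {
    woken: Mutex<bool>,
    cvar: Condvar,
}

impl Wake for Parker {
    fn wake(self: Arc<Self>) {
        *self.woken.lock().unwrap() = true;
        self.cvar.notify_one();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let parker = Arc::new(Parker { woken: Mutex::new(false), cvar: Condvar::new() });
    let waker = Waker::from(Arc::clone(&parker));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => {
                // Sleep until wake() is called, then poll again.
                let mut woken = parker.woken.lock().unwrap();
                while !*woken {
                    woken = parker.cvar.wait(woken).unwrap();
                }
                *woken = false;
            }
        }
    }
}

fn main() {
    assert_eq!(block_on(async { 21 * 2 }), 42);
}
```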
One of the common exchanges in this thread is: "can't we just make standard interfaces in std?" "Well, no, Sync + Send is hard."
I can't help but wonder whether two sets of interfaces are necessary: a set of standard single-threaded traits and a set of multi-threaded traits? Would that be sufficient?
As an aside, what workloads require true multithreaded reactors as opposed to a runtime which uses multiple singlethreaded reactors?
> As an aside, what workloads require true multithreaded reactors as opposed to a runtime which uses multiple singlethreaded reactors?
For example, DataFusion spreads query processing over multiple cores by having the data flow be a "streaming DAG of record batches" (or something like that), as in futures::Stream.
I don’t write rust in any sort of large capacity, but async in rust gives me this sinking feeling that the project took a big misstep that’s going to either be permanently bad, or very painful to fix.
I’m aware that the issues are tough to work through but it’s a real shame that async traits remain in nightly. On top of this, being able to reference a set of reasonable traits from a popular library not linked to a runtime would make library writing less (runtime) siloed. For example, a library author would not have to expose their own async Read and Write traits allowing consumers of that library to use runtimes that consume those traits. The user would not then have to do the plumbing themselves.
This article isn't covering the lack of structured concurrency and the blocking dependency on async Drop. The resulting state of async Rust includes leaks and inadequate task management. This isn't a Tokio problem but a Rust problem, and one that doesn't seem to have an answer after years of deliberation.
> The Original Sin of Rust async programming is making it multi-threaded by default
So, obviously having to sprinkle Arc and Mutex all over the place sucks as a developer experience. But how much does that really impose in terms of runtime overhead? Both of those structures tend to perform decently in the single-threaded case. It's obviously useless work, but I'd be surprised if that shows up on any profiles as a bottleneck.
Also important to note that while you might clone an Arc in some places (like at the beginning of a request) you can almost always just use `.as_ref()` to take a 0-cost reference to the value, thanks to borrow checking.
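Roughly, that pattern looks like this (illustrative names): clone the Arc once at the task boundary, then hand out plain references inside, which cost nothing:

```rust
use std::sync::Arc;

// Borrowing helpers never touch the reference count.
fn render(template: &str) -> String {
    format!("<h1>{template}</h1>")
}

fn main() {
    let shared = Arc::new(String::from("hello"));

    // One refcount bump at the task/request boundary...
    let for_task = Arc::clone(&shared);

    // ...then plain references inside: zero cost.
    let page = render(for_task.as_ref());
    assert_eq!(page, "<h1>hello</h1>");
}
```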
So I know _nothing_ about Rust, but I know Tasks in C# and one of the most important concepts is that a variable can be “async local” i.e. local to the async “thread”. So I wonder if the problem is that rust doesn’t have a lifetime specific to an “async thread”. As long as it’s well understood that the variables don’t leave that context, I think everything is easier to reason about.
You can move a variable into a task, or even borrow it across the task, and things work perfectly fine. Frankly, this article is blowing things massively out of proportion.
As I say, I really don’t understand and this probably isn’t the place to educate me, but this isn’t the only article I’ve seen that regards Send + ‘static as a) mandatory and b) problematic.
For sure, some people definitely think this is problematic. But I wonder how much of that is "this is problematic to my philosophy of programming" versus "this actually slows me down when writing code".
Not sure if trolling, but Rust had green threads before Rust 1.0. Like everyone sane before them, they figured N:M scheduling is not the way forward, and they ripped it out before going stable. That decision impressed me a lot and made me look into the language; I'm loving the journey so far.
An async runtime should have been a core part of the language from day one. Yes, make it user replaceable if you want (like the allocator) but one must be provided by the standard library. No buts.
The tl;dr is that I think this entire async concern stuff is ridiculously overblown. I suspect the vast majority of Rust devs, like myself, not only accept the current state as "fine" (couple things to work on) but are very happy with the decisions tokio made.
Things like "oh no you have to use Arc" are really acting like that's more than a trivial change. Or "you have to use Mutex" when you don't. Or "you can accidentally block" as if that's not the case with regular threads too, and in that case I acknowledge that async can make things trickier there. Or "tokio is so popular" as if that's not exactly because of the decisions it made early on that appealed to rust developers.
Sorry but it's just not that big of a deal. The warts I run into are in cases where I'm trying to do stuff like zero copy, async, abstracted deserialization. That can be a pain right now (and is being worked on). 99% of the time it's a matter of just writing `async` or `await` and not worrying about anything. In fact, I almost never use `tokio::spawn` anyways except at the binary level - these problems virtually do not impact me.
Source: I have written 10s of thousands of lines of async Rust, probably 100s of thousands.
I'm confused, what are you suggesting? That Rust hits a global sweet spot already, the complainers are struggling because they are holding it wrong, and there shouldn't be an attempt to change anything?
I think I was clear - that the critiques are overblown and that these problems aren't nearly as significant as portrayed. Sure, there may be some "holding it wrong" going on, idk. A couple of blog posts aren't really representative of the overall feeling from devs like myself - that things are more or less fine.
As for changing things, I wouldn't really change any issues brought up here. I'd like to see some things smoothed out, like async traits, tooling to identify hot loops that are blocking without yielding, and things like that. Otherwise, nope, working as intended as far as I'm concerned.
Not really related to this article in particular, but I keep reading about "oxidize this" and "corrode that", which makes zero sense for a language named after a fungus (https://en.wikipedia.org/wiki/Rust_(programming_language)#Or...). Ok, the proper verb for a parasitic fungus would probably be "infect", and nobody wants to go there, so I guess they just pretend that it's named after iron oxide?
Yeah, it's just like Python, which was named after Monty Python, but for some reason their logo is a pair of snakes. What does that have to do with a British comedy troupe?
The rust fungus gets its name because it's colored like iron oxide - that is, as if the plants were rusting. If the fungus's name is itself a metaphor for corrosion/oxidation, why would it be improper to honor that theme?