Async Rust: What is a runtime? how Tokio works under the hood

armchairhacker · on July 16, 2022

Basically, async Rust requires you to understand how “async” works under the hood. Javascript async does stuff automatically, there is an implicit runtime and you can just await or .then and get a Promise and closure which encapsulates whatever data you need and stores it on the heap.

Rust lets you optimize the runtime and do polling etc. how you want. But you have to do everything explicitly. And you can store futures on the stack and customize how their represented (including the closures), but you have problems like async traits because every future has a different type.

Lots of people say “async Rust is hard because async is hard”. Honestly this is false, async is easy if you Box dyn everything and use the tokio runtime. Async Rust is hard - and Rust in general is hard - because they don’t compromise performance for abstraction and encapsulation. You get exposed to the gritty internals of how async works in exchange for being able to make futures and a runtime which are optimal for your specific program.

bkolobara · on July 16, 2022

> Async Rust is hard - and Rust in general is hard - because they don’t compromise performance for abstraction and encapsulation.

I love Rust, but this is not completely true. Async Rust is hard because many nice features from Rust are not available in async Rust. For example, you can't have async functions in traits and always need to box return value, compromising performance just because the language is not powerful enough yet to support this.

This also leads to not having standard Read and Write traits and a bunch of fragmentation between runtimes. It's not that having it would compromise performance, it would even allow better performance than the current workarounds, but async Rust is still in development and needs time to catch up. In the meantime you need to do a bunch of awkward tradeoffs between ergonomics and performance when writing async Rust and of course it's frustrating to developers that are used to Rust being zero-cost with awesome ergonomics.

gloryjulio · on July 16, 2022

This is very similar to c++. After transitioning to coroutine from previously semicomplete implementations like promises, our codebase become so much better. It just takes time to reach a better feature

mrfox321 · on July 16, 2022

what c++ coro library are you using, folly?

i use that and it makes async programming trivial, relative to callback style programming.

gloryjulio · on July 17, 2022

Yeah we use folly and we acknowledge that it's not the 'most' performant option. But we we don't have real time constraint. It's good enough

moonchrome · on July 16, 2022

JavaScript also has the benefit of a single threaded event loop - that eliminates a whole class of complexity and dealing with schedulers etc. - there is only one.

Async already gets messier in C# for example which also has a GC.

armchairhacker · on July 16, 2022

You can have a single threaded event loop in Rust. Just, like everything else, it's an option and you have to specify it explicitly.

moonchrome · on July 16, 2022

Yes but the API has to be built for multithreaded scheduling.

Gwypaas · on July 16, 2022

This is a nice read on why you likely should prefer a single threaded runtime and enable parallelism through other means. Or decide to pay the synchronization cost.

https://maciej.codes/2022-06-09-local-async.html

staticassertion · on July 16, 2022

No, you get !Send futures.

dagmx · on July 16, 2022

Yes but the point was that rust has to allow for multi threaded event loops. JavaScript can design around knowing that's not an option

ashish_negi_ · on July 16, 2022

if you write harder concurrent program in JS, single threaded event loop does not help. It still has also same concurrency problems. e.g. if there is shared global mutable state: a) in workflow 1 you have, do_some_work_on_global_state; do some IO; in IO callback, finish more do_some_work_on_global_state. b) in workflow 2 .. same like above work on global_state.

Now, in IO callback, you don't know if workflow 2 ran and have to handle all possible combinations of global_state above.

Replace global_state with common_state and problem still remains.

If you don't have common_state between multiple workflows, then it is not a hard concurrency problem and should be easy to do in all languages.

moonchrome · on July 16, 2022

Sure it does, you don't have to manage on which thread you handle continuations (which you must in multithreaded GUI for example) - there's only one scheduler which simplifies the async API a lot.

But even for concurrency - single threaded event loop/cooperative multitasking eliminates a whole class of partial state updates and synchronization primitive/locking errors - it's not even close to preemptive multitasking complexity.

staticassertion · on July 16, 2022

Rust does not require you to know how async works under the hood.

Javascript async doing things automatically has been infinitely more confusing for me, frankly. Async rust isn't hard, rust isn't hard - not for a lot of people at least.

I don't know what gritty details you're referring to, you need to know the same rules you always know - move semantics, some concept of lifetimes maybe. Move of the time it's "add a `move` and clone before the async block".

thinkharderdev · on July 16, 2022

In general I agree that the difficulty and need to know the "gritty details" is overstated, but there is one aspect where that is true and I have found it confusing at times. Since it has to capture anything in scope of an await point, you will sometimes get somewhat non-obvious compiler errors about how "X is not Send" when it's not really obvious at all why it would need to be. So something like

``` let locked = std::sync::RwLock<Foo> = ...;

let lock = locked.write().unwrap();

bar.doSomethingAsync().await ```

will complain because `std::sync::RwLockWriteGuard` is not send. Just looking at the code it is not really clear why it should need to be. To understand why, you need to understand how the compiler transforms this code into a state machine and capture everything in scope of an await point in Struct that must be send (since it can shift to new thread when resuming). It makes sense when you understand what's happening under the hood but can be a bit baffling when you are starting out.

justinpombrio · on July 16, 2022

That is tricky. It's also something you need to know when working with closures in Rust, which are for the same reason much harder to work with and understand than closures in other languages. I wonder whether it would have been better design for Rust closures to require an explicit capture list, like in C++, just to be more explicit about what is happening. (Not sure if/how that would translate to `await`.)

thinkharderdev · on July 17, 2022

Yeah, I've been stung more than once by the `Fn/FnOnce` distinction

Groxx · on July 16, 2022

Fine-grained logic in async JavaScript can be a very special kind of pain. It's a rather specialized event loop, but the vast majority of articles treat it like "oh it's just a normal in-order event loop like every other".

It ain't. Unless your logic has no order requirements between async components, or explicitly accounts for things like "microtasks", there's a chance it's wrong... and it depends on your runtime: https://bytefish.medium.com/the-execution-order-of-asynchron...

(it's generally better to not depend on execution order in async systems anyway, but it's rather easy for it to sneak in sometimes. if it does, it may work on your machine but not on mine, or it might change based on what kinds of tasks other code spawns, if you press a button at a critical moment, etc)

amelius · on July 16, 2022

As long as they've solved the "what color is your function" ([1]) problem, I'm happy!

[1] https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

mrits · on July 16, 2022

I made a fairly complex app using tokio and async. I did not know how async worked in rust at the time. I didn't even know entirely why I needed tokio.

mattrighetti · on July 16, 2022

This is a heated and frequent debate even in the rust community itself: you almost always don't want/need async

mrits · on July 17, 2022

Not an argument unique to the Rust community.

kelnos · on July 18, 2022

> But you have to do everything explicitly.

Sure, but in the common case, "everything" is just adding a simple attribute to your main() function (e.g. `#[tokyo::main]`), and adding the `async` keyword to functions that have stuff in them that need `.await`ing. That's... kinda it?

The only real difficulty I've run into is when I have multiple futures I need to wait on, since there are some fiddly bits to deal with (like using select!{} can cause you to lose data depending on what the underlying futures are doing).

Regardless, comparing Rust to Javascript is a bit weird; they are just not comparable languages with even remotely similar intended use cases.

pkulak · on July 16, 2022

People crap on async Rust because it's not the most graceful to use, but I think it's kinda genius how they've managed to make it zero-overhead. To the point that even the stack size of green "threads" is known ahead of time.

The only issue I have is that it's tough not to use it. The big HTTP libraries let you opt out, but smaller libraries don't have the resources to do everything twice. I don't know what the solution is, but it would be nice to always be able to chose. It's pretty silly to use async networking in a cli app, for example, but sometimes you have to.

duped · on July 16, 2022

> To the point that even the stack size of green "threads" is known ahead of time.

This is only possible by not having stacks for those "green threads" (they're stackless coroutines, not stackful fibers/proper green threads).

It puts a severe limitation on the usefulness and forces any kind of recursive coroutine to heap allocate on returns.

JoshTriplett · on July 16, 2022

One other obvious alternative would have been to make all futures heap-allocate so that some can be recursive.

I think making recursive futures invoke `Box` or similar seems consistent with Rust's general "only pay for what you actually use".

duped · on July 16, 2022

I disagree, because you're paying for boxing on return + dynamic dispatch when it's perfectly safe/sound to have a stackful coroutine that only allocates when the stack needs to grow and doesn't require dynamic dispatch. So you don't pay for heap allocated futures if they aren't necessary, and you pay less when they are.

So you're actually paying a higher price than if the compiler could support stackful generators for recursive futures. In fact async just gets a lot easier to write and use if generators could be stackful, at least imho. Generators don't have the syntax nice-ness of async/await, but they're also more explicit which feels more in with the rest of Rust, where syntax sugar like async/.await() is the exception and not the norm.

JoshTriplett · on July 16, 2022

Fair enough, if we could create generators and futures that switch stacks that might work.

veber-alex · on July 17, 2022

stackful coroutines break FFI

duped · on July 17, 2022

How so? You can swap stacks in posix just fine modulo some setjmp/longjmp shenanigans. But exceptions aren't sound across FFI either.

ReactiveJelly · on July 16, 2022

> The big HTTP libraries let you opt out (of async)

At least for reqwest it just wraps the async code in something like `block_on`: https://github.com/seanmonstar/reqwest/blob/5397d2cf8eaecc9f...

And as the sibling comment says, you can do that in your own code. It does still add Tokio to your binary size, and add some compile time, and probably start a bunch of worker threads you don't need, but it does work.

CryZe · on July 16, 2022

You can always just do let no_async = block_on(something_async);

kangalioo · on July 16, 2022

I'd instinctively worry about overhead there

Are there benchmarks comparing sync http libraries with block_on-wrapped async http libraries?

ibraheemdev · on July 16, 2022

> I'd instinctively worry about overhead there

Yeah, one of the nice things about blocking I/O is that you can perform it with a single syscall. With block_on(async_io), you're now dealing with registration with a reactor, polling epoll, and extra syscalls for each I/O operation. Not to mention the overhead of running the state-machine as opposed to line by line.

bongobingo1 · on July 16, 2022

Is the overhead going to matter if your content to block anyway? Surely any threading etc constructs are tiny compared to wire time, etc.

ibraheemdev · on July 16, 2022

Blocking I/O+threads can actually scale very well now, and with block_on you get the worst of both worlds, but yeah, I agree that most people are probably fine with it.

vbezhenar · on July 16, 2022

Most people fine with python. Those who come to Rust are not fine with overhead.

mcronce · on July 16, 2022

This assumes that people only use Rust for the performance. I don't think that's strictly true.

95% of what I write isn't performance-critical or even, really, performance-relevant. I still choose Rust for the vast majority of projects for ergonomic and correctness reasons.

adwn · on July 16, 2022

You're spot on. I recently chose Rust over Python for a very small program which reads a JSON (or Hjson) file, does some checking and processing, and writes the results to a different JSON file, because Rust has serde, proper static type checking, algebraic data types, and other features that made it more productive than Python (!!!) for that specific use case. Performance wasn't even a consideration, I made judicious use of clone() and run it in debug mode.

dagmx · on July 16, 2022

Serde is such an awesome library. Having to decode serialized data in the other languages (Swift, C++, Python) I write is such a bear after using serde.

Swift comes closest with codable/decodable but it's often still lacking the ergonomics of serde, especially the attribute options per field

dagmx · on July 16, 2022

Some people who come to rest may not be fine with it.

A lot of people come to rust for other reasons, like compile time safety checks etc...

But most importantly, the delta between the performance of block_on and not, versus block_on and Python are massively different. You can write inefficient rust and still have a huge win over Python.

ThePhysicist · on July 16, 2022

I recently got started with Rust and was surprised that different async runtimes were not compatible with each other. In principle Rust has a similar interface concept like Golang, so it should be possible to specify desired behavior of a component and leave implementation to the library, so that you can switch between different ones without worrying about compatibility. Pretty much a Rust noob still so maybe I'm missing something that makes this difficult/impossible though.

I've been thinking about rewriting a network library using async, but the whole async ecosystem seems a bit fragmented and immature: mio would probably be everything I need but it doesn't support channels (there's mio-extras which does but it's not compatible with the latest mio version). Tokio would probably fit the bill, though it seems to be too complex for what I actually need (just a way to poll sockets and channels to see if there's anything to read from them).

nicoburns · on July 16, 2022

> Pretty much a Rust noob still so maybe I'm missing something that makes this difficult/impossible though.

It's largely just because the various library authors have not managed to agree on interface definitions. I think it'll get sorted eventually, but unfortunately doesn't seem to be a big priority for the runtime developers. It's also partially blocked on async function being available in traits, which isn't currently possible in Rust without workarounds (which wouldn't be suitable for the standard library).

> Tokio would probably fit the bill, though it seems to be too complex for what I actually need

Tokio is probably what you want. It might be complex under the hood, but it ought to fairly straightforward to write networking code using it (I believe polling a channel is typically as simple as calling `.recv().await` in a loop within an async function).

laerus · on July 16, 2022

This is not unique to Rust. The Python async runtimes are not compatible with each other as well, there is AnyIO which is a wrapper that acts as an abstraction that makes things easier but it's still not as implicit as with languages with builtin runtimes.

There is also an ongoing effort to make Rust async runtimes pluggable.

samsquire · on July 16, 2022

The behind the details of Rust async is rather hard to follow There's waiters and polling that only execute your function to progress and there's pending and done. Any help to understand the relationship between the waiters, pollers and executor and runtime things would be appreciated.

I wrote a M:N thread scheduler in C, Java and Rust. The C version also can schedule file reading to an IO thread but I'm nowhere near finished.

https://github.com/samsquire/preemptible-thread

Another of my ideas is to rewrite synchronous code into parallel LMAX disruptors. In other words a tree of RingBuffer each line of synchronous code its own event loop. Rather than one event loop multiplexing events from different systems you pipeline every blocking call. I think it would be very fast.

Here's a write-up.

https://github.com/samsquire/ideas4#51-rewrite-synchronous-c...

redman25 · on July 16, 2022

Is it possible to build a tokio compatible library that might not be so heavy weight? Maybe this a moot point since libraries would use tokio as a dependency anyway.

tempest_ · on July 16, 2022

Smol is essentially that iirc.

https://github.com/smol-rs/smol

Though I have not used it extensively.

carllerche · on July 16, 2022

Can you elaborate on "heavy weight"? Tokio lets you opt-in to only what you need via feature flags. This lets you use a small subset of the lines of code & transitive deps.

saurik · on July 16, 2022

So, as someone who has been working heavily with coroutines and continuations for decades in a number of different languages across the gamut of programming paradigms, I don't really understand why these runtimes aren't "interoperable", and am hoping I just have a different idea of what that word means than the people who talk about them in the context of Rust.

Like, right now I maintain a large almost-entirely-asynchronous C++ codebase using their new C++20 co_await monstrosity, and while I find the abstraction ridiculously wide and a bit obtuse, I have never had trouble "interoperating" different "runtimes" and I am not even sure how one could screw it up in a way to break that... unless maybe these "executors" are some attempt to build some kind of pseudo-thread, but I guess I just feel like that's so "amateur hour" that I would hope Rust didn't do that (right?).

So, let's say you are executing inside of a coroutine (context is unspecified as it doesn't matter). When this coroutine ends it will transfer control to a continuation it was given. It now wants to block on a socket, maybe managed by Runtime A (say, Boost ASIO). That involves giving a continuation of this coroutine past the point of the transfer of control to Runtime A which will be executed by Runtime A.

Now, after Runtime A calls me--maybe on some background I/O thread--I decide I would prefer y task to be executing in Runtime B. I do this sometimes because I might have a bit of computation to do but I don't want to block an I/O thread so I would prefer to be executing inside of a thread pool designed for slow background execution.

In this case, I simply await Runtime B (which in this case happens to be my lightweight queue scheduler). I don't use any special syntax for this because all of these runtimes fully interoperate: I used await to wait for the socket operation and now I use await to wait until I can be scheduled. The way these control transfers work is also identical: I pass a continuation of myself after the point of the await to the scheduler which will call it when I can be scheduled.

Now remember, at the beginning of this I was noting that something unspecified had called me. That is ostensibly a Runtime C here (maybe I was waiting for a callback from libwebrtc--which maintains its own runloop--because I asked it to update some ICE parameter, which it does asynchronously). It doesn't matter what it was, because now that "already happened": that event occurred and the continuation I provided was already executed and has long since completed and returned as I went on immediately to pass a continuation to someone else rather than blocking.

Is this somehow not how Rust works? Is await some kind of magic "sticky" mechanism that requires the rest of this execution happen in the context of the "same" runtime which is executing the current task? I have seen people try to do that--I am looking at you, Facebook Folly--but, in my experience, attempts to do that are painfully slow as they require extra state and cause the moral equivalent of a heavyweight context switch for every call as you drag in a scheduler in places where you didn't need a scheduler.

But, even when people do that, I have still never had an issue making them interoperate with other runtimes, so that can't be the issue at its core. I guess I should stare at the key place where the wording in this article just feels weird?... to me, I/O and computation are fairly disjoint, and so I can't imagine why you would ever want to have your I/O scheduler do "double-duty" to also handle "task queues". When I/O completes it completes: that doesn't involve a "queue". If you want to be part of a queue, you can await a queue slot. But it sounds like tokio is doing both? Why?

dagmx · on July 16, 2022

You can interchange async implementations in rust if you like, much like you can in C++ or other languages.

What becomes hard though is grappling with what that means:

- the stdlib doesn't know about async, so there are a variety of async stdlibs that may or may not be tightly coupled to an implementation.

- different runtimes may choose different threading models. Some may be single threaded-ish, some may be across threads. You could treat it all like it's across threads, but this does mean that there's another detail you need to consider when you're setting up your data.

- Io scheduling mixed with task scheduling is a choice of how an async stdlib is configured. There's advantages to having them coupled in that the runtime can sort checks on returns on the Io call as it cycles through the tasks, or put them all on a single thread queue etc... There's lots of patterns here that may have their own individual tradeoffs

nextaccountic · on July 16, 2022

They are interoperable in the most basic mechanism of futures: every executor can spawn tasks composed of any futures (just like co_await in C++ is interoperable)

But they aren't interoperable in practice because they offer different APIs

In some cases this is fixable (for example, the rust ecosystem needs some to standardize some async abstractions because currently every executor defines their own trait for async reading for example), in other cases it represents a genuine limitation of a given executor (for example, some embedded executors can only spawn a single task, and you achieve concurrency by using future combinators)

saurik · on July 16, 2022

OK, so the version of "interoperable" you seem to be using sounds like like "swappable", which isn't really a property I have ever cared much about. Like, if I have code that is using ASIO's task abstraction and other code using cppcoro's and other code using my own scheduler and still other code wired up over some callback setup, I would have just used "interoperable" to mean I can await whatever I want whenever I want without complex glue code, as--at the end of the day--I am merely passing a continuation for my function to someone who will call it later. I mean, of course the APIs aren't the same: in one case I am awaiting sockets and in other case I am awaiting queue slots and in another case I am awaiting random asynchronous events but I am able to do all of it from a single asynchronous function as they are all "interoperable". It just sounds from these articles that Rust can't even do that.

nextaccountic · on July 18, 2022

No, Rust is fully interoperable in the sense you care about. But in the Rust ecosystem there's a desire for writing libraries that can run in any executor, to avoid picking a winner.

Right now what most libraries do is to write code paths for working with tokio, with async-std, etc. This is not sustainable. If we had generic APIs we could just code against that.

Anyway, the biggest source of contention is that the networking API of Tokio and async-std are different. But there's no fundamental reason for this difference and there's hope that it will eventually be possible to bring a common API to the stdlib

saurik · on July 18, 2022

FWIW, the word "interoperate" fundamentally -- just taking it as inter- -operate -- means separate things being able to work together. If you can replace one thing for another thing they aren't "interoperable", they are "interchangeable".

Regardless, everyone else in this thread -- including people who seem to know what they are talking about -- seem to be defending the other normal usage of the word by talking about supposed issues with running multiple executors at once and bouncing between them.

Are you sure I can have a single async function which can in one statement await tokio and in the very next statement of that very same function await async-std without having to jump through some gnarly hoops?

https://news.ycombinator.com/item?id=24675155

^ Here is someone -- though from like two years ago -- asking this very specific narrow question and getting back a number of responses that claim this isn't possible (and so these systems are not only not interchangeable but also not interoperable).

(That said, there is one person on that thread who disagrees, but other people seem to disagree with them and the only link to any documentation provided -- but which was notably from someone else and so might simply have been the wrong reference -- is about a bunch of third-party glue.)

jalk · on July 16, 2022

I think the issue in question is more mundane - if someone publishes an async database client library, its currently hardwired to a specific async runtime, so you cannot not easily use it if you are not already using that runtime. The common async abstraction being worked on, sets out to solve that.

saurik · on July 16, 2022

I mean, I would hope "easily" would happen because I can always just use two async runtimes... if they were "interoperable". I can quite easily have a number of separate I/O abstractions and schedulers all happening at the same time in C++, for example, and I never think much about it: I just co_await and it, well, waits.

lvass · on July 16, 2022

>hardwired to a specific async runtime

With version constraints, right? IIRC you can end up with multiple versions of multiple async runtimes in a project. I think it'd be better to only have a single one hardwired to the compiler like python's asyncio, even if it likewise sucked.

craftkiller · on July 16, 2022

Rust libraries can have implementations for each async runtime and then you can pick between them using features. For example, when using sqlx with tokio I would have this in my Cargo.toml:

    [dependencies]
    sqlx = { version = "0.5", features = [ "runtime-tokio-rustls", "sqlite", "migrate" ] }

But I also could use async-std with:

    [dependencies]
    sqlx = { version = "0.5", features = [ "runtime-async-std-rustls", "sqlite", "migrate" ] }

So you should be able to get all your deps on a single runtime.

nicoburns · on July 16, 2022

In practice both major runtimes have long-term stability guarantees (e.g. tokio has committed to maintained 1.0 for at least 5 years), so if you use libraries compatible with Tokio 1.0 then you're unlikely to have issues with this for some time.

howinteresting · on July 17, 2022

Tokio 1 is the only async runtime used in production at scale, there's very little reason to use anything else. So you can seek out libraries that use tokio 1 and ignore anything else.

WaxProlix · on July 16, 2022

I haven't used much Python recently, but iirc you can just import your own runtimes, too. Twisted, gevent, that sort of thing. Having some sort of sane defaults bundled gives you a really nice baseline for interop, but doesn't preclude you from picking things that fit your use case better.

Definitely feels like one place where Rust kinda dropped the ball, at least from a user perspective in $CURRENTYEAR.

lvass · on July 16, 2022

True, what I meant is that asyncio is part of the interpreter so you are very unlikely to have trouble with incompatible versions of asyncio. Twisted and gevent don't use async/await, but there's Trio which does and is saner than asyncio, but thankfully library authors aren't forcing it's usage. It's also possible to write libraries that use async but you bring your own runtime (trio or asyncio) with AnyIO.

aliceryhl · on July 17, 2022

It's not possible to have multiple 1.x.y versions of the same crate in your project, so you would need a really old library that depends on Tokio 0.2.x for that to happen. This isn't something that normally comes up in practice.

haradion · on July 16, 2022

You might think of Rust's async paradigm as "half a continuation, turned upside down". With traditional coroutines, after an async operation completes, the language's runtime calls back into your code, and you actively call the next thing, "pushing" control flow down the pipe. Most languages with continuations manage this by "pausing" your function and keeping its stack frame around, which, in the general case, means your function's stack frame has to be heap-allocated, which is basically the language itself giving you a "pseudo-thread". You eventually get control back with the same stack frame, and as far as the language is concerned, how you get back there is none of your concern; that's its job.

In Rust's polling-based model, there's no "magic" saving of stack frames. You get some space to store state, but the runtime has to manage that memory itself. You can use the language to express "this is the next thing to call", but when you spawn an async I/O task and yield to it, you've already returned from your own function to the runtime, and it's the runtime's job to call your function again with the state it had stashed away. You then jump over the steps in your function that have already been handled and call into the next thing. It gets a bit more involved due to various bits of syntactic sugar, but that's the basic model. It's operating at a lower level of abstraction than many languages' coroutines or call/cc, which gives you the flexibility to customize the behavior to meet specific needs.

A runtime for generic desktop/server apps may maintain a thread pool and call back into your code on one of those threads. In WebAssembly, execution is single-threaded, but JavaScript promises may call into your runtime, and you have to dispatch that to the right Rust future. On embedded platforms, the data structures that the desktop/server runtime uses may simply not be suitable (e.g. because you have no general-purpose heap allocator), so you need to use a different approach with more constraints.

Interoperability between these runtime is possible. The key is that you need a task that's running on one runtime to be able to spawn a task on the other, with part of that task's job being to notify the first runtime that it's time to poll the "parent" task again. The mechanics vary depending on how each runtime handles task spawning.

As I understand it (from having skimmed some articles a while back), C++'s co_await isn't really all that different. Since we don't have the executors proposal as part of the standard yet, it's still a "bring-your-own runtime" sort of approach, with some kind of glue required at the boundaries between runtimes. Depending on which "flavor" of C++ coroutines you're using (e.g. push-based vs. pull-based), that interop might be easier than Rust's at the cost of other tradeoffs (e.g. more heap allocations).

saurik · on July 17, 2022

> With traditional coroutines, after an async operation completes, the language's runtime calls back into your code, and you actively call the next thing, "pushing" control flow down the pipe.

I mean, with "traditional" coroutines, it isn't the "language's runtime" which calls back into my code: it is whatever code completed the event. I get that the important part of this sentence is the interest in "push" vs. "poll", but this concept of the existence of a "language's runtime" is a bit strange to me, as my mental model of a coroutine doesn't involve a "runtime" and certainly doesn't involve an "executor".

Instead, in a "traditional" coroutine, a continuation-passing transform is implemented in the compiler that changes -- in the best case of having this wrapped up in a Monad (which Rust could really use support for right about now) -- "do A and then B" into "do A while telling A to call the continuation of B when it is done, and otherwise immediately return". B isn't a "runtime" and isn't the "language"; you could argue B is an "executor" but it is unique to every call.

So if you want a no-op A it would be "call the continuation it is passed, immediately". This would result in behavior identical to the original synchronous function: we call A, which does whatever it wanted to do (in this case nothing) and then it chains through to B". As the call to the continuation is in tail position for this case, the resulting behavior should work out to being nearly identical (like the CPU won't be able to branch predict this as efficiently, but it will have similar overhead).

In a more complex scenario, the function A is going to do something mysterious and later get a callback from something -- which you might call a "runtime" but which almost certainly isn't implemented by the "language" -- on some random background thread running an I/O loop, or maybe due to a signal / handler from the operating system, or whatever random mechanism it has in place to run code later (which again: isn't part of the "language") and it will run the continuation it was passed.

This does, likely, result in some heap allocation somewhere in order to type erase the continuation in the general case. However, this seems to only be due to how the asynchronous code has been given a harder challenge of dealing with arbitrarily deep stacks with minimal overhead, while people seem totally OK with synchronous code causing random stack overflows :/. If you are willing to relax that assumption a bit then you can elide that allocation almost every time.

Like: just writing normal synchronous code also involves heap allocations as you have to allocate the stack space for the next frame every call. You can elide that in many cases by pre-allocating a bunch of memory for the stack, but a sufficiently-deep call stack will overflow the memory you allocated and break in some potentially-catastrophic manner. It is a fiction that you can write essentially anything of consequence without either heap allocations or some fuzzy understanding by the developer of how hard they can push it until it breaks.

haradion · on July 21, 2022

> I mean, with "traditional" coroutines, it isn't the "language's runtime" which calls back into my code: it is whatever code completed the event. I get that the important part of this sentence is the interest in "push" vs. "poll", but this concept of the existence of a "language's runtime" is a bit strange to me, as my mental model of a coroutine doesn't involve a "runtime" and certainly doesn't involve an "executor".

Syntactically, many languages represent the operation of calling into the next continuation as a regular return (for green threads) or a regular function call (call/cc), but there's always some degree of runtime magic involved in the generated code. For instance, rather than just incrementing or decrementing the stack pointer, you've got to potentially set it to point into a totally different runtime-allocated stack. In principle, that can probably be implemented as just special-case code generation rather than an actual call into the runtime's routines, but that still leaves the need to clean up the current task's stack after it returns (or does a tail call into another stack), which will be either an explicit runtime call or rely on the runtime's garbage collector.

The real magic, though, isn't so much in the user-written continuations as it is on "async blocking" calls for things like I/O.

> In a more complex scenario, the function A is going to do something mysterious and later get a callback from something -- which you might call a "runtime" but which almost certainly isn't implemented by the "language" -- on some random background thread running an I/O loop, or maybe due to a signal / handler from the operating system, or whatever random mechanism it has in place to run code later (which again: isn't part of the "language") and it will run the continuation it was passed.

This is precisely what Rust's async runtime libraries are. They provide the event loop/callback mechanisms, which are necessary for truly async code. (Otherwise, what is there to wait for?) You can totally write and call an async function in Rust that doesn't use a runtime, but there's no way for it to "asynchronously block"; you'd just poll it and get back, "Yep, I'm done; here's my result."

> This does, likely, result in some heap allocation somewhere in order to type erase the continuation in the general case. However, this seems to only be due to how the asynchronous code has been given a harder challenge of dealing with arbitrarily deep stacks with minimal overhead, while people seem totally OK with synchronous code causing random stack overflows :/. If you are willing to relax that assumption a bit then you can elide that allocation almost every time.

It's not just the depth of the stack; it's that, once you yield, another task may take over the thread's flow control entirely. Let's say that function A spawns a coroutine B without waiting for it to finish. Now let's imagine that B allocates space on the same thread stack that A was using (on top of A's stack frame) and then yields. At the yield point, something (e.g. the runtime) has to say, "OK, B is stuck, so what do we run next on the thread?" Eventually, it's A's turn to finish running, and it returns. If it does this naïvely, it'll rewind the stack pointer, dropping the stack frames for both A and B. But B isn't done running yet; it's just blocked, so now we've got a problem because its stack just got clobbered. To avoid this, languages that use "stackful" coroutines have to allocate coroutines' stacks on the heap in many cases because the traditional single-stack model isn't just running out of space; it totally breaks down.

Rust uses stackless coroutines, which impose some restrictions on how the coroutine is structured (mostly involving unbounded recursion) so that the state the task has to store between yield points has a fixed size.

haradion · on July 21, 2022

When you follow the restrictions on stackless coroutines, the coroutine can, as you mentioned, elide stack allocations for the "child" coroutines. If you want to make a call that can't have its stack allocation elided, you explicitly tell the runtime to spawn it as a top-level task.

aaaaaaaaaaab · on July 16, 2022

So, in other languages, async introduces two function "colors": sync and async. But in Rust you also have unique "colors" for each async runtime?

I must be misunderstanding something, because this sounds like pretty hare-brained...

ibraheemdev · on July 16, 2022

This is a consequence of runtimes relying on global variables that their core future types are dependent on. Creating abstractions to solve this problem is one of the main goals of the the async working group [0].

[0]: https://github.com/rust-lang/wg-async

stormbrew · on July 16, 2022

It would be really interesting to see an executor that didn't rely on global state, but instead had you manage an executor object for these things, and see if it's actually really all that bad. Everyone just kind of jumped straight on doing executors with global state and that design space is just completely unexplored.

It would help a lot with being able to discover and define what standard traits might be needed to make leaf async libraries more portable.

pornel · on July 16, 2022

The original "colors" article talked about two things:

1. Inability for sync functions to use async functions, which can have big consequences for API design and ability to refactor applications.

2. Author's opinion on how async syntax should (not) look like.

Rust can mitigate the first problem. It can have `block_on(async)` and `spawn_sync(fn)` which allows bridging between "colors" of functions, so functions aren't forever stuck with their "color". This is something that JS can't do without massive hacks, and is the objectively important aspect of "colors".

The other thing was about difference in calling syntax. That is just a design choice and a subjective preference. Rust prefers locally explicit syntax. It cares about low-level details and intentionally avoids having implicit magic, especially for major behaviors affecting control flow, safety of stack pointers, and risk of deadlocks.

Regarding runtimes: in practice it's easy to just stick to tokio. It's 8x more popular than the second contender, and there aren't any important libraries that don't work with tokio. Rust can have multiple runtimes in the same program. You can have futures running on different runtimes await each other, it's just wasteful (you get multiple event loops, thread pools, etc.), which means it's best to pick one and stick with it.

aaaaaaaaaaab · on July 17, 2022

>in practice it's easy to just stick to tokio. It's 8x more popular than the second contender, and there aren't any important libraries that don't work with tokio.

For now.

Python2.7 used to be 8x more popular than python3.

pornel · on July 17, 2022

That’s an empty speculation, and the analogy is flawed. Other runtimes aren’t tokio’s imposed replacement.

Runtimes create self reinforcing network effect, so it’s more likely that all other contenders will die out completely.

yazaddaruvala · on July 16, 2022

This is only temporary.

It’s because the standard library hasn’t exposed interfaces for everything in async land just yet.

It’s not a trivial problem, but it’s mostly a matter of the std-lib’s developers being overly defensive of std. Which is a hindrance for many things in the medium term, but in the long term likely a good choice.

Although I wish they’d find a better balance between the chicken and egg problem for “we need production usage of the interfaces to validate them as standard” vs “we need standard interfaces so it’s ergonomic to use them in production”.

lvass · on July 16, 2022

Async/await should be locked behind a compiler flag and not used by libraries that aren't experimental. Instead it already contaminated the entire ecosystem with to-be-deprecated dependencies that are hard to remove.

stormbrew · on July 16, 2022

Rust wouldn't be anywhere near where it is now if it had gated and blocked async like this. Maybe it would be 'better' in some abstract sense, but it'd be a better language almost no one would be using, like basically every other systems language that's attempted to do what rust is doing other than C++.

And if it were gated like this, we wouldn't even know about these kinds of problems, because no one would be using it and we wouldn't see them in practice.

If anything, rust does too much gating of features for too long. So many things have been sitting behind "needs stabilization" for years, living in a catch-22 of "we don't know if this is a good idea because no one uses it" vs. "no one uses it because using nightly is scary". I'm quite glad this one managed to escape that trap.

aaaaaaaaaaab · on July 16, 2022

Yep. I can’t help but see this as a python2.7 vs python3 story in the long run…

ben-schaaf · on July 16, 2022

You get the same thing in every language where you can choose a runtime. Whether the colors are enforced by the compiler is another matter, but since you can only have one runtime per thread and each runtime has a different API you inherently get runtime colors.

saurik · on July 17, 2022

This isn't actually a problem most of the time because we can spawn new threads, and so in C++ with co_await I have like three or four different "runtimes" all working at once with their own thread pools and their own I/O loops and I have no issue mixing and matching their behaviors using the obvious syntax. The issue with Rust seems to come down to something much deeper involving either an overzealous attempt to force people to think in terms of "executors" and "runtimes" for entire stacks of coroutines instead of individual tasks or (at best) as a result of decisions made with respect to how the memory management of the mechanism would operate in the context of Rust.

aaaaaaaaaaab · on July 16, 2022

>you can only have one runtime per thread

Why?

Why can’t a single thread run multiple runtimes’ event loops? It’s pretty limited what an event loop can do: it can wait on a set of FDs, or sleep for some time (or until woken up). So if the various runtimes’ event loops could implement a common interface, then I don’t see why they couldn’t share a single thread…

ben-schaaf · on July 16, 2022

> It’s pretty limited what an event loop can do: it can wait on a set of FDs, or sleep for some time (or until woken up).

But that's what the runtime is. There's multiple strategies for implementing an event loop (as well as multiple async platform APIs to use), which all directly affect the lower-level abstractions. By replacing the event loop and lower-level abstractions with one common implementation all you've done is add another competing runtime. You can't have more than one event loop and thus you can't have more than one runtime.

mcronce · on July 16, 2022

If you've got a thread running one event loop, how could you signal to it to temporarily break out of that loop and run a different event loop for a while?

aaaaaaaaaaab · on July 16, 2022

The while(1) part of the event loop would be outside, provided by the stdlib. Each runtime would provide a set of FDs that they want to monitor, and a function to execute a single iteration of their event loop. The while(1) loop would then be calling each runtime in a round-robin fashion, then go sleep while waiting on the union of all the runtimes’ FD sets. It would be sort of an “event loop of event loops”.

stormbrew · on July 16, 2022

What you've just described is ... an executor runtime. Even if you boil it down to doing "only" that, you still can only have one of that per thread, and there isn't just one way to implement that.

ReflectedImage · on July 16, 2022

More colors the better! Right?

swayvil · on July 16, 2022

Ever notice that "rust" is a contraction of "red dust"?

(Also, words are usually first coined as onomatopoeias. For example "dust" sounds like a swisshing pile of dust. But "red". That probably has more stuff going on there. Added layers)