The Monthly Oxide #3

A primer on Async Rust

May 01, 2021

Hello and welcome to another edition of The Monthly Oxide, a newsletter where you get to learn a bit more about Rust and get links to some articles and projects I found interesting this month. We’ll be covering some of the history of async in Rust and touch on some of the lower-level aspects of how it all works today. We’ll save the in-depth explanation for another time, because it can get really into the weeds. This is a primer with some articles, RFCs, and PRs that you can look into for even more information. Let’s get started.

A Very Brief (Handwavy) History of Async Rust

A long time ago, in the years before 1.0, Rust was garbage collected. No really, I’m not kidding. It seems a very foreign concept these days, given that Rust is known for the borrow checker and not needing to do any garbage collection. However, it was indeed garbage collected, and it used to have Green Threads, also known as M:N threading. This is the model a lot of programming languages use for asynchronous code and we’ll dive more into what it means later, but for now consider it a model of async that needs a runtime baked into the language, much like a Garbage Collector (GC) does. In practice Rust had the ability to do asynchronous code before 1.0, but removed it in the run up to 1.0 in this PR on November 14th, 2014, only 6 months before the release of 1.0. During most of this time period, features that were shaky, unneeded, or would require more time to bake were stripped out or put behind feature flags. Some still aren’t stable (looking at you box_syntax), but the full removal of green threading suggests that it was fundamentally incompatible with how Rust operates. The tradeoff of getting async and needing that kind of runtime was just not something anyone would want to invest in. The feature was removed and would need some time to let new ideas come to the forefront that would let Rust have its cake and eat it too: performant async code without a GC.

2018 is when async really started to get pushed for, with the Futures crate 0.1 (first released back in 2016) serving as an example of what it could look like in Rust (with none of the compiler support). RFC 2592 covers the history a bit here, and also how futures in Rust would work with the Pin type from RFC 2349, which was subsequently improved. Some key insights were had that made working with Pin usable, as there was some concern it would have to ship with an unsafe API. We’ll cover more about Pin later, but it’s important to know this is the key that lets Rust’s async code work.

Many revisions, burnt out maintainers, and arguments over await syntax later, things were finally stabilized, and async/await in its full form was released on stable in 1.39 on November 7th, 2019, almost 5 years after the original green threading was removed.

Polling for Async Knowledge

I’ve been talking about async code, but what does that even mean? Well, sometimes you might do something like “Read a file” and, while it seems fast to us, for the computer it takes a verrrrry long time to go get the file and read its contents into memory. In the meantime your program is kind of just sitting there waiting until it can continue. Wouldn’t it be nice if you could have the CPU do more computing while you were waiting on that file to come back? That’s what async programming is used for, especially in things like networking, as in the case of the C10k problem. This is known as concurrency, which is different from parallelism. Concurrency is when parts of a problem can be executed out of order, or interleaved, without affecting the outcome of the computation, for example serving multiple connections to a webserver on a single thread. Parallelism is when you have multiple problems being solved at literally the same time, for example on separate CPU cores. You can have concurrency with or without parallelism (multiple threads serving multiple connections on a webserver is both), but the concepts are distinct from each other and often conflated. What we care about is that async programming is primarily about concurrency. To expand on the server connections example, the webserver can do other computations for a different request while it awaits a response from the database for another request. It’s doing these requests out of order, but generally speaking they shouldn’t affect each other.

With Rust we use Futures to represent the computation, such as the web request. A Future can be polled to see if it’s complete or not. If it is, it will return the value we expect it to, so maybe a number or just the unit type (). It can be pretty much anything. If it isn’t, it reports that it’s still pending. The Future is transformed by the compiler internally into a state machine with a “perfectly sized stack” that represents the entire computation and stores all the state it needs. This was Rust’s big breakthrough in this space: the compiler has all the information it needs to allocate only as large a stack as a computation needs, instead of using segmented stacks. It’s worth reading withoutboats’ article that I linked above.
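To make the polling idea a bit more concrete, here is a minimal sketch of a Future written and polled by hand using only the standard library. Countdown, NoopWaker, and polls_until_ready are names I made up for illustration, and a real future would wait on something useful (a socket, a timer) rather than a counter.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that becomes ready after `remaining` pending polls.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("liftoff")
        } else {
            self.remaining -= 1;
            // Ask to be polled again. A real future would hand the waker to
            // whatever it is waiting on instead of waking itself right away.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing, just enough to build a `Context` by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a `Countdown` in a loop and reports how many polls it took.
fn polls_until_ready(start: u32) -> (u32, &'static str) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Countdown { remaining: start };
    let mut polls = 0;
    loop {
        polls += 1;
        // `Countdown` holds no references into itself, so `Pin::new` is enough.
        if let Poll::Ready(value) = Pin::new(&mut future).poll(&mut cx) {
            return (polls, value);
        }
    }
}

fn main() {
    let (polls, value) = polls_until_ready(3);
    println!("ready after {polls} polls: {value}");
}
```

Polling in a tight loop like this is wasteful, which is exactly the problem an executor and its wakers are meant to solve.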

The Future, however, is kind of useless on its own. You need some kind of executor to drive the computation. Rust gives you all the tools in the standard library to write your own, but most people will use one like Tokio to drive it instead. It’s not an easy task to write an efficient executor, but it is possible. This executor will poll and drive the computation, and run other tasks while certain ones aren’t ready to be continued. I’m not gonna dig too much into things like Wakers, but you can think of them as the way a task tells the executor that it should be woken up and polled again to drive it to completion.
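As a rough illustration of what "bring an executor" means, here is a toy one built from the standard library alone. It drives a single future on the current thread and sleeps between polls until the Waker fires. block_on, ThreadWaker, YieldOnce, and add_slowly are illustrative names of mine, and this is nowhere near what Tokio does for scheduling or I/O.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread the executor is running on.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy single-future executor: poll, sleep until woken, poll again.
fn block_on<F: Future>(future: F) -> F::Output {
    // Pin the future on the heap so it can be polled.
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Nothing to do until the waker is called. A spurious wake-up
            // only costs one extra poll.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical leaf future: pending on the first poll, ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            // Tell the executor this task wants to be polled again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

async fn add_slowly(a: u32, b: u32) -> u32 {
    YieldOnce(false).await;
    a + b
}

fn main() {
    println!("{}", block_on(add_slowly(2, 3)));
}
```

A production executor juggles many tasks and only re-polls the ones whose wakers fired, but the poll, wait, poll again loop is the same basic shape.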

Okay, so where does Pin fit into all this? Remember how I said a Future is a state machine that contains all the state it might need to perform the computation? It sometimes might use a reference, but in Rust all types are movable. The compiler can move that data around in memory as much as it needs to. So what happens if you have a reference to something that might move around across await points? That’s right, you’re making a reference to garbage data, not your actual type. So we need some way to keep the data in one spot so the reference works across await points. Pin guarantees that, once pinned, the value won’t be moved in memory again (unless its type is Unpin, meaning it’s safe to move anyway), so references into it stay valid. fasterthanlime has a great article on pinning that plumbs the depths a bit here and is worth your time to read.
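Here is a small sketch of the situation described above: an async fn that keeps a reference to one of its own locals alive across an await point, which is why it has to be pinned before it can be polled. The helper names (yield_once, sum_across_await, poll_pinned, NoopWaker) are mine, and Box::pin is just one way to pin a value.

```rust
use std::future::{poll_fn, Future};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical helper: an await point that suspends exactly once.
async fn yield_once() {
    let mut yielded = false;
    poll_fn(|cx| {
        if yielded {
            Poll::Ready(())
        } else {
            yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
    .await
}

/// `first` borrows `data`, and the borrow is still in use after the `.await`,
/// so the generated state machine has to store both the array and a
/// reference pointing into it.
async fn sum_across_await() -> u32 {
    let data = [1u32, 2, 3];
    let first = &data[0];
    yield_once().await;
    *first + data[1] + data[2]
}

/// A waker that does nothing, just enough to poll by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn poll_pinned() -> u32 {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // `Box::pin` puts the future on the heap and promises it stays at that
    // address. `poll` only accepts a pinned future, so skipping this step
    // is a compile error rather than a dangling reference at runtime.
    let mut future = Box::pin(sum_across_await());
    loop {
        if let Poll::Ready(total) = future.as_mut().poll(&mut cx) {
            return total;
        }
    }
}

fn main() {
    println!("{}", poll_pinned());
}
```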

It’s worth noting though the major win with async/await is that you do not have to write any of these state machines at all. The compiler does it for you! You get to write async code that looks like sync code!

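use std::fs::File;
use std::io::{self, Read};
use std::path::Path;
use tokio::io::AsyncReadExt;

// A synchronous function: blocks the thread until the whole file is read
fn read_file(path: &Path, buffer: &mut Vec<u8>) -> io::Result<()> {
  let mut file = File::open(path)?;
  file.read_to_end(buffer)?;
  Ok(())
}

// The async version, written here against Tokio's file API as one
// possible choice; other runtimes offer similar calls
async fn read_file_async(path: &Path, buffer: &mut Vec<u8>) -> io::Result<()> {
  let mut file = tokio::fs::File::open(path).await?;
  file.read_to_end(buffer).await?;
  Ok(())
}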

It’s not that much different in how it looks, but the async version is turned into a state machine under the hood and you don’t have to write it out by hand. It’s probably the biggest win, because you theoretically *could* write concurrent code without ever using async/await, but it’s a tedious chore. We have a sufficiently advanced compiler that can do a lot of that for us; we just need to bring an executor.
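To give a feel for the chore being avoided, here is a rough sketch of a tiny async fn next to a hand-written state machine that does the same job. This is not the compiler’s actual output, only an approximation of the idea, and every name in it (YieldOnce, double_after_yield, DoubleAfterYield, run) is made up for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical leaf future: pending on the first poll, ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// What we get to write with async/await.
async fn double_after_yield(x: u32) -> u32 {
    YieldOnce(false).await;
    x * 2
}

/// Roughly the same thing by hand: one variant per place the
/// computation can be paused, each carrying the state it needs.
enum DoubleAfterYield {
    Start { x: u32 },
    Waiting { x: u32, inner: YieldOnce },
    Done,
}

impl Future for DoubleAfterYield {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            match &mut *self {
                DoubleAfterYield::Start { x } => {
                    let x = *x;
                    // Move to the first await point and poll it right away.
                    *self = DoubleAfterYield::Waiting { x, inner: YieldOnce(false) };
                }
                DoubleAfterYield::Waiting { x, inner } => {
                    if Pin::new(inner).poll(cx).is_pending() {
                        return Poll::Pending;
                    }
                    let result = *x * 2;
                    *self = DoubleAfterYield::Done;
                    return Poll::Ready(result);
                }
                DoubleAfterYield::Done => panic!("polled after completion"),
            }
        }
    }
}

/// A waker that does nothing, just enough to poll by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny driver: pin the future and poll until it is ready.
fn run<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    println!("{}", run(double_after_yield(21)));
    println!("{}", run(DoubleAfterYield::Start { x: 21 }));
}
```

Three lines of async fn versus a whole enum and a match, and that is with a single await point and no references held across it.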

In summary:

Rust lays out a Future state machine in memory it can come back to and poll as the computation continues
Future computations are driven by an executor
Wakers tell the executor when to poll the Future again
Pin lets us use references inside our async code which is a huge part of how one writes Rust code
async/await’s biggest win for many people is being able to write complicated async code that looks synchronous

This Month’s Interesting Articles

Move Constructors in Rust: Is it Possible? - The FFI story between Rust and C++ could be a lot better, and this article covers that possibility. It’s a fascinating article given the difference in the memory models of the two languages. It was an absolute joy to read and I learned a thing or two. Maybe you will as well!

Rediscovering Hamming code - Error-correcting codes are just such a cool concept to me. How do you make sure you don’t have gibberish because some cosmic ray did a few bit flips? It’s a neat article with some good resources to check out, and the implementation details are in Rust!

This Month’s Interesting Projects

zellij - A terminal multiplexer written in Rust! It’s not at a spot where I can use it as a daily driver over tmux, but I’ll be keeping a close eye on it.

neovide - As a Neovim user myself (running those 0.5 nightlies so I can get that sweet, sweet Lua scripting instead of Vimscript), I was blown away by this Neovim client written in Rust. It even has WSL support, which for me is great given that’s all I code on these days.