How to think about `async`/`await` in Rust

(This is a section of the lilos intro guide that people seemed to like, so to increase its visibility, I’m lifting it up into its own post and expanding it a bit. I hope this is a useful companion piece to the post on async debugging I posted this morning.)

Some documentation of Rust async and await has presented it as a seamless alternative to threads. Just sprinkle these keywords through your code and get concurrency that scales better! I think this is very misleading. An async fn is a different thing from a normal Rust fn, and you need to think about different things to write correct code in each case.

This post presents a different way of looking at async that I think is more useful, and less likely to lead to cancellation-related bugs.

async fn is an inversion of control

Here is how I think about fn vs async fn: calling a normal fn runs its body to completion before you get control back, while calling an async fn runs none of its body – it hands you back a state machine (a Future) that you must then step through (poll) yourself.

This distinction is subtle but very important: an async fn represents an inversion of control compared to a normal fn.

You’ve probably run into inversion of control as a pattern before – it’s often used in things that get referred to as “frameworks.” Have you written a request handler that gets initialized and invoked by a webserver when appropriate to handle events? Inversion of control. Done almost anything in React? Same deal. The important part for our purposes is that, where normal code gets control of the computer from its caller, here the caller gets control of the code instead.

To illustrate the difference, let’s talk about state machines.

Hand-rolling an explicit state machine

If you wrote an explicit state machine by hand, this distinction would be clear in the code. For instance, here’s a simple one:

#[derive(Default)]
enum State {
    #[default]
    Begin,
    PinHigh,
    PinLow,
    Done,
}

impl State {
    /// Returns `true` once it has completed, `false` otherwise.
    fn step(&mut self) -> bool {
        match self {
            Self::Begin => {
                set_pin_high();
                *self = Self::PinHigh;
                false
            }
            Self::PinHigh => {
                set_pin_low();
                *self = Self::PinLow;
                false
            }
            Self::PinLow => {
                tristate_pin();
                *self = Self::Done;
                false
            }
            // Our terminal state:
            Self::Done => true,
        }
    }
}

State machines like this are almost universal in embedded systems, whether they’re phrased explicitly or left implicit. Drivers that have a combination of API entry points and interrupt service routines, for instance, form this kind of state machine. This toy version is written to be small enough to pick apart.

Each time the code that owns your State calls step, your code gets the opportunity to do stuff. At the end of that stuff, it returns, and the calling code regains control. It can then keep calling step until it gets true, indicating completion; or it could do something else and never call step again; or it could drop your state. (Note that it can also choose to keep calling step even after getting the true result! It’s very much in control here.)

How long will the high and low periods on the pin last? Well, how often will the caller call step? Sometimes this is defined by a contract (e.g. “this state machine advances every 100 ms”), but in this code example, we haven’t done anything to control timing. The caller could call step in a loop and make the high/low periods as short as possible, or it could sleep for months in between calls…or never call step again.
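To make the caller’s control concrete, here’s a sketch of a caller imposing that “every 100 ms” contract itself. The pin functions are replaced with log entries (and the delay is shortened) so the example is self-contained and runnable; on real hardware you’d call the actual pin functions and a real delay.

```rust
use std::thread;
use std::time::{Duration, Instant};

// The pin-toggling machine from the text, with the pin calls
// replaced by log entries so this sketch is self-contained.
#[derive(Default)]
enum State {
    #[default]
    Begin,
    PinHigh,
    PinLow,
    Done,
}

impl State {
    /// Returns `true` once the machine has completed.
    fn step(&mut self, log: &mut Vec<&'static str>) -> bool {
        match self {
            Self::Begin => {
                log.push("set pin high"); // stand-in for set_pin_high()
                *self = Self::PinHigh;
                false
            }
            Self::PinHigh => {
                log.push("set pin low"); // stand-in for set_pin_low()
                *self = Self::PinLow;
                false
            }
            Self::PinLow => {
                log.push("tristate pin"); // stand-in for tristate_pin()
                *self = Self::Done;
                false
            }
            Self::Done => true,
        }
    }
}

fn main() {
    let mut state = State::default();
    let mut log = Vec::new();
    let start = Instant::now();
    // The caller, not the machine, sets the pace. Here we impose an
    // "advance every 100 ms" contract (shortened to 10 ms for the demo).
    while !state.step(&mut log) {
        thread::sleep(Duration::from_millis(10));
    }
    // Three sleeps happened, so at least 30 ms elapsed...
    assert!(start.elapsed() >= Duration::from_millis(30));
    // ...and the machine went through its phases in order.
    assert_eq!(log, ["set pin high", "set pin low", "tristate pin"]);
}
```

Nothing in State forces this pacing; swap the sleep for a keypress wait, or delete it entirely, and the machine neither knows nor cares.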

What will the final state of the pin we’re controlling be? Currently, we can’t say. The caller could leave us paused forever without calling step, or could drop us before we finish. So the final state of the pin could be high, low, or tristate, depending on what the caller chooses. We could make this better-defined by adding a Drop impl, so that if the caller were to drop the State before it finishes, the pin would do something predictable:

impl Drop for State {
    fn drop(&mut self) {
        if !matches!(self, Self::Done) {
            tristate_pin();
            *self = Self::Done;
        }
    }
}

But if your caller decides to hang on to State and never call step, there’s not really anything State itself can do about this.

And you want it this way. Really. Keep reading.

Explicit state machines mean your caller has control

That might sound bad, but it’s really powerful. For instance, imagine that your caller looks like this:

let mut state = State::default();

loop {
    wait_for_a_key_press();
    let done = state.step();
    if done { break; }
}

If we want to step every time the user presses a key, then we have to accept the possibility of never step-ping – because we can’t force the user to press a key! Being able to create a state machine and have it sit around waiting forever, at very low cost, is part of the power of writing explicit state machines.

Writing state machines with async fn

Writing explicit state machines in “long-hand” like this is error-prone and complex. Let’s rewrite the running example as an async fn. (The pending! macro is from the futures crate, and yields to the caller without waiting for any particular event. It contains an await.)

async fn my_state_machine() {
    set_pin_high();
    pending!();

    set_pin_low();
    pending!();

    tristate_pin();
}
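It’s worth pausing to see the inversion of control directly: the value my_state_machine() returns is just a state machine that the caller polls. Here’s a sketch that drives it by hand, with the pin calls replaced by log entries and pending! replaced by a hand-rolled YieldOnce future. (A real executor would supply a useful Waker; the no-op one here is fine because we poll in a loop regardless.)

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for `pending!`: returns Pending once, then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if std::mem::replace(&mut self.0, true) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

// The async fn from the text, with the pin calls replaced by log
// entries so the sketch is self-contained.
async fn my_state_machine(log: &mut Vec<&'static str>) {
    log.push("set pin high");
    YieldOnce(false).await;
    log.push("set pin low");
    YieldOnce(false).await;
    log.push("tristate pin");
}

// A Waker that does nothing when woken.
fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let mut log = Vec::new();
    {
        let mut fut = pin!(my_state_machine(&mut log));
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        let mut yields = 0;
        // The caller decides when (and whether) each step runs.
        while fut.as_mut().poll(&mut cx).is_pending() {
            yields += 1;
        }
        assert_eq!(yields, 2); // one yield per pending!-equivalent
    }
    assert_eq!(log, ["set pin high", "set pin low", "tristate pin"]);
}
```

Each call to poll plays the role that step played in the hand-rolled version; the compiler wrote the enum and the match for us.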

The async version doesn’t reproduce the Drop behavior if we’re cancelled, though. To do this in an async fn, you need to have something in the body of the function that will perform an action when destroyed. You can roll this by hand, but I recommend the scopeguard crate and its defer! macro:

async fn my_state_machine() {
    set_pin_high();

    // Now that we've set the pin, make sure
    // it goes tristate again whether we exit
    // normally or by cancellation.
    defer! { tristate_pin(); }
    pending!();

    set_pin_low();
    pending!();

    // Pin gets tristated here
}

That’s dramatically less code than the hand-rolled state machine. It’s also much easier to check for correctness.
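One of those correctness checks is easy to run for yourself: poll the future once, then drop it, and confirm the cleanup fired. This sketch hand-rolls the guard (a struct with a Drop impl, which is essentially what defer! gives you) and replaces the pin calls and pending! with self-contained stand-ins.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Shared log standing in for the real pin.
type Log = Rc<RefCell<Vec<&'static str>>>;

// Hand-rolled `defer!`: runs its cleanup when dropped, whether the
// future finishes normally or is cancelled mid-flight.
struct TristateOnDrop(Log);

impl Drop for TristateOnDrop {
    fn drop(&mut self) {
        self.0.borrow_mut().push("tristate pin");
    }
}

// Stand-in for `pending!`: returns Pending once, then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if std::mem::replace(&mut self.0, true) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

async fn my_state_machine(log: Log) {
    log.borrow_mut().push("set pin high");
    // From here on, the pin gets tristated no matter how we exit.
    let _cleanup = TristateOnDrop(log.clone());
    YieldOnce(false).await;
    log.borrow_mut().push("set pin low");
    YieldOnce(false).await;
}

// A Waker that does nothing when woken.
fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let log = Log::default();
    {
        let mut fut = pin!(my_state_machine(log.clone()));
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        // Step once: the pin goes high and the machine yields...
        assert!(fut.as_mut().poll(&mut cx).is_pending());
        // ...then the future is dropped here: cancellation.
    }
    // The Drop-based cleanup still tristated the pin.
    assert_eq!(*log.borrow(), ["set pin high", "tristate pin"]);
}
```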

await is a composition operator

Often, an application winds up requiring a hierarchy of state machines. Imagine that you wanted to take the pin-toggling state machine from the previous section, and ensure that it waits a certain minimum interval between changes. If the OS provides a “sleep for a certain time period” state machine (as lilos does) then the easiest way is to plug that into your state machine. Its states effectively become sub-states within one of your states. This is composition.

In a hand-rolled state machine, this is hard enough to get right that I’m not going to present a worked example. (Try it if you’re curious!)

But with a state machine expressed using async fn, it’s trivial, because we have an operator for it: await. await is the most common state machine composition operator (though not the only one!). It says, “take this other state machine, and run it to completion as part of my state machine.”

And so, we can add sleeps to our pin-toggler by changing our pending!() to instead await a reusable sleep-for-a-duration state machine:

async fn my_state_machine() {
    set_pin_high();
    defer! { tristate_pin(); }

    sleep_for(Millis(100)).await;

    set_pin_low();
    sleep_for(Millis(100)).await;

    // Pin gets tristated here
}

This will ensure that a minimum of 100 ms elapses between our changes to the pin. We can’t impose a maximum using this approach, because – as we saw above – our caller could wait months between stepping our state machine, and that’s part of what we’re signing up for by writing this state machine.

Composition and cancellation interact in wonderful ways. Let’s say you’re using some_state_machine and you’re suspicious that it might take more than 200 ms. You’d like to impose a timeout on it: it will have 200 ms to make progress, but if it doesn’t complete by the end of that window, it will be cancelled (drop-ped).

The easiest way to do this is to use the select_biased! macro from the futures crate. (It’s called biased because it steps the state machines inside it from first to last, and if any complete, all the rest are dropped. This means it’s slightly biased toward completing the earlier ones.)

select_biased! {
    _ = sleep_for(Millis(200)) => {
        // The timeout triggered first! Do any additional
        // cleanup you require here.
    }
    result = some_state_machine() => {
        // The state machine completed successfully!
        println!("{result}");
    }
}

This is the sort of power we get from the async fn ecosystem. Doing this with hand-rolled state machines is probably possible, but would be complex – and we haven’t even talked about borrowing and lifetimes. That’s a bigger topic than will fit in this post, but the short version is: borrowing across await points in an async fn pretty much Just Does What You’d Expect, but getting it right in a hand-rolled state machine requires unsafe and gymnastics.
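Here’s a small taste of that last point: a borrow created before an await and used after it, with the compiler doing all the bookkeeping. block_on and YieldOnce are minimal hand-rolled stand-ins (a real program would use an executor); the interesting part is the three commented lines in sum_tail.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for some await-able operation: Pending once, then done.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if std::mem::replace(&mut self.0, true) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

// A Waker that does nothing when woken.
fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Minimal poll-in-a-loop driver; fine for futures that need no real
// wakeups, not a general-purpose executor.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

async fn sum_tail(data: &[u32]) -> u32 {
    let tail = &data[1..];  // borrow taken here...
    YieldOnce(false).await; // ...held across an await point...
    tail.iter().sum()       // ...and still valid afterward.
}

fn main() {
    let data = vec![1, 2, 3, 4];
    assert_eq!(block_on(sum_tail(&data)), 9);
}
```

The equivalent hand-rolled state machine would need to store a reference into data inside the enum it keeps between steps, which is exactly where the unsafe gymnastics begin.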

Summary

From my perspective, this is the fundamental promise of async fn: easier, composable, explicit state machines.

If a chunk of code absolutely needs to run to completion without letting anything else run, use a normal fn. If a chunk of code doesn’t need to call any async fns, use a normal fn. Basically, any function that can be written as a normal fn without breaking something, should be. It’s easier.

But if you need to write a state machine, use async fn. It’s harder to understand than normal fn because of the inversion of control and potential for cancellation, but far easier to understand than the code you might write by hand to do the same thing!

CAUTION: There’s a proposal to make code generic over whether or not it’s being used async, so that the same code could produce both a simple function and a Future. In that case, you’d have to make sure to think about correctness in all the possible ways your code could be used. I am suspicious, and I hope after reading this section, you are too.