Getting file/line in await traces

An addendum to my async debugging post

2023-07-05

I recently posted about my debugger for async Rust, which can generate what I call “await-traces” for async code that’s suspended and not currently running. I mentioned at the time that it appeared possible to get the source code file name and line number corresponding to the await points, but left that for future work.

This is an update describing that future work.

The structure of an async fn type

I picked through this at some length in the original post, but skipped some details that are relevant for this one – so let’s revisit the topic. When we write an async fn in Rust, the compiler generates a data type behind the scenes. What does that type look like?

Let’s take a very simple function, something like:

async fn example(argument: u32) {
    other_operation(argument).await
}

This causes the compiler to generate types that – if we were instead writing them by hand – would look like…

enum example::{async_fn_env#0} {
    0(Unresumed),
    1(Returned),
    2(Panicked),

    3(Suspend0),
}

struct Unresumed { argument: u32 }
struct Returned;
struct Panicked;
struct Suspend0 {
    __awaitee: OtherOperationAsyncFnEnv,
}

We can’t actually write that code, exactly, because it uses names that are illegal in Rust code (like {async_fn_env#0} or an enum variant named 3), but the compiler can do whatever it wants, and these are the names it uses internally. This keeps the names from conflicting with the names in your program.

When you initially call example(something) it produces an example::{async_fn_env#0} that has variant 0 – containing the Unresumed struct. This holds the argument you passed in.

When you first poll it, it will call other_operation(argument), which produces its own future type. This gets stashed into the __awaitee field. (In practice, it gets polled once first to see if it completes immediately, in which case Suspend0 is skipped and we move directly to Returned – but that’s not important here.)

So, when decoding an await trace, we use these __awaitee fields – potentially nested – to follow the frames of the trace.

Mapping to DWARF

The mapping of Rust code to DWARF debug information isn’t spectacularly well documented, but the implementation is quite good. We had to reverse-engineer it when writing Humility, and I’ve done it again to write lildb.

If DWARF were represented as something resembling YAML¹, and a lot of it removed for the sake of brevity, our types above would be represented as something like:

- structure_type:
    name: "{async_fn_env#0}"
    variant_part:
        discriminator: (stuff)
        member:
            name: "__state"
            type: (reference to u8)
            data_member_location: 5
            artificial: true
        variants:
            - variant:
                discriminator_value: 0
                member:
                    name: "0"
                    type: (reference to Unresumed)
            - variant:
                discriminator_value: 1
                member:
                    name: "1"
                    type: (reference to Returned)
            ... more variants
- structure_type:
    name: "Unresumed"
    members:
        - member:
            name: "argument"
            type: (reference to u32)
    ... etc

Yeah, I know, I hate YAML too. It just happened to be convenient here.

The first thing that might surprise you here, if you haven’t worked with Rust DWARF before, is that the Rust enum is a DWARF structure_type! DWARF was designed by C people, and its notion of enum is a C enum – it can’t have fields. Rust takes advantage of existing DWARF support for “variant structs” added to support Ada, decades ago.

The pattern Rust uses to represent an enum as a “variant struct” is:

Each enum variant becomes a variant of the struct.
A top-level discriminator member tells you where to look to determine which variant is happening.
Each variant contains a single member, which is itself a struct. This is true even if the variant only contains one field – it always gets wrapped in a struct.

For a normal Rust enum, the names of the variants and the names of the generated structs that contain the field are the same.

For a generated async_fn_env, the names of the variants are (currently, at least) decimal integers starting at 0, with the SuspendX variants beginning at 3 – while the names of the generated structs are as I described in the previous post: “Unresumed”, “Returned,” etc.

Finding file/line

One of the important roles for any debugging format is mapping parts of the generated machine-readable program back to the input source file(s) that produced them. DWARF identifies places in the source files using what it calls “declaration coordinates,” or “decl coords.”

A declaration coordinate consists of

A source file name,
A line number,
A column number.

Any of those can be missing, because everything is optional in DWARF, just to make the lives of debugger writers more interesting. So, you can have something in a program that is mapped to line 12 of … an unspecified file. DWARF is fine with that. The reasons are lost to the mists of time.

As I noted in my last post on the topic, the big issue with trying to use DWARF to debug async programs is that DWARF has no notion of an async program. Rust is kind of … faking it, by using existing DWARF constructs. (I think we’re going to want DWARF extensions for this in the long run.)

So – how do we get the decl coordinates for an entity that DWARF doesn’t know exists, like an await point?

Through reverse engineering, it turns out that rustc (at least as of 1.69) records the decl coordinates for await points! It chose to put them… not on the enum variant… not on the generated struct type… but on the variant member. That is, they are here:

- structure_type:
    name: "{async_fn_env#0}"
    variant_part:
        discriminator: (stuff)
        member:
            name: "__state"
            type: (reference to u8)
            data_member_location: 5
            artificial: true
        variants:
            - variant:
                discriminator_value: 0
                member:
                    name: "0"
                    type: (reference to Unresumed)
                    decl_file: omg/lol.rs   # <------
                    decl_line: 1234         # <------
            - variant:
                discriminator_value: 1
                member:
                    name: "1"
                    type: (reference to Returned)
                    decl_file: omg/lol.rs   # <------
                    decl_line: 1500         # <------
            ... more variants

I noticed this when I was reading a raw DWARF dump of a Rust program. (If you write a debugger, you’ll spend a lot of time reading raw DWARF dumps.) This particularly jumped out at me because Rust is, in general, really bad about including decl coords on structures, enums, variants, and the like. These are one of the few cases where you can find the line where an enum variant is declared. Fortunately, this is the one case I needed!

debugdb is my crate that lildb is built on. It’s responsible for parsing DWARF and turning it into an in-memory database format suitable for program analysis. debugdb only represents the portions of DWARF that I’ve opted in, and I hadn’t included decl coord parsing for members of structs/enums because Rust generally doesn’t include it. So, after adding that, and a chunk of somewhat gross code to the debugger, we can have our await traces with a side of line numbers:

>> tasks
current tick-time is: 1620
task 0:
async fn keybad_fw::serial::task
    suspended at src/serial.rs:35    # <--- LINE NUMBERS! YAY!
waiting on: async fn lilos::spsc::{impl#4}::pop<keybad_fw::scanner::KeyEvent>
    suspended at /home/cbiffle/proj/lilos/os/src/spsc.rs:313
    waiting for data in spsc queue at 0x20001d7c
    queue type: lilos::spsc::Queue<keybad_fw::scanner::KeyEvent> <.debug_info+0x0000e5b4>
task 1:
async fn keybad_fw::scanner::task
    suspended at src/scanner.rs:134
waiting on: async fn lilos::exec::{impl#6}::next_time
    suspended at /home/cbiffle/proj/lilos/os/src/exec.rs:965
waiting on: async fn lilos::exec::sleep_until
    suspended at /home/cbiffle/proj/lilos/os/src/exec.rs:815
    sleeping until: 1621 (1 ms from now)

This makes the await traces a lot more useful for debugging. Previously, lildb would just tell you that the task was suspended at “await point 3” in the function, leaving you to guess which one the compiler chose to be number 3. Now, you can direct your editor to the exact spot. (If you have more than one await on a line, you’ll still need to do some guessing – rustc doesn’t generate column numbers for await points. Usually the rest of the trace should help you infer the answer from context.)

Another step toward our glorious debuggable async future!

#async #debugging #dwarf #lilos #rust

Cliffle

The structure of an async fn type

Mapping to DWARF

Finding file/line