Writing a basic `async` debugger

2023-06-30

I’m a big fan of Rust’s async feature, which lets you write explicit state machines like straight-line code. One of the operating systems I maintain, lilos, is almost entirely based on async, and I think it’s a killer feature for embedded development.

async is also popular when writing webservers and other network services. My colleagues at Oxide use it quite a bit. Watching them work has underscored one of the current issues with async, however: the debugging story is not great. In particular, answering the question “why isn’t my program currently doing anything” is very hard.

I’ve been quietly tinkering on some tools to improve the situation since 2021, and I’ve recently released a prototype debugger for lilos: lildb. lildb can print await traces for uninstrumented lilos programs, which are like stack traces, but for suspended futures. I wrote this to help me debug my own programs, but I’m publishing it to try and move the discussion on async debugging forward. To that end, this post will walk through what it does, how it derives the information it uses, and areas where we could improve things.

`lildb` example

The lildb repo includes an ELF file and memory image from one of my embedded applications (which is available for purchase, though that’ll be the subject of a later post). This ensures you can try it out without having an appropriate microcontroller board handy. You can start the debugger by running (from inside the repo):

cargo run --bin lildb demo/keybad-fw

which will present you with a prompt:

Loaded; 1601 types found in program.
To quit: ^D or exit
>>

I find the tool quite useful for offline program analysis even without loading the memory image of a running program; for instance, without further ado, you can use it to visualize Rust’s “niche optimization” for enums:

>> info core::option::Option<&mut [u8; 8]>
core::option::Option<&mut [u8; 8]> <.debug_info+0x000012d4>: enum type
- byte size: 4
- alignment: 4
- 2 variants discriminated by u32 at offset 0
- when discriminator == 0
  - contains type: core::option::Option<&mut [u8; 8]>::None <.debug_info+0x00001303>
  - at offset: 0 bytes
  - aligned: 4 bytes
- any other discriminator value
  - contains type: core::option::Option<&mut [u8; 8]>::Some <.debug_info+0x00001314>
  - at offset: 0 bytes
  - aligned: 4 bytes

Discriminator position:
         0      1      2      3   
     +------+------+------+------+
0000 | DISC                      |
     +------+------+------+------+

>> info core::option::Option<&mut [u8; 8]>::Some
core::option::Option<&mut [u8; 8]>::Some <.debug_info+0x00001314>: struct type (tuple-like)
- byte size: 4
- alignment: 4
- template type parameters:
  - T = &mut [u8; 8] <.debug_info+0x00013fe8>
- members:
  0. __0: &mut [u8; 8] <.debug_info+0x00013fe8>
    - offset: 0 bytes
    - size: 4 bytes
    - aligned: 4 bytes
         0      1      2      3   
     +------+------+------+------+
0000 |  0                        |
     +------+------+------+------+
     where:
       0 = __0: &mut [u8; 8] <.debug_info+0x00013fe8>

There we can see that Option<&mut T> has the same size as &mut T, using the value 0 – which is illegal in a Rust reference – to indicate None. Handy.

But for debugging async tasks, we need to overlay the memory image of the actual running program:

load demo/memory-image.bin 0x20000000

…which gives us access to more state inspection commands, including normal debugger things like printing info about a static variable:

>> var lilos::time::TICK
lilos::time::TICK @ <.debug_info+0x000163b5>
- type: core::sync::atomic::AtomicU32 <.debug_info+0x00016e57>
- address: 0x20000624
- current contents: use core::sync::atomic::AtomicU32 as AtomicU32;
AtomicU32 {
    v: core::cell::UnsafeCell<u32> {
        value: 1620_u32,
    },
}

…but the point of this article is the async-specific features, such as getting an await-trace for all tasks:

>> tasks
current tick-time is: 1620
task 0:
async fn keybad_fw::serial::task
    suspended at await point 1
waiting on: async fn lilos::spsc::{impl#4}::pop<keybad_fw::scanner::KeyEvent>
    waiting for data in spsc queue at 0x20001d7c
    queue type: lilos::spsc::Queue<keybad_fw::scanner::KeyEvent> <.debug_info+0x0000e5b4>
task 1:
async fn keybad_fw::scanner::task
    suspended at await point 0
waiting on: async fn lilos::exec::{impl#6}::next_time
    suspended at await point 0
waiting on: async fn lilos::exec::sleep_until
    sleeping until: 1621 (1 ms from now)

Alright, with the demo out of the way, let’s talk about what just happened!

What’s an await-trace?

An await-trace is similar to a stack-trace, but different. It follows the structure of async code that is parked at an await point, by decomposing what, exactly, it’s waiting for.

Consider the second await trace from the example above:

async fn keybad_fw::scanner::task
    suspended at await point 0
waiting on: async fn lilos::exec::{impl#6}::next_time
    suspended at await point 0
waiting on: async fn lilos::exec::sleep_until
    sleeping until: 1621 (1 ms from now)

This means:

The outermost code of the task is an async fn called keybad_fw::scanner::task.
That async fn contains at least one await in its code.
It is parked at the first one (await point 0).
At that point in the code, it is awaiting another async fn, lilos::exec::{something}::next_time. (I’ll come back to the {something} part later.)
That async fn is awaiting another async fn, lilos::exec::sleep_until…
…which is recognized by the debugger as meaning a time-based sleep (a trick I’ll discuss later).

In general an await-trace can be mapped onto a conventional stack trace, but will contain a subset of the stack trace’s information: only the points where one function awaits another. Any intermediate functions that don’t await won’t show up.

This is sometimes less useful than a conventional stack trace, but it’s still quite useful. In the example above, we can finally answer the question “why is my future or async fn not running right now” – it’s because it’s sleeping, and will continue to sleep for another millisecond.

Why can’t I have a stack trace?

In short: because the code shown above isn’t actually running. Let me break that down:

A Rust future is just a Rust value that implements the Future trait. You can pass a future to a function, return it from a function, put it in a Box on the heap, etc.

Because it’s just a value, the future doesn’t have control of the CPU, and in fact doesn’t “know” when it should execute. This is the responsibility of the code handling the future, which must grant it CPU time for it to make progress by periodically calling poll. (In lilos the ultimate responsibility for doing this bubbles up to the executor.)

While the future’s poll implementation is running, it is effectively borrowing the CPU from the caller (just like any function does while it’s being called), and it also borrows the caller’s stack (again, like any function).

When poll is not running, the future is sitting there as a data structure, in a state I call suspended. While a future is suspended, it does not have a stack. It does not have a stack pointer even, or a program counter / instruction pointer. Anything it might need in a future execution of poll (such as the values of local variables in an async fn) is stored inside the data structure in memory, not on the stack or in registers. Some of the stored information indicates where to resume when the future is next polled – in the case of an async fn, it’s stored in a compact representation that is most certainly not a program counter value.

This means the traditional stack trace algorithms and debug info break down. If you don’t have a stack, and can’t provide an address in the program where we’re running, you can’t get a conventional stack trace.

This has been the core difficulty in debugging async Rust programs for, well, ever.

Aside: intrusive tracing

I’m not the only person working on this problem! The Tokio ecosystem has produced a number of tools for debugging async programs, but as far as I can tell they’re all intrusive. This means you have to annotate the program being debugged to record information about what function or unit of work is running and who’s awaiting what. In the case of tokio-console, the program also has to be running and available to answer questions about its state.

That sort of intrusive trace can be super valuable – I got a lot of use out of the Dapper system at Google, which is very similar – but it doesn’t replace the ability to debug uninstrumented code. Particularly in an embedded context, spending the additional flash space and CPU time on maintaining a parallel data structure describing the program just won’t work.

But even in what I call a “Big Computer,” where you have effectively infinite memory and CPU, intrusive tracing can only trace the things you remembered to annotate – and in my experience you will eventually hit a bug in a thing you didn’t annotate. If the bug is difficult to reproduce, you may only have a core dump, and you may have to diagnose the bug without being able to make further changes to the software.

So, I think we need tools for debugging uninstrumented programs, in addition to tracing facilities.

Alright, back to the debugger!

How do we find the futures in memory?

In order to present an await trace for a future, we need to know where it is. This means peering into the code responsible for polling the future – the async runtime or executor. The runtime may be cooperative or uncooperative.

In a cooperative runtime, there’s some sort of explicit metadata exposed for debuggers, to help them locate tasks (which are the top-level futures managed by the runtime). This is ideal, but requires support from the runtime.

In an uncooperative runtime, we have to use the debug information compiled into the program to take the runtime apart and look for internal private variables. We can do this, because encapsulation doesn’t stop debuggers – but it’s more fragile, because programs generally assume they’re free to change their own private state without warning anyone. (This is the eternal struggle of debugger authors.)

lilos is cooperative: it exposes a static variable called lilos::exec::TASK_FUTURES, explicitly for consumption by debuggers. We can look at that variable in the snapshot:

>> var lilos::exec::TASK_FUTURES
lilos::exec::TASK_FUTURES @ <.debug_info+0x00015c32>
- type: core::option::Option<*mut [core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>]> <.debug_info+0x000171b2>
- address: 0x20000600
- current contents: core::option::Option<*mut [core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>]>::Some(*mut [core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>] {
    data_ptr: 0x20001fd4 as *_ core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>,
    length: 2_u32,
})

…and see that it has the type (deep breath!) Option<*mut [Pin<*mut dyn Future<Output = Infallible>>]>.

Whew.

So in lildb we use this static to guide us to the task futures.

What do they look like on the inside?

Let’s pick an async fn to take apart, shall we? (We are skipping a very important detail here, which you might notice if you’ve done a lot of program analysis – fear not, I’ll cover it in the next section.)

Warning: From this point forward, this discussion is specific to certain versions of the Rust toolchain. I’ve mostly done this work on 1.69 and later. Some earlier versions have less metadata available; with any luck later versions will have more.

Let’s pick lilos::exec::sleep_until, the innermost awaited frame on one of our tasks’ await traces. We can look for related types using the list command in lildb – I have trimmed the top part of the output here because it is irrelevant.

>>> list lilos::exec::sleep_until
...output trimmed...
enum   lilos::exec::sleep_until::{async_fn_env#0} <.debug_info+0x0000c842>
struct lilos::exec::sleep_until::{async_fn_env#0}::Panicked <.debug_info+0x0000c8c4>
struct lilos::exec::sleep_until::{async_fn_env#0}::Returned <.debug_info+0x0000c8b1>
struct lilos::exec::sleep_until::{async_fn_env#0}::Suspend0 <.debug_info+0x0000c8d7>
struct lilos::exec::sleep_until::{async_fn_env#0}::Unresumed <.debug_info+0x0000c89e>

This shape is typical for an async fn in the debug info. An async fn produces several types:

A Rust enum (i.e. an enum that can have fields, as opposed to a C-style enum) representing the future. This is the first type in the list. Current compilers name this name_of_the_async_fn::{async_fn_env#0}.
A set of Rust struct types that describe the information recorded in the future at each possible suspended state.

What suspended states are possible? A future can suspend at any await point, so we have one suspended state for each await point in the code. The current compiler calls these states SuspendX, where X is an integer that numbers the await points, starting from 0.

But in addition to those states, are three other states you might not have thought about:

Unresumed is the state of an async fn future that has been created (by calling the async fn) but has not been polled (by await or select! or various other means). Polling the future in this state will kick it forward into one of the other states (typically a SuspendX state).
Returned is the state of an async fn that has resolved, that is, has been polled and returned Poll::Ready(result) from the poll function. Polling the future in this state will panic.
Panicked is the state of an async fn that has panicked on a previous poll. Polling the future in this state will panic some more.

(Aside: if you’ve noticed that each async fn adds more text size to your program than you’d like, this is why: every async fn contains at least one panic site, and panic sites are expensive, because people on Big Computers like to include very wordy string messages in each panic that I can’t turn off.)

So in the list above, we see that there are four states for sleep_until: a single await point, plus the three special states.

If we inspect the enum, we can see that it serves to select between the various state structs in its variants:

>>> info lilos::exec::sleep_until::{async_fn_env#0}
lilos::exec::sleep_until::{async_fn_env#0} <.debug_info+0x0000c842>: enum type
- byte size: 56
- alignment: 8
- 4 variants discriminated by u8 at offset 52
- when discriminator == 0
  - contains type: lilos::exec::sleep_until::{async_fn_env#0}::Unresumed <.debug_info+0x0000c89e>
  - at offset: 0 bytes
  - aligned: 8 bytes
- when discriminator == 1
  - contains type: lilos::exec::sleep_until::{async_fn_env#0}::Returned <.debug_info+0x0000c8b1>
  - at offset: 0 bytes
  - aligned: 8 bytes
- when discriminator == 2
  - contains type: lilos::exec::sleep_until::{async_fn_env#0}::Panicked <.debug_info+0x0000c8c4>
  - at offset: 0 bytes
  - aligned: 8 bytes
- when discriminator == 3
  - contains type: lilos::exec::sleep_until::{async_fn_env#0}::Suspend0 <.debug_info+0x0000c8d7>
  - at offset: 0 bytes
  - aligned: 8 bytes

Discriminator position:
         0      1      2      3   
     +------+------+------+------+
0000 | body                      |
     +      +      +      +      +
0004 |                           |
     +      +      +      +      +
        ... padding trimmed for blog post ...
     +      +      +      +      +
0030 |                           |
     +------+      +      +      +
0034 | DISC | body               |
     +------+------+------+------+

In other words, there’s a single byte at offset 0x34 into the enum that tells us what state it’s in, and each possible state changes the interpretation of all the other bytes in the enum. For instance, here’s the Unresumed state:

>> info lilos::exec::sleep_until::{async_fn_env#0}::Unresumed
lilos::exec::sleep_until::{async_fn_env#0}::Unresumed <.debug_info+0x0000c89e>: struct type
- byte size: 56
- alignment: 8
- members:
  0. deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
    - offset: 0 bytes
    - size: 8 bytes
    - aligned: 8 bytes
         0      1      2      3   
     +------+------+------+------+
0000 |  0                        |
     +      +      +      +      +
0004 |                           |
     +------+------+------+------+
0008 | pad                       |
     +      +      +      +      +
        ... padding trimmed for blog post ...
     +      +      +      +      +
0034 |                           |
     +------+------+------+------+
     where:
       0 = deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>

In the Unresumed state we only have a single field holding the argument, deadline.

(Notice that the Unresumed struct puts the enum’s discriminator field in unused pad space. This is typical of how Rust enums get represented in debug info. Think of it as being overlaid on the enum definition above.)

Whereas if we inspect the future at the first await point, Suspend0, things get much fancier:

>> info lilos::exec::sleep_until::{async_fn_env#0}::Suspend0
lilos::exec::sleep_until::{async_fn_env#0}::Suspend0 <.debug_info+0x0000c8d7>: struct type
- byte size: 56
- alignment: 8
- members:
  0. deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
    - offset: 8 bytes
    - size: 8 bytes
    - aligned: 8 bytes
  1. node: core::pin::Pin<&mut lilos::list::Node<lilos::time::TickTime>> <.debug_info+0x00001c74>
    - offset: 48 bytes
    - size: 4 bytes
    - aligned: 4 bytes
  2. __2: lilos::list::Node<lilos::time::TickTime> <.debug_info+0x0000e737>
    - offset: 16 bytes
    - size: 24 bytes
    - aligned: 8 bytes
  3. __awaitee: lilos::list::WaitForDetach<lilos::time::TickTime, lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x0000e7a4>
    - offset: 40 bytes
    - size: 8 bytes
    - aligned: 4 bytes
  4. deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
    - offset: 8 bytes
    - size: 8 bytes
    - aligned: 8 bytes
         0      1      2      3   
     +------+------+------+------+
0000 | pad                       |
     +      +      +      +      +
0004 |                           |
     +------+------+------+------+
0008 |  4                        |
     +      +      +      +      +
000c |                           |
     +------+------+------+------+
0010 |  2                        |
     +      +      +      +      +
0014 |                           |
     +      +      +      +      +
0018 |                           |
     +      +      +      +      +
001c |                           |
     +      +      +      +      +
0020 |                           |
     +      +      +      +      +
0024 |                           |
     +------+------+------+------+
0028 |  3                        |
     +      +      +      +      +
002c |                           |
     +------+------+------+------+
0030 |  1                        |
     +------+------+------+------+
0034 | pad                       |
     +------+------+------+------+
     where:
       0 = deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
       1 = node: core::pin::Pin<&mut lilos::list::Node<lilos::time::TickTime>> <.debug_info+0x00001c74>
       2 = __2: lilos::list::Node<lilos::time::TickTime> <.debug_info+0x0000e737>
       3 = __awaitee: lilos::list::WaitForDetach<lilos::time::TickTime, lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x0000e7a4>
       4 = deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>

At the first await point, the bytes formerly used for the deadline at offset 0 have become unused, and instead we have two overlaid copies of the deadline at offset 8. (This might seem like a problem, but it’s how the compiler describes variable shadowing in futures. It works in the end.)

Importantly, we have two fields with weird underscore names that didn’t appear in the source code: __2 and __awaitee.

__2 is an anonymous local value. In a normal fn this would temporarily appear in the stack or registers and be consumed later in the function. But since we’re an async fn, and we need it to survive across the await point, it’s granted a name by the compiler and stored in a field in the struct.

__awaitee is the name given by current compilers to the future that is being await-ed. This is what enables an await trace, fundamentally: to go into the next thing being await-ed, we need to recursively inspect __awaitee fields until we hit something we don’t understand.

In this case the type of __awaitee is lilos::list::...... … very long. lildb lets you use the DWARF debuginfo offset as a shorthand, it’s printed after every type. Let’s use that to query the shape of our __awaitee:

>> info <.debug_info+0x0000e7a4>
lilos::list::WaitForDetach<lilos::time::TickTime, lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x0000e7a4>: struct type
- byte size: 8
- alignment: 4
- template type parameters:
  - T = lilos::time::TickTime <.debug_info+0x0000e0dd>
  - F = lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime> <.debug_info+0x0000e7e9>
- members:
  0. node: core::pin::Pin<&lilos::list::Node<lilos::time::TickTime>> <.debug_info+0x00001cb7>
    - offset: 0 bytes
    - size: 4 bytes
    - aligned: 4 bytes
  1. polled_since_detach: core::cell::Cell<bool> <.debug_info+0x000005cf>
    - offset: 4 bytes
    - size: 1 bytes
    - aligned: 1 bytes
  2. cleanup: core::option::Option<lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x00000960>
    - offset: 5 bytes
    - size: 1 bytes
    - aligned: 1 bytes
         0      1      2      3   
     +------+------+------+------+
0000 |  0                        |
     +------+------+------+------+
0004 |  1   |  2   | pad         |
     +------+------+------+------+
     where:
       0 = node: core::pin::Pin<&lilos::list::Node<lilos::time::TickTime>> <.debug_info+0x00001cb7>
       1 = polled_since_detach: core::cell::Cell<bool> <.debug_info+0x000005cf>
       2 = cleanup: core::option::Option<lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x00000960>

This isn’t an enum – it’s just a normal struct with human-readable names. That’s because the thing we’re awaiting here is not an async fn. It is instead a struct that someone (me) has implemented Future on.

This means that, for the simple “follow the __awaitee” async trace method I summarized above, we have reached the end of the line. (There are tricks for pushing past this, which I’ll discuss below.)

Wait… but you don’t know the types!

We started looking at lilos::exec::sleep_until above because it was named in the await trace. To follow the await trace, we need to know the concrete types involved, so we can query their debug info like we did above.

But if you recall the type of lilos::exec::TASK_FUTURES, it turns out we’ve used some circular reasoning: we looked into that type because it was mentioned in the await trace, which is how we printed the await trace.

But how do we actually figure out what type the future is from TASK_FUTURES, given that it stores *mut dyn Future<...> – an erased type? If we didn’t know anything about what the tasks were doing, how could we get started building an await trace if the concrete type isn’t available?

The answer: there’s a trick. (I’ll be saying this a lot for the rest of the article. Debuggers are mostly made out of tricks.)

Earlier when we ran var lilos::exec::TASK_FUTURES, lildb indicated that the contents were (slightly cleaned up for print):

Option::Some(*mut [Pin<*mut dyn Future<Output=Infallible>>] {
    data_ptr: 0x20001fd4 as *_ Pin<*mut dyn Future<Output=Infallible>>,
    length: 2_u32,
})

This is an Option of a slice. The debug info currently generated by the Rust toolchain for a slice calls the pointer part of the slice data_ptr, and the length part length. So, here, we have a slice of pinned pointers to futures starting at address 0x2000_1fd4 and continuing for 2 elements.

We can paste the pointer value – including the as blah blah blah part – into lildb’s decode command to look at the first pointer in the slice (output slightly trimmed for this post):

>> decode 0x20001fd4 as *_ core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>
Pin<*mut dyn Future<Output=Infallible>> <.debug_info+0x00001fef>: 
Pin<*mut dyn Future<Output=Infallible>> {
    pointer: 0x20001da0 as &mut keybad_fw::serial::task::{async_fn_env#0} as &mut dyn Future,
}

So, the pinned pointer is a struct containing a single field called pointer, which contains the actual pointer. (That matches the definition of Pin in the standard library, if you read it.)

The actual pointer points to address 0x2000_1da0 and lildb says it’s … a specific concrete type? Cast to a &dyn? Rust values don’t have any kind of header indicating their type, so, the debugger has clearly used that trick I warned you about.

Here’s the trick. *mut dyn SomeTrait is a fat pointer in Rust, which consists of a conventional pointer plus one pointer’s worth of metadata. In the case of dyn, that metadata points to a vtable structure in the program. decode hides these details, but we can see them if we use info to follow the types:

>> info <.debug_info+0x0001414a>
*mut dyn core::future::future::Future<Output=core::convert::Infallible> <.debug_info+0x0001414a>: struct type
- byte size: 8
- alignment: 4
- members:
  0. pointer: *_ dyn core::future::future::Future<Output=core::convert::Infallible> <.debug_info+0x00017c06>
    - offset: 0 bytes
    - size: 4 bytes
    - aligned: 4 bytes
  1. vtable: &[usize; 3] <.debug_info+0x00017c16>
    - offset: 4 bytes
    - size: 4 bytes
    - aligned: 4 bytes
         0      1      2      3   
     +------+------+------+------+
0000 |  0                        |
     +------+------+------+------+
0004 |  1                        |
     +------+------+------+------+
     where:
       0 = pointer: *_ dyn core::future::future::Future<Output=core::convert::Infallible> <.debug_info+0x00017c06>
       1 = vtable: &[usize; 3] <.debug_info+0x00017c16>

Unfortunately, the Rust compiler currently doesn’t generate DWARF to describe what type the vtable belongs to. But it does name the vtable consistently. We can discover this if we decode the address of our fat pointer as a usize.

>>> decode 0x20001fd4 as *mut usize
usize <.debug_info+0x00000081>: 
536878496_u32
>> decode 0x20001fd8 as *mut usize
usize <.debug_info+0x00000081>: 
134233152_u32

Pointer value is 536878496 decimal, which is what we had above, and the metadata word is 134233152 decimal. We can look up information about that metadata address with the addr command.

>>> addr 134233152
Offset +0x0 into static <keybad_fw::serial::task::{async_fn_env#0} as core::future::future::Future>::{vtable}
- range 0x8003c40..0x8003c50
- type <keybad_fw::serial::task::{async_fn_env#0} as core::future::future::Future>::{vtable_type} <.debug_info+0x0001296e>
  - .drop_in_place +0x0 (in <keybad_fw::serial::task::{async_fn_env#0} as core::future::future::Future>::{vtable_type})

So the metadata pointer points to a static value of a type whose name is structured as <name-of-our-async-fn-env as Future>::{vtable}.

lildb recognizes this pattern and decodes the name to find the concrete type (left hand side of the as).

This is potentially fragile, but without more information in the debuginfo, this is what we’ve got. And it works – quite well.

So, with this vtable lookup trick, we can de-erase the type of any dyn fat pointer, a trick that lildb uses to handle type-erased futures.

How does `lildb` know to say things like `sleeping until`?

More tricks!

In any async runtime, an await trace will eventually come down to some “primitive” operation. These operations are often things like sleeping, or waiting to pop an item from a queue.

If you have intimate knowledge of the runtime, you can pattern-match these and provide more detailed human-readable descriptions, rather than printing a long cryptic generated type name. This is what lildb does: it recognizes the lilos::exec::sleep_until function and applies a special heuristic for decoding it.

Like other debugger tricks, this is potentially fragile. If lildb finds a sleep_until future that doesn’t match the patterns it expects (like if lilos were to change sleep_until in a very fundamental way), it will fall back to a generic printing routine. That’s the best we can do without updating our heuristics.

Where is all this type information coming from?

On platforms other than Windows, Rust generates debug info in a format called DWARF. There’s quite a lot of it – to the point that the Rust team keeps working on making it smaller. There’s a wealth of information contained in the DWARF data.

It’s also a giant pain to do anything with. The relationships between its entities aren’t well documented (or when they are documented, they’re violated by real compilers), there are important pieces of logic left totally implicit (like how to unify equivalent types across compilaton units), and almost everything is optional, which complicates the data structures and limits the things we can rely on.

I wrote a DWARF parser and analyzer for lildb, because I couldn’t find an off-the-shelf one that met my needs. (At my day job we’d previously written one for Humility, the Hubris debugger; lildb’s parser takes a somewhat different approach.) It’s available as a crate if you’d like to do something weird.

There’s a bunch of information in the DWARF that lildb is not currently exposing, because I haven’t plumbed it all the way through. In particular, there is enough information in the DWARF to print source file name and line number for each await point, instead of just saying “await point 0.” (It’s in a weird place, but, it is there.) I hope to expose this soon.

What’s next?

I’ll continue tinkering on lildb to meet my embedded needs in my personal projects. I think it’d be interesting to try to extend these techniques to Tokio, the async runtime we use at work; this will be harder, because Tokio keeps its task futures in complex dynamic data structures, rather than a simple slice like lilos. But, the techniques should generalize.

I’m collecting a set of concrete requests for Rust DWARF info. Several of the heuristics lildb applies could be removed if rustc generated more information – using existing DWARF attributes. Like linking vtables to their owning type. Or representing traits at all. (Traits are basically missing from the current DWARF.)

But fundamentally, DWARF was not designed for this. The concept of taking a function and magically converting it into a co-routine style state machine wasn’t really a thing when DWARF was designed, and its metadata – line number tables, variable location information, and the like – is very specialized around the idea of a program running on a stack and registers. This makes it difficult to e.g. add information to the program that maps the stack of a currently polled future back to its in-memory suspended representation, at least using standard DWARF attributes.

To fix this, I think Rust needs to add some DWARF extensions, something they’ve managed to avoid so far (admirably!). But that’ll be a harder problem.

If you’d like to learn more about what I’ve done here, the code is a bit of a mess, but have a look at lildb and the program analysis crate, debugdb.

#debugging #lilos #rust

Cliffle

lildb example