Writing a basic `async` debugger
- lildb example
- What’s an await-trace?
- Why can’t I have a stack trace?
- Aside: intrusive tracing
- How do we find the futures in memory?
- What do they look like on the inside?
- Wait… but you don’t know the types!
- How does lildb know to say things like sleeping until?
- Where is all this type information coming from?
- What’s next?
I’m a big fan of Rust’s async
feature, which lets you write explicit state
machines like straight-line code. One of the operating systems I maintain,
lilos
, is almost entirely based on async
, and I think it’s a killer
feature for embedded development.
async
is also popular when writing webservers and other network services. My
colleagues at Oxide use it quite a bit. Watching them work has underscored one
of the current issues with async
, however: the debugging story is not great.
In particular, answering the question “why isn’t my program currently doing
anything” is very hard.
I’ve been quietly tinkering on some tools to improve the situation since 2021,
and I’ve recently released a prototype debugger for lilos
: lildb
. lildb
can print await traces for uninstrumented lilos
programs, which are like
stack traces, but for suspended futures. I wrote this to help me debug my own
programs, but I’m publishing it to try and move the discussion on async
debugging forward. To that end, this post will walk through what it does, how it
derives the information it uses, and areas where we could improve things.
lildb
example
The lildb
repo includes an ELF file and memory image from one of my
embedded applications (which is available for purchase, though that’ll
be the subject of a later post). This ensures you can try it out without having
an appropriate microcontroller board handy. You can start the debugger by
running (from inside the repo):
cargo run --bin lildb demo/keybad-fw
which will present you with a prompt:
Loaded; 1601 types found in program.
To quit: ^D or exit
>>
I find the tool quite useful for offline program analysis even without loading the memory image of a running program; for instance, without further ado, you can use it to visualize Rust’s “niche optimization” for enums:
>> info core::option::Option<&mut [u8; 8]>
core::option::Option<&mut [u8; 8]> <.debug_info+0x000012d4>: enum type
- byte size: 4
- alignment: 4
- 2 variants discriminated by u32 at offset 0
- when discriminator == 0
- contains type: core::option::Option<&mut [u8; 8]>::None <.debug_info+0x00001303>
- at offset: 0 bytes
- aligned: 4 bytes
- any other discriminator value
- contains type: core::option::Option<&mut [u8; 8]>::Some <.debug_info+0x00001314>
- at offset: 0 bytes
- aligned: 4 bytes
Discriminator position:
0 1 2 3
+------+------+------+------+
0000 | DISC |
+------+------+------+------+
>> info core::option::Option<&mut [u8; 8]>::Some
core::option::Option<&mut [u8; 8]>::Some <.debug_info+0x00001314>: struct type (tuple-like)
- byte size: 4
- alignment: 4
- template type parameters:
- T = &mut [u8; 8] <.debug_info+0x00013fe8>
- members:
0. __0: &mut [u8; 8] <.debug_info+0x00013fe8>
- offset: 0 bytes
- size: 4 bytes
- aligned: 4 bytes
0 1 2 3
+------+------+------+------+
0000 | 0 |
+------+------+------+------+
where:
0 = __0: &mut [u8; 8] <.debug_info+0x00013fe8>
There we can see that Option<&mut T>
has the same size as &mut T
, using the
value 0 – which is illegal in a Rust reference – to indicate None
. Handy.
But for debugging async tasks, we need to overlay the memory image of the actual running program:
load demo/memory-image.bin 0x20000000
…which gives us access to more state inspection commands, including normal
debugger things like printing info about a static
variable:
>> var lilos::time::TICK
lilos::time::TICK @ <.debug_info+0x000163b5>
- type: core::sync::atomic::AtomicU32 <.debug_info+0x00016e57>
- address: 0x20000624
- current contents: use core::sync::atomic::AtomicU32 as AtomicU32;
AtomicU32 {
v: core::cell::UnsafeCell<u32> {
value: 1620_u32,
},
}
…but the point of this article is the async
-specific features, such as
getting an await
-trace for all tasks:
>> tasks
current tick-time is: 1620
task 0:
async fn keybad_fw::serial::task
suspended at await point 1
waiting on: async fn lilos::spsc::{impl#4}::pop<keybad_fw::scanner::KeyEvent>
waiting for data in spsc queue at 0x20001d7c
queue type: lilos::spsc::Queue<keybad_fw::scanner::KeyEvent> <.debug_info+0x0000e5b4>
task 1:
async fn keybad_fw::scanner::task
suspended at await point 0
waiting on: async fn lilos::exec::{impl#6}::next_time
suspended at await point 0
waiting on: async fn lilos::exec::sleep_until
sleeping until: 1621 (1 ms from now)
Alright, with the demo out of the way, let’s talk about what just happened!
What’s an await-trace?
An await-trace is similar to a stack-trace, but different. It follows the
structure of async
code that is parked at an await
point, by decomposing
what, exactly, it’s waiting for.
Consider the second await trace from the example above:
async fn keybad_fw::scanner::task
suspended at await point 0
waiting on: async fn lilos::exec::{impl#6}::next_time
suspended at await point 0
waiting on: async fn lilos::exec::sleep_until
sleeping until: 1621 (1 ms from now)
This means:
- The outermost code of the task is an
async fn
calledkeybad_fw::scanner::task
. - That
async fn
contains at least oneawait
in its code. - It is parked at the first one (await point 0).
- At that point in the code, it is awaiting another
async fn
,lilos::exec::{something}::next_time
. (I’ll come back to the{something}
part later.) - That
async fn
is awaiting anotherasync fn
,lilos::exec::sleep_until
… - …which is recognized by the debugger as meaning a time-based sleep (a trick I’ll discuss later).
In general an await-trace can be mapped onto a conventional stack trace, but
will contain a subset of the stack trace’s information: only the points where
one function await
s another. Any intermediate functions that don’t await won’t
show up.
This is sometimes less useful than a conventional stack trace, but it’s still
quite useful. In the example above, we can finally answer the question “why is
my future or async fn
not running right now” – it’s because it’s sleeping,
and will continue to sleep for another millisecond.
Why can’t I have a stack trace?
In short: because the code shown above isn’t actually running. Let me break that down:
A Rust future is just a Rust value that implements the Future
trait. You can
pass a future to a function, return it from a function, put it in a Box
on the
heap, etc.
Because it’s just a value, the future doesn’t have control of the CPU, and in
fact doesn’t “know” when it should execute. This is the responsibility of the
code handling the future, which must grant it CPU time for it to make progress
by periodically calling poll
. (In lilos
the ultimate responsibility for
doing this bubbles up to the executor.)
While the future’s poll
implementation is running, it is effectively borrowing
the CPU from the caller (just like any function does while it’s being called),
and it also borrows the caller’s stack (again, like any function).
When poll
is not running, the future is sitting there as a data structure,
in a state I call suspended. While a future is suspended, it does not have a
stack. It does not have a stack pointer even, or a program counter /
instruction pointer. Anything it might need in a future execution of poll
(such as the values of local variables in an async fn
) is stored inside the
data structure in memory, not on the stack or in registers. Some of the stored
information indicates where to resume when the future is next polled – in the
case of an async fn
, it’s stored in a compact representation that is most
certainly not a program counter value.
This means the traditional stack trace algorithms and debug info break down. If you don’t have a stack, and can’t provide an address in the program where we’re running, you can’t get a conventional stack trace.
This has been the core difficulty in debugging async
Rust programs for, well,
ever.
Aside: intrusive tracing
I’m not the only person working on this problem! The Tokio ecosystem has
produced a number of tools for debugging async programs, but as far as I can
tell they’re all intrusive. This means you have to annotate the program being
debugged to record information about what function or unit of work is running
and who’s awaiting what. In the case of tokio-console
, the program also has to
be running and available to answer questions about its state.
That sort of intrusive trace can be super valuable – I got a lot of use out of the Dapper system at Google, which is very similar – but it doesn’t replace the ability to debug uninstrumented code. Particularly in an embedded context, spending the additional flash space and CPU time on maintaining a parallel data structure describing the program just won’t work.
But even in what I call a “Big Computer,” where you have effectively infinite memory and CPU, intrusive tracing can only trace the things you remembered to annotate – and in my experience you will eventually hit a bug in a thing you didn’t annotate. If the bug is difficult to reproduce, you may only have a core dump, and you may have to diagnose the bug without being able to make further changes to the software.
So, I think we need tools for debugging uninstrumented programs, in addition to tracing facilities.
Alright, back to the debugger!
How do we find the futures in memory?
In order to present an await trace for a future, we need to know where it is. This means peering into the code responsible for polling the future – the async runtime or executor. The runtime may be cooperative or uncooperative.
In a cooperative runtime, there’s some sort of explicit metadata exposed for debuggers, to help them locate tasks (which are the top-level futures managed by the runtime). This is ideal, but requires support from the runtime.
In an uncooperative runtime, we have to use the debug information compiled into the program to take the runtime apart and look for internal private variables. We can do this, because encapsulation doesn’t stop debuggers – but it’s more fragile, because programs generally assume they’re free to change their own private state without warning anyone. (This is the eternal struggle of debugger authors.)
lilos
is cooperative: it exposes a static
variable called
lilos::exec::TASK_FUTURES
, explicitly for consumption by debuggers. We can
look at that variable in the snapshot:
>> var lilos::exec::TASK_FUTURES
lilos::exec::TASK_FUTURES @ <.debug_info+0x00015c32>
- type: core::option::Option<*mut [core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>]> <.debug_info+0x000171b2>
- address: 0x20000600
- current contents: core::option::Option<*mut [core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>]>::Some(*mut [core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>] {
data_ptr: 0x20001fd4 as *_ core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>,
length: 2_u32,
})
…and see that it has the type (deep breath!)
Option<*mut [Pin<*mut dyn Future<Output = Infallible>>]>
.
Whew.
So in lildb
we use this static to guide us to the task futures.
What do they look like on the inside?
Let’s pick an async fn
to take apart, shall we? (We are skipping a very
important detail here, which you might notice if you’ve done a lot of program
analysis – fear not, I’ll cover it in the next section.)
Warning: From this point forward, this discussion is specific to certain versions of the Rust toolchain. I’ve mostly done this work on 1.69 and later. Some earlier versions have less metadata available; with any luck later versions will have more.
Let’s pick lilos::exec::sleep_until
, the innermost awaited frame on one of our
tasks’ await traces. We can look for related types using the list
command in
lildb
– I have trimmed the top part of the output here because it is
irrelevant.
>>> list lilos::exec::sleep_until
...output trimmed...
enum lilos::exec::sleep_until::{async_fn_env#0} <.debug_info+0x0000c842>
struct lilos::exec::sleep_until::{async_fn_env#0}::Panicked <.debug_info+0x0000c8c4>
struct lilos::exec::sleep_until::{async_fn_env#0}::Returned <.debug_info+0x0000c8b1>
struct lilos::exec::sleep_until::{async_fn_env#0}::Suspend0 <.debug_info+0x0000c8d7>
struct lilos::exec::sleep_until::{async_fn_env#0}::Unresumed <.debug_info+0x0000c89e>
This shape is typical for an async fn
in the debug info. An async fn
produces several types:
-
A Rust
enum
(i.e. anenum
that can have fields, as opposed to a C-styleenum
) representing the future. This is the first type in the list. Current compilers name thisname_of_the_async_fn::{async_fn_env#0}
. -
A set of Rust
struct
types that describe the information recorded in the future at each possible suspended state.
What suspended states are possible? A future can suspend at any await
point,
so we have one suspended state for each await
point in the code. The current
compiler calls these states SuspendX
, where X
is an integer that numbers the
await points, starting from 0.
But in addition to those states, are three other states you might not have thought about:
-
Unresumed
is the state of anasync fn
future that has been created (by calling theasync fn
) but has not been polled (byawait
orselect!
or various other means). Polling the future in this state will kick it forward into one of the other states (typically aSuspendX
state). -
Returned
is the state of anasync fn
that has resolved, that is, has been polled and returnedPoll::Ready(result)
from the poll function. Polling the future in this state will panic. -
Panicked
is the state of anasync fn
that has panicked on a previous poll. Polling the future in this state will panic some more.
(Aside: if you’ve noticed that each async fn
adds more text size to your
program than you’d like, this is why: every async fn
contains at least one
panic site, and panic sites are expensive, because people on Big Computers like
to include very wordy string messages in each panic that I can’t turn off.)
So in the list above, we see that there are four states for sleep_until
: a
single await point, plus the three special states.
If we inspect the enum
, we can see that it serves to select between the
various state structs in its variants:
>>> info lilos::exec::sleep_until::{async_fn_env#0}
lilos::exec::sleep_until::{async_fn_env#0} <.debug_info+0x0000c842>: enum type
- byte size: 56
- alignment: 8
- 4 variants discriminated by u8 at offset 52
- when discriminator == 0
- contains type: lilos::exec::sleep_until::{async_fn_env#0}::Unresumed <.debug_info+0x0000c89e>
- at offset: 0 bytes
- aligned: 8 bytes
- when discriminator == 1
- contains type: lilos::exec::sleep_until::{async_fn_env#0}::Returned <.debug_info+0x0000c8b1>
- at offset: 0 bytes
- aligned: 8 bytes
- when discriminator == 2
- contains type: lilos::exec::sleep_until::{async_fn_env#0}::Panicked <.debug_info+0x0000c8c4>
- at offset: 0 bytes
- aligned: 8 bytes
- when discriminator == 3
- contains type: lilos::exec::sleep_until::{async_fn_env#0}::Suspend0 <.debug_info+0x0000c8d7>
- at offset: 0 bytes
- aligned: 8 bytes
Discriminator position:
0 1 2 3
+------+------+------+------+
0000 | body |
+ + + + +
0004 | |
+ + + + +
... padding trimmed for blog post ...
+ + + + +
0030 | |
+------+ + + +
0034 | DISC | body |
+------+------+------+------+
In other words, there’s a single byte at offset 0x34 into the enum that tells us
what state it’s in, and each possible state changes the interpretation of all
the other bytes in the enum. For instance, here’s the Unresumed
state:
>> info lilos::exec::sleep_until::{async_fn_env#0}::Unresumed
lilos::exec::sleep_until::{async_fn_env#0}::Unresumed <.debug_info+0x0000c89e>: struct type
- byte size: 56
- alignment: 8
- members:
0. deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
- offset: 0 bytes
- size: 8 bytes
- aligned: 8 bytes
0 1 2 3
+------+------+------+------+
0000 | 0 |
+ + + + +
0004 | |
+------+------+------+------+
0008 | pad |
+ + + + +
... padding trimmed for blog post ...
+ + + + +
0034 | |
+------+------+------+------+
where:
0 = deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
In the Unresumed
state we only have a single field holding the argument,
deadline
.
(Notice that the Unresumed
struct puts the enum’s discriminator field in
unused pad
space. This is typical of how Rust enum
s get represented in debug
info. Think of it as being overlaid on the enum definition above.)
Whereas if we inspect the future at the first await point, Suspend0
, things
get much fancier:
>> info lilos::exec::sleep_until::{async_fn_env#0}::Suspend0
lilos::exec::sleep_until::{async_fn_env#0}::Suspend0 <.debug_info+0x0000c8d7>: struct type
- byte size: 56
- alignment: 8
- members:
0. deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
- offset: 8 bytes
- size: 8 bytes
- aligned: 8 bytes
1. node: core::pin::Pin<&mut lilos::list::Node<lilos::time::TickTime>> <.debug_info+0x00001c74>
- offset: 48 bytes
- size: 4 bytes
- aligned: 4 bytes
2. __2: lilos::list::Node<lilos::time::TickTime> <.debug_info+0x0000e737>
- offset: 16 bytes
- size: 24 bytes
- aligned: 8 bytes
3. __awaitee: lilos::list::WaitForDetach<lilos::time::TickTime, lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x0000e7a4>
- offset: 40 bytes
- size: 8 bytes
- aligned: 4 bytes
4. deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
- offset: 8 bytes
- size: 8 bytes
- aligned: 8 bytes
0 1 2 3
+------+------+------+------+
0000 | pad |
+ + + + +
0004 | |
+------+------+------+------+
0008 | 4 |
+ + + + +
000c | |
+------+------+------+------+
0010 | 2 |
+ + + + +
0014 | |
+ + + + +
0018 | |
+ + + + +
001c | |
+ + + + +
0020 | |
+ + + + +
0024 | |
+------+------+------+------+
0028 | 3 |
+ + + + +
002c | |
+------+------+------+------+
0030 | 1 |
+------+------+------+------+
0034 | pad |
+------+------+------+------+
where:
0 = deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
1 = node: core::pin::Pin<&mut lilos::list::Node<lilos::time::TickTime>> <.debug_info+0x00001c74>
2 = __2: lilos::list::Node<lilos::time::TickTime> <.debug_info+0x0000e737>
3 = __awaitee: lilos::list::WaitForDetach<lilos::time::TickTime, lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x0000e7a4>
4 = deadline: lilos::time::TickTime <.debug_info+0x0000e0dd>
At the first await point, the bytes formerly used for the deadline
at offset 0
have become unused, and instead we have two overlaid copies of the deadline
at
offset 8. (This might seem like a problem, but it’s how the compiler describes
variable shadowing in futures. It works in the end.)
Importantly, we have two fields with weird underscore names that didn’t appear
in the source code: __2
and __awaitee
.
__2
is an anonymous local value. In a normal fn
this would temporarily
appear in the stack or registers and be consumed later in the function. But
since we’re an async fn
, and we need it to survive across the await point,
it’s granted a name by the compiler and stored in a field in the struct.
__awaitee
is the name given by current compilers to the future that is being
await
-ed. This is what enables an await trace, fundamentally: to go into the
next thing being await-ed, we need to recursively inspect __awaitee
fields
until we hit something we don’t understand.
In this case the type of __awaitee
is lilos::list::......
… very long.
lildb
lets you use the DWARF debuginfo offset as a shorthand, it’s printed
after every type. Let’s use that to query the shape of our __awaitee
:
>> info <.debug_info+0x0000e7a4>
lilos::list::WaitForDetach<lilos::time::TickTime, lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x0000e7a4>: struct type
- byte size: 8
- alignment: 4
- template type parameters:
- T = lilos::time::TickTime <.debug_info+0x0000e0dd>
- F = lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime> <.debug_info+0x0000e7e9>
- members:
0. node: core::pin::Pin<&lilos::list::Node<lilos::time::TickTime>> <.debug_info+0x00001cb7>
- offset: 0 bytes
- size: 4 bytes
- aligned: 4 bytes
1. polled_since_detach: core::cell::Cell<bool> <.debug_info+0x000005cf>
- offset: 4 bytes
- size: 1 bytes
- aligned: 1 bytes
2. cleanup: core::option::Option<lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x00000960>
- offset: 5 bytes
- size: 1 bytes
- aligned: 1 bytes
0 1 2 3
+------+------+------+------+
0000 | 0 |
+------+------+------+------+
0004 | 1 | 2 | pad |
+------+------+------+------+
where:
0 = node: core::pin::Pin<&lilos::list::Node<lilos::time::TickTime>> <.debug_info+0x00001cb7>
1 = polled_since_detach: core::cell::Cell<bool> <.debug_info+0x000005cf>
2 = cleanup: core::option::Option<lilos::list::{impl#5}::insert_and_wait::{closure_env#0}<lilos::time::TickTime>> <.debug_info+0x00000960>
This isn’t an enum
– it’s just a normal struct
with human-readable names.
That’s because the thing we’re awaiting here is not an async fn
. It is
instead a struct
that someone (me) has implemented Future
on.
This means that, for the simple “follow the __awaitee
” async trace method
I summarized above, we have reached the end of the line. (There are tricks for
pushing past this, which I’ll discuss below.)
Wait… but you don’t know the types!
We started looking at lilos::exec::sleep_until
above because it was named in
the await trace. To follow the await trace, we need to know the concrete types
involved, so we can query their debug info like we did above.
But if you recall the type of lilos::exec::TASK_FUTURES
, it turns out we’ve
used some circular reasoning: we looked into that type because it was mentioned
in the await trace, which is how we printed the await trace.
But how do we actually figure out what type the future is from TASK_FUTURES
,
given that it stores *mut dyn Future<...>
– an erased type? If we didn’t know
anything about what the tasks were doing, how could we get started building an
await trace if the concrete type isn’t available?
The answer: there’s a trick. (I’ll be saying this a lot for the rest of the article. Debuggers are mostly made out of tricks.)
Earlier when we ran var lilos::exec::TASK_FUTURES
, lildb
indicated that the
contents were (slightly cleaned up for print):
Option::Some(*mut [Pin<*mut dyn Future<Output=Infallible>>] {
data_ptr: 0x20001fd4 as *_ Pin<*mut dyn Future<Output=Infallible>>,
length: 2_u32,
})
This is an Option
of a slice. The debug info currently generated by the Rust
toolchain for a slice calls the pointer part of the slice data_ptr
, and the
length part length
. So, here, we have a slice of pinned pointers to futures
starting at address 0x2000_1fd4
and continuing for 2 elements.
We can paste the pointer value – including the as blah blah blah
part – into
lildb
’s decode
command to look at the first pointer in the slice (output
slightly trimmed for this post):
>> decode 0x20001fd4 as *_ core::pin::Pin<*mut dyn core::future::future::Future<Output=core::convert::Infallible>>
Pin<*mut dyn Future<Output=Infallible>> <.debug_info+0x00001fef>:
Pin<*mut dyn Future<Output=Infallible>> {
pointer: 0x20001da0 as &mut keybad_fw::serial::task::{async_fn_env#0} as &mut dyn Future,
}
So, the pinned pointer is a struct containing a single field called pointer
,
which contains the actual pointer. (That matches the definition of Pin
in the
standard library, if you read it.)
The actual pointer points to address 0x2000_1da0
and lildb
says it’s … a
specific concrete type? Cast to a &dyn
? Rust values don’t have any kind of
header indicating their type, so, the debugger has clearly used that trick I
warned you about.
Here’s the trick. *mut dyn SomeTrait
is a fat pointer in Rust, which
consists of a conventional pointer plus one pointer’s worth of metadata. In
the case of dyn
, that metadata points to a vtable structure in the program.
decode
hides these details, but we can see them if we use info
to follow the
types:
>> info <.debug_info+0x0001414a>
*mut dyn core::future::future::Future<Output=core::convert::Infallible> <.debug_info+0x0001414a>: struct type
- byte size: 8
- alignment: 4
- members:
0. pointer: *_ dyn core::future::future::Future<Output=core::convert::Infallible> <.debug_info+0x00017c06>
- offset: 0 bytes
- size: 4 bytes
- aligned: 4 bytes
1. vtable: &[usize; 3] <.debug_info+0x00017c16>
- offset: 4 bytes
- size: 4 bytes
- aligned: 4 bytes
0 1 2 3
+------+------+------+------+
0000 | 0 |
+------+------+------+------+
0004 | 1 |
+------+------+------+------+
where:
0 = pointer: *_ dyn core::future::future::Future<Output=core::convert::Infallible> <.debug_info+0x00017c06>
1 = vtable: &[usize; 3] <.debug_info+0x00017c16>
Unfortunately, the Rust compiler currently doesn’t generate DWARF to describe
what type the vtable belongs to. But it does name the vtable consistently. We
can discover this if we decode
the address of our fat pointer as a usize
.
>>> decode 0x20001fd4 as *mut usize
usize <.debug_info+0x00000081>:
536878496_u32
>> decode 0x20001fd8 as *mut usize
usize <.debug_info+0x00000081>:
134233152_u32
Pointer value is 536878496 decimal, which is what we had above, and the
metadata word is 134233152 decimal. We can look up information about that
metadata address with the addr
command.
>>> addr 134233152
Offset +0x0 into static <keybad_fw::serial::task::{async_fn_env#0} as core::future::future::Future>::{vtable}
- range 0x8003c40..0x8003c50
- type <keybad_fw::serial::task::{async_fn_env#0} as core::future::future::Future>::{vtable_type} <.debug_info+0x0001296e>
- .drop_in_place +0x0 (in <keybad_fw::serial::task::{async_fn_env#0} as core::future::future::Future>::{vtable_type})
So the metadata pointer points to a static value of a type whose name is
structured as <name-of-our-async-fn-env as Future>::{vtable}
.
lildb
recognizes this pattern and decodes the name to find the concrete type
(left hand side of the as
).
This is potentially fragile, but without more information in the debuginfo, this is what we’ve got. And it works – quite well.
So, with this vtable lookup trick, we can de-erase the type of any dyn
fat
pointer, a trick that lildb
uses to handle type-erased futures.
How does lildb
know to say things like sleeping until
?
More tricks!
In any async runtime, an await trace will eventually come down to some “primitive” operation. These operations are often things like sleeping, or waiting to pop an item from a queue.
If you have intimate knowledge of the runtime, you can pattern-match these and
provide more detailed human-readable descriptions, rather than printing a long
cryptic generated type name. This is what lildb
does: it recognizes the
lilos::exec::sleep_until
function and applies a special heuristic for decoding
it.
Like other debugger tricks, this is potentially fragile. If lildb
finds a
sleep_until
future that doesn’t match the patterns it expects (like if lilos
were to change sleep_until
in a very fundamental way), it will fall back to a
generic printing routine. That’s the best we can do without updating our
heuristics.
Where is all this type information coming from?
On platforms other than Windows, Rust generates debug info in a format called DWARF. There’s quite a lot of it – to the point that the Rust team keeps working on making it smaller. There’s a wealth of information contained in the DWARF data.
It’s also a giant pain to do anything with. The relationships between its entities aren’t well documented (or when they are documented, they’re violated by real compilers), there are important pieces of logic left totally implicit (like how to unify equivalent types across compilaton units), and almost everything is optional, which complicates the data structures and limits the things we can rely on.
I wrote a DWARF parser and analyzer for lildb
, because I couldn’t find an
off-the-shelf one that met my needs. (At my day job we’d previously written one
for Humility, the Hubris debugger; lildb
’s parser takes a somewhat different
approach.) It’s available as a crate if you’d like to do something weird.
There’s a bunch of information in the DWARF that lildb
is not currently
exposing, because I haven’t plumbed it all the way through. In particular, there
is enough information in the DWARF to print source file name and line number for
each await point, instead of just saying “await point 0.” (It’s in a weird
place, but, it is there.) I hope to expose this soon.
What’s next?
I’ll continue tinkering on lildb
to meet my embedded needs in my personal
projects. I think it’d be interesting to try to extend these techniques to
Tokio, the async runtime we use at work; this will be harder, because Tokio
keeps its task futures in complex dynamic data structures, rather than a simple
slice like lilos
. But, the techniques should generalize.
I’m collecting a set of concrete requests for Rust DWARF info. Several of the
heuristics lildb
applies could be removed if rustc
generated more
information – using existing DWARF attributes. Like linking vtables to their
owning type. Or representing traits at all. (Traits are basically missing from
the current DWARF.)
But fundamentally, DWARF was not designed for this. The concept of taking a function and magically converting it into a co-routine style state machine wasn’t really a thing when DWARF was designed, and its metadata – line number tables, variable location information, and the like – is very specialized around the idea of a program running on a stack and registers. This makes it difficult to e.g. add information to the program that maps the stack of a currently polled future back to its in-memory suspended representation, at least using standard DWARF attributes.
To fix this, I think Rust needs to add some DWARF extensions, something they’ve managed to avoid so far (admirably!). But that’ll be a harder problem.
If you’d like to learn more about what I’ve done here, the code is a bit of a
mess, but have a look at lildb
and the program analysis crate, debugdb
.