I recently posted about my debugger for async Rust, which can
generate what I call “await-traces” for async code that’s suspended and not
currently running. I mentioned at the time that it appeared possible to get the
source code file name and line number corresponding to the await points, but
left that for future work.
In this post I’ll work through an example of why I’m so excited about this
technique, by building a real driver for a notoriously tricky bus one piece at a
time, using lilos.
(This is a section of the lilos intro guide that people seemed to like, so
to increase its visibility, I’m lifting it up into its own post and expanding it
a bit. I hope this is a useful companion piece to the post on async
debugging I posted this morning.)
Some documentation of Rust async and await has presented it as a seamless
alternative to threads. Just sprinkle these keywords through your code and get
concurrency that scales better! I think this is very misleading. An async fn
is a different thing from a normal Rust fn, and you need to think about
different things to write correct code in each case.
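One concrete difference, as a minimal sketch using only the standard library: calling an async fn does not run its body. It only builds a future, and nothing happens until something polls it. The names here (`NoopWake`, `bump`, `call_then_poll`) are hypothetical and exist only for this illustration; a real program would use an executor instead of polling by hand.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future that never suspends.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Increments the counter and returns the new value.
async fn bump(counter: &Cell<u32>) -> u32 {
    counter.set(counter.get() + 1);
    counter.get()
}

/// Returns the counter value (after calling `bump`, after polling it).
fn call_then_poll() -> (u32, u32) {
    let counter = Cell::new(0);

    // "Calling" the async fn only constructs a future. The body has not run.
    let mut fut = Box::pin(bump(&counter));
    let after_call = counter.get();

    // Polling is what actually drives the body forward.
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let result = fut.as_mut().poll(&mut cx);
    assert!(matches!(result, Poll::Ready(1)));

    (after_call, counter.get())
}
```

With a normal fn, the counter would already be 1 right after the call. Here it stays at 0 until the future is polled, and if the future were dropped instead, the body would never run at all.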
This post presents a different way of looking at async that I think is more
useful, and less likely to lead to cancellation-related bugs.
I’m a big fan of Rust’s async feature, which lets you write explicit state
machines like straight-line code. One of the operating systems I maintain,
lilos, is almost entirely based on async, and I think it’s a killer
feature for embedded development.
async is also popular when writing webservers and other network services. My
colleagues at Oxide use it quite a bit. Watching them work has underscored one
of the current issues with async, however: the debugging story is not great.
In particular, answering the question “why isn’t my program currently doing
anything” is very hard.
I’ve been quietly tinkering on some tools to improve the situation since 2021,
and I’ve recently released a prototype debugger for lilos: lildb. lildb
can print await traces for uninstrumented lilos programs, which are like
stack traces, but for suspended futures. I wrote this to help me debug my own
programs, but I’m publishing it to try to move the discussion on async
debugging forward. To that end, this post will walk through what it does, how it
derives the information it uses, and areas where we could improve things.
One of the nice things about the Rust programming language is that it
makes it easier to write correct concurrent (e.g. threaded) programs – to the
degree that Rust’s slogan has been, at times, “fearless concurrency.”
But I’d like to tell you about the other side of Rust, which I think is
under-appreciated. Rust enables you to write programs that are not concurrent.
This feature is missing from most other languages, and its absence is a source
of much complexity and many bugs.
“But wait,” you might be saying, “of course I can write code that isn’t
concurrent in Java or Python or C!”
Can you, though? You can certainly write code that ignores concurrency, and
would malfunction if (say) used from multiple threads simultaneously. But that’s
not the same thing as writing code that isn’t concurrent – code that simply
can’t be used concurrently, by compiler guarantee.
In Rust, you can. Let’s look at why you can do it, and why it’s awesome.
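As a rough sketch of what “by compiler guarantee” can look like (the `Tally` type and function names are hypothetical, made up for this illustration): a type that holds a `Cell` is not `Sync`, so the compiler refuses to let two threads share a reference to it. The single-threaded use compiles and runs; the threaded version is left commented out because it is rejected at compile time.

```rust
use std::cell::Cell;

/// A counter that mutates through a shared reference, with no locking.
/// Because `Cell<u32>` is not `Sync`, neither is `Tally`.
struct Tally {
    count: Cell<u32>,
}

impl Tally {
    fn new() -> Self {
        Tally { count: Cell::new(0) }
    }

    /// Increments the counter and returns the new value.
    fn bump(&self) -> u32 {
        self.count.set(self.count.get() + 1);
        self.count.get()
    }
}

/// Sharing within one thread is fine, even through several references.
fn single_threaded_use() -> u32 {
    let tally = Tally::new();
    let a = &tally;
    let b = &tally;
    a.bump();
    b.bump();
    tally.count.get()
}

// This version does not compile: the closure captures `&Tally`, which
// cannot be sent to another thread because `Tally` is not `Sync`.
//
// fn shared_across_threads() {
//     let tally = Tally::new();
//     std::thread::scope(|s| {
//         s.spawn(|| tally.bump());
//     });
// }
```

The point of the sketch is that `bump` needs no locks or atomics, and that is safe only because the compiler rules out the concurrent case entirely.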