I’m a big fan of Rust’s async feature, which lets you write explicit state
machines like straight-line code. One of the operating systems I maintain,
lilos, is almost entirely based on async, and I think it’s a killer
feature for embedded development.
async is also popular when writing webservers and other network services. My
colleagues at Oxide use it quite a bit. Watching them work has underscored one
of the current issues with async, however: the debugging story is not great.
In particular, answering the question “why isn’t my program currently doing
anything” is very hard.
I’ve been quietly tinkering on some tools to improve the situation since 2021,
and I’ve recently released a prototype debugger for lilos: lildb. lildb
can print await traces for uninstrumented lilos programs, which are like
stack traces, but for suspended futures. I wrote this to help me debug my own
programs, but I’m publishing it to try and move the discussion on async
debugging forward. To that end, this post will walk through what it does, how it
derives the information it uses, and areas where we could improve things.
One of the nice things about the Rust programming language is that it
makes it easier to write correct concurrent (e.g. threaded) programs – to the
degree that Rust’s slogan has been, at times, “fearless concurrency.”
But I’d like to tell you about the other side of Rust, which I think is
under-appreciated. Rust enables you to write programs that are not concurrent.
This feature is missing from most other languages, and is a source of much
complexity and bugs.
“But wait,” you might be saying, “of course I can write code that isn’t
concurrent in Java or Python or C!”
Can you, though? You can certainly write code that ignores concurrency, and
would malfunction if (say) used from multiple threads simultaneously. But that’s
not the same thing as writing code that isn’t concurrent – code that simply
can’t be used concurrently, by compiler guarantee.
In Rust, you can. Let’s look at why you can do it, and why it’s awesome.
One of the common complaints I hear from systems programmers who try Rust is
about mutexes, and specifically about the Rust Mutex API. The complaints
usually go something like this:
They don’t want the mutex to contain data, just a lock.
They don’t want to have to manage a “guard” value that unlocks the mutex on
drop – often, more specifically, they just want to call an unlock operation
because they feel like that’s more explicit.
These changes would make the Rust mutex API equivalent to the C/Posix mutex API.
In one case I’ve seen someone try to use Mutex<()> and trickery to fake it.
There’s a problem with this, though: these two aspects of Mutex’s design are
inextricably linked to one another, and to Rust’s broader safety guarantees –
changing either or both of them will open the door to subtle bugs and
corruption due to data races.
A C-style mutex API consisting of some bundle of implicitly guarded data, plus
lock and unlock functions, isn’t wise in Rust because it allows safe code to
easily commit errors that break memory safety and create data races.
Perhaps controversially, I’d argue that this is also true in C. It’s just more
obvious in Rust, because Rust rigorously distinguishes between the notion of
“safe” code that cannot commit such errors, and “unsafe” code that can commit
such errors if it wishes. C does not make this distinction, and as a result, any
code using a mutex in C can trivially produce serious, potentially exploitable,
bugs.
In the rest of this post I’ll walk through a typical C mutex API, compare with a
typical Rust mutex API, and look at what happens if we change the Rust API to
resemble C in various ways.
At some point in the past… I dunno, two years or so, it appears that my RSS
feeds broke.
I use Zola to generate this site, and they don’t have much in the way of a
cross-version compatibility guarantee – minor version updates routinely break
my templates. (I’m currently stuck on an older version because of this
bug.) They appear to have changed
the names of the RSS-related settings, causing my detection for generate_rss
to always return false (because they also seem to default any typo’d
configuration key to false). Whee.
Anyway, should be back on now – thanks to all the folks who have asked about
this.
Last week I gave a talk at the Open Source Firmware Conference about some of the
work I’m doing at Oxide Computer, entitled On Hubris and Humility. There is a
video of the talk if you’d like to
watch it in video form. It came out pretty alright!
The conference version of the talk has a constantly animated background that
makes the video hard for some people to watch. OSFC doesn’t appear to be
bothering with either captions or transcripts, so my friends who don’t hear as
well as I do (or just don’t want to turn their speakers on!) are kind of out of
luck.
And so, here’s a transcript with my slides inlined. The words may not exactly
match the audio because this is written from my speaker’s notes. And, yes, my
slides are all character art. The browser rendering is imperfect.
I’ve also written an epilogue at the end after the initial response to the talk.