Blog

Writing a basic `async` debugger

I’m a big fan of Rust’s async feature, which lets you write explicit state machines like straight-line code. One of the operating systems I maintain, lilos, is almost entirely based on async, and I think it’s a killer feature for embedded development.

async is also popular when writing webservers and other network services. My colleagues at Oxide use it quite a bit. Watching them work has underscored one of the current issues with async, however: the debugging story is not great. In particular, answering the question “why isn’t my program currently doing anything” is very hard.

I’ve been quietly tinkering on some tools to improve the situation since 2021, and I’ve recently released a prototype debugger for lilos: lildb. lildb can print await traces for uninstrumented lilos programs, which are like stack traces, but for suspended futures. I wrote this to help me debug my own programs, but I’m publishing it to try and move the discussion on async debugging forward. To that end, this post will walk through what it does, how it derives the information it uses, and areas where we could improve things.

Safely writing code that isn't thread-safe

One of the nice things about the Rust programming language is that it makes it easier to write correct concurrent (e.g. threaded) programs – to the degree that Rust’s slogan has been, at times, “fearless concurrency.”

But I’d like to tell you about the other side of Rust, which I think is under-appreciated. Rust enables you to write programs that are not concurrent. This feature is missing from most other languages, and is a source of much complexity and bugs.

“But wait,” you might be saying, “of course I can write code that isn’t concurrent in Java or Python or C!”

Can you, though? You can certainly write code that ignores concurrency, and would malfunction if (say) used from multiple threads simultaneously. But that’s not the same thing as writing code that isn’t concurrent – code that simply can’t be used concurrently, by compiler guarantee.

In Rust, you can. Let’s look at why you can do it, and why it’s awesome.

Why Rust mutexes look like they do

One of the common complaints I hear from systems programmers who try Rust is about mutexes, and specifically about the Rust Mutex API. The complaints usually go something like this:

  • They don’t want the mutex to contain data, just a lock.
  • They don’t want to have to manage a “guard” value that unlocks the mutex on drop – often, more specifically, they just want to call an unlock operation because they feel like that’s more explicit.

These changes would make the Rust mutex API equivalent to the C/Posix mutex API. In one case I’ve seen someone try to use Mutex<()> and trickery to fake it.

There’s a problem with this, though: these two aspects of Mutex’s design are inextricably linked to one another, and to Rust’s broader safety guarantees – changing either or both of them will open the door to subtle bugs and corruption due to data races.

A C-style mutex API consisting of some bundle of implicitly guarded data, plus lock and unlock functions, isn’t wise in Rust because it allows safe code to easily commit errors that break memory safety and create data races.

Perhaps controversially, I’d argue that this is also true in C. It’s just more obvious in Rust, because Rust rigorously distinguishes between the notion of “safe” code that cannot commit such errors, and “unsafe” code that can commit such errors if it wishes. C does not make this distinction, and as a result, any code using a mutex in C can trivially produce serious, potentially exploitable, bugs.

In the rest of this post I’ll walk through a typical C mutex API, compare with a typical Rust mutex API, and look at what happens if we change the Rust API to resemble C in various ways.

RSS Feed Back On

At some point in the past… I dunno, two years or so, it appears that my RSS feeds broke.

I use Zola to generate this site, and they don’t have much in the way of a cross-version compatibility guarantee – minor version updates routinely break my templates. (I’m currently stuck on an older version because of this bug.) They appear to have changed the names of the RSS-related settings, causing my detection for generate_rss to always return false (because they also seem to default any typo’d configuration key to false). Whee.

Anyway, should be back on now – thanks to all the folks who have asked about this.

On Hubris And Humility

Last week I gave a talk at the Open Source Firmware Conference about some of the work I’m doing at Oxide Computer, entitled On Hubris and Humility. There is a video of the talk if you’d like to watch it in video form. It came out pretty alright!

The conference version of the talk has a constantly animated background that makes the video hard for some people to watch. OSFC doesn’t appear to be bothering with either captions or transcripts, so my friends who don’t hear as well as I do (or just don’t want to turn their speakers on!) are kind of out of luck.

And so, here’s a transcript with my slides inlined. The words may not exactly match the audio because this is written from my speaker’s notes. And, yes, my slides are all character art. The browser rendering is imperfect.

I’ve also written an epilogue at the end after the initial response to the talk.