I’m trying to do something kind of unusual with lilos: in addition to almost
all the APIs being safe-in-the-Rust sense, I’m also attempting to create an
entire system API that is cancel-safe. I’ve written a lot about Rust’s async
feature and its notion of cancellation recently, such as my suggestion for
reframing how we think about async/await.
My thoughts on this actually stem from my early work on lilos, where I started
beating the drum of cancel-safety back in 2020. My notion
of what it means to be cancel-safe has gotten more nuanced since then, and I’ve
recently made the latest batch of changes to try to help applications built on
lilos be more robust by default.
So, wanna nerd out about async API design and robustness? I know you do.
One of the nice things about the Rust programming language is that it
makes it easier to write correct concurrent (e.g. threaded) programs – to the
degree that Rust’s slogan has been, at times, “fearless concurrency.”
But I’d like to tell you about the other side of Rust, which I think is
under-appreciated. Rust enables you to write programs that are not concurrent.
This feature is missing from most other languages, and is a source of much
complexity and bugs.
“But wait,” you might be saying, “of course I can write code that isn’t
concurrent in Java or Python or C!”
Can you, though? You can certainly write code that ignores concurrency, and
would malfunction if (say) used from multiple threads simultaneously. But that’s
not the same thing as writing code that isn’t concurrent – code that simply
can’t be used concurrently, by compiler guarantee.
In Rust, you can. Let’s look at why you can do it, and why it’s awesome.
One of the common complaints I hear from systems programmers who try Rust is
about mutexes, and specifically about the Rust Mutex API. The complaints
usually go something like this:
They don’t want the mutex to contain data, just a lock.
They don’t want to have to manage a “guard” value that unlocks the mutex on
drop – often, more specifically, they just want to call an unlock operation
because they feel like that’s more explicit.
These changes would make the Rust mutex API equivalent to the C/Posix mutex API.
In one case I’ve seen someone try to use Mutex<()> and trickery to fake it.
There’s a problem with this, though: these two aspects of Mutex’s design are
inextricably linked to one another, and to Rust’s broader safety guarantees –
changing either or both of them will open the door to subtle bugs and
corruption due to data races.
A C-style mutex API consisting of some bundle of implicitly guarded data, plus
lock and unlock functions, isn’t wise in Rust because it allows safe code to
easily commit errors that break memory safety and create data races.
Perhaps controversially, I’d argue that this is also true in C. It’s just more
obvious in Rust, because Rust rigorously distinguishes between the notion of
“safe” code that cannot commit such errors, and “unsafe” code that can commit
such errors if it wishes. C does not make this distinction, and as a result, any
code using a mutex in C can trivially produce serious, potentially exploitable,
In the rest of this post I’ll walk through a typical C mutex API, compare with a
typical Rust mutex API, and look at what happens if we change the Rust API to
resemble C in various ways.