Why Rust mutexes look like they do
Comments are not a concurrency strategy.
One of the common complaints I hear from systems programmers who try Rust is about mutexes, and specifically about the Rust Mutex API. The complaints usually go something like this:

- They don’t want the mutex to contain data, just a lock.
- They don’t want to have to manage a “guard” value that unlocks the mutex on drop – often, more specifically, they just want to call an `unlock` operation because they feel like that’s more explicit.
These changes would make the Rust mutex API equivalent to the C/POSIX mutex API. In one case I’ve seen someone try to use `Mutex<()>` and trickery to fake it.

There’s a problem with this, though: these two aspects of `Mutex`’s design are inextricably linked to one another, and to Rust’s broader safety guarantees – changing either or both of them will open the door to subtle bugs and corruption due to data races.
A C-style mutex API consisting of some bundle of implicitly guarded data, plus `lock` and `unlock` functions, isn’t wise in Rust because it allows safe code to easily commit errors that break memory safety and create data races.
Perhaps controversially, I’d argue that this is also true in C. It’s just more obvious in Rust, because Rust rigorously distinguishes between the notion of “safe” code that cannot commit such errors, and “unsafe” code that can commit such errors if it wishes. C does not make this distinction, and as a result, any code using a mutex in C can trivially produce serious, potentially exploitable, bugs.
In the rest of this post I’ll walk through a typical C mutex API, compare with a typical Rust mutex API, and look at what happens if we change the Rust API to resemble C in various ways.
Mutexes in C
(Note: as usual, when I say “C,” my comments should be taken to also apply to C variants such as C++, which use essentially the same mutex design.)
There are a wide variety of mutex APIs in C, largely because the language didn’t specify a standard one until 2011. I’ll use the C11 standard mutex for this post because it’s simple and universally available, but this description applies just as well to (for example) pthreads.
The C mutex API, for the purposes of this post, consists of two primary operations: `lock` and `unlock`¹.

```c
// Locks a mutex, blocking if necessary until it becomes free.
int mtx_lock(mtx_t *mutex);

// Unlocks a mutex.
int mtx_unlock(mtx_t *mutex);
```
¹ For this purpose I’m ignoring: mutex creation and destruction, things like `trylock`, mutex attributes, the distinction between recursive and nonrecursive mutexes, etc. None of these have any bearing on the point I’m making.
These functions follow the normal C convention of returning `int`, with 0 indicating success and anything else indicating failure.
When code wants to safely access data that might be shared across threads, it calls `mtx_lock`. It then accesses the data, before finally calling `mtx_unlock`. Here is a simple example, where these operations are used to maintain a global counter that can be incremented from multiple threads:
```c
mtx_t *the_mutex;
int the_counter;

// Code to initialize the_mutex omitted.

int increment_counter(void) {
    int r = mtx_lock(the_mutex);
    if (r != thrd_success) return r;

    the_counter++;

    return mtx_unlock(the_mutex);
}

// Note that this function reads into an "out-parameter", *value_out,
// because our return value is used to indicate success/failure.
int read_counter(int *value_out) {
    int r = mtx_lock(the_mutex);
    if (r != thrd_success) return r;

    *value_out = the_counter;

    return mtx_unlock(the_mutex);
}
```
In fancier cases, a system might use more granular mutexes that are stored in data structures alongside the data they protect, as in:

```c
// A mutex stored next to the data it's intended to guard.
struct counter {
    mtx_t mutex;
    int count;  // guarded by mutex
};
```
To determine which data is intended to be protected by which mutex, C programmers typically use documentation conventions, like this one from the Chromium mutex docs:
Every shared variable/field should have a comment indicating which mutex protects it:
int accesses_; // count of accesses (guarded by mu_)
or a comment explaining why no mutex is needed:
int table_size_; // no. of elements in table (readonly after init)
Every mutex should have a comment indicating which variables and also any non-obvious invariants it protects:
Lock mu_; // protects accesses_, list_, count_
          // invariant: count_ == number of elements in linked-list list_
Think of the matching comments on variables and mutexes as analogous to matching types on procedure arguments and parameters; the redundancy can be very helpful to later maintainers of the code.
While I’m trying to present this section without judgment, I can’t quite skip past that last paragraph with my mouth closed. There’s an important distinction between procedure argument types and mutex scope comments, which is that procedure argument types are checked by the compiler. This is closer to declaring all your parameters `void *`, stating their real types in the comments, and expecting your users to always get the casts and order right.
But I digress. Let’s go look at Rust for comparison.
Mutexes in Rust
Rust provides a `Mutex` type in the standard library’s `std::sync` module. The API differs from C in three ways:
- `Mutex` contains the data it guards: the full name of the type is `Mutex<T>`, for some guarded type `T` that you choose.
- The `lock` operation returns a “guard” value.
- The `unlock` operation is only available on the guard value, not on `Mutex` itself. (The `unlock` operation also happens to be `drop`, which we’ll consider in more detail later.)
In the bare-metal `no_std` environments where I typically work, we have our own `Mutex` types, but they look pretty much the same as the standard library `Mutex` – for good reasons, which should become apparent over the course of this post.
Concretely, a simplified version of the Rust API looks like this:

```rust
// A mutex guarding some data of type T.
pub struct Mutex<T> { /* ... */ }

impl<T> Mutex<T> {
    // Locks the mutex, blocking if necessary, and returns a guard.
    pub fn lock(&self) -> MutexGuard<'_, T> { /* ... */ }
}

// The result of locking a mutex with lifetime 'a, guarding
// data of type T. Note that this does not implement Copy or
// Clone, so it cannot be duplicated.
pub struct MutexGuard<'a, T> { /* ... */ }
```
(I am ignoring a concept called “lock poisoning” for this simplified API, because it’s not relevant to my point.)
And an example of our counter increment API rewritten using a Rust mutex:

```rust
// The Rust version doesn't use global variables because doing so would
// distract from my point by requiring some unsafe.
fn increment_counter(counter: &Mutex<i32>) {
    let mut guard = counter.lock();
    *guard += 1;
}

fn read_counter(counter: &Mutex<i32>) -> i32 {
    *counter.lock()
}
```
The way this kind of API is usually described, the `MutexGuard` type is a smart pointer that allows access to the mutex contents of type `T`, but only while the guard itself exists. When it is dropped explicitly, or goes out of scope, access ends and the mutex unlocks.

But another way of looking at it is: a `MutexGuard` is a token that proves that the mutex has been locked.
- Because you cannot² create a `MutexGuard` except by the `Mutex::lock` operation, holding a `MutexGuard` demonstrates that `lock` has been called.
- Because the `Mutex`, by definition, will not hand out more than one `MutexGuard` – a second call to `lock` while a `MutexGuard` exists will block until the first one is destroyed, and `MutexGuard` itself cannot be duplicated – holding a `MutexGuard` demonstrates unique access to the data guarded by the mutex. Which, in Rust terms, means that you can get a `&mut T` out of it.
- Because the lifetime parameter `'a` in the definition of `MutexGuard` gets tied to the lifetime of the `Mutex` itself when you call `lock`, the compiler won’t let you drop the `Mutex` while still holding the `MutexGuard`, keeping it from turning into a dangling pointer.
² By which I mean, you cannot do so at all in safe Rust, and you can’t easily do so accidentally in unsafe Rust. You can, of course, go out of your way to break any language invariant in unsafe Rust. I am attempting to make software that is robust against mistakes by well-intentioned programmers. If you expect to have evil tricky programmers working in your codebase, you’ll want to disable unsafe Rust using the `#![forbid(unsafe_code)]` attribute. And then possibly review your hiring practices.
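To make the token framing concrete, here’s a small example of my own using the real `std::sync::Mutex` (whose `lock` returns a `Result` because of the lock poisoning set aside earlier):

```rust
use std::sync::Mutex;
use std::thread;

fn main() {
    let counter = Mutex::new(0_i32);

    // Four threads each take the lock 1000 times. The guard returned
    // by lock() is the only way to reach the i32 inside.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                    // guard dropped here, unlocking the mutex.
                }
            });
        }
    });

    assert_eq!(*counter.lock().unwrap(), 4000);
}
```

Note that no thread ever names an unlock operation; dropping the guard at the end of each loop iteration is the unlock.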
Variations of the Rust mutex API and their problems
As I summarized at the top of this post, there are two main objections I hear to the Rust mutex API.

- I don’t want the data guarded by the mutex to live inside the mutex.
- I don’t want to use a guard value to track the mutex being locked (usually with the implication that an `unlock` function should be available).
Let’s try these variations!
Moving guarded data outside the mutex
You can try this one today! Just put nothing inside the mutex, and instead store it alongside some guarded data, like we would in C:
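A sketch of that shape – the struct and field names here are my guesses:

```rust
use std::sync::Mutex;

// A C-style arrangement: the lock and the data it is supposed to
// guard live side by side, with no structural connection between them.
pub struct SomeData {
    pub the_mutex: Mutex<()>,
    pub the_data: i32,
}
```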
(If you’ve spent enough time in Rust to be familiar with the term “interior mutability” you may see a problem with this definition – shhh, no spoilers.)
When we do this, we immediately give up one thing: any mutex help from the compiler. We can now freely poke `the_data` without locking the mutex. Presumably at that point we’d add comments like Chromium’s, explaining how to use the `SomeData` struct correctly.
But that means that anyone using this API who fails to read the comment (or misunderstands it) will be able to introduce data races, just like in C – right?
Surprisingly, the answer is: no, this struct still can’t be used to produce data races in safe Rust, even if you write code like this:
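Something along these lines – the exact snippet is my sketch, reusing the `SomeData` layout from above:

```rust
use std::sync::Mutex;
use std::thread;

// The C-style layout from above: lock and data side by side.
pub struct SomeData {
    pub the_mutex: Mutex<()>,
    pub the_data: i32,
}

fn main() {
    let shared = SomeData {
        the_mutex: Mutex::new(()),
        the_data: 10,
    };

    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                // Look ma, no lock!
                let value = shared.the_data;
                assert_eq!(value, 10);
            });
        }
    });
}
```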
That sure looks like a textbook data race, and the code compiles without issue – but it turns out this isn’t a data race, because we’ve lost something with this change: the ability to update `SomeData` from multiple threads.
We are able to share a `SomeData` across threads, because `SomeData` automatically implements `Sync`. `Sync` is the standard Rust trait that indicates that something can be safely shared across threads – its name implies that the shared-thing does some sort of sync-hronization. `Sync` is automatically inferred for types that meet some basic criteria, one of which is that their contents must all also be `Sync`, which in this case is true.
But sharing across threads doesn’t mean mutating, and if `the_data` is effectively constant, there’s no longer a data race implied by reading it without locking³.
³ Assuming, for the purposes of this post, that you’ve done appropriate barriers if you’re on a weak-memory-model multicore processor. Chances are, you only have to worry about that if you know what it means.
Now, an `i32` is a simple machine type that happens to have an atomic counterpart, `AtomicI32` (from `std::sync::atomic`). That type is `Sync` and provides atomic operations for updating it from many threads – though at that point, you probably don’t need the mutex! In other words, `AtomicI32` provides interior mutability – its API allows its contents to change even if you only have a shared reference to it.
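For example, here is a small sketch of my own showing `AtomicI32` updating a shared counter with no mutex in sight:

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

fn main() {
    // No Mutex and no &mut anywhere: fetch_add works through a
    // shared reference, because AtomicI32 has interior mutability.
    let counter = AtomicI32::new(0);

    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    assert_eq!(counter.load(Ordering::Relaxed), 4000);
}
```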
If the guarded data is more complex than an integer – say, it’s a collection of integers and pointers, and you want to keep them internally consistent – then it won’t have an atomic counterpart, so if we want to be able to mutate it through a shared reference, we need to put it inside some sort of container that manages access rigorously enough to be `Sync` even though its contents are mutable.
Like, say, `std::sync::Mutex`.

But this section is about not putting shared data inside thread-safe containers like `Mutex`, so, let’s keep thinking.
There is, in fact, another way.
You could carefully encapsulate `SomeData` in a module, keeping its fields private so that code outside the module can’t reference `the_data` directly. You could then provide functions for operating on `SomeData` that are careful to manage the mutex correctly. In fact, to make the point, you could stop (ab)using `Mutex<()>` and switch to `AtomicBool`.
```rust
// Assume we are in a module separate from any client code.
use std::sync::atomic::{AtomicBool, Ordering};

// Struct is pub; fields are not.
pub struct SomeData {
    locked: AtomicBool,
    the_data: i32,
}

impl SomeData {
    pub fn read_data(&self) -> i32 {
        // Spin until we take the lock.
        while self.locked.swap(true, Ordering::Acquire) {}
        let value = self.the_data;
        self.locked.store(false, Ordering::Release);
        value
    }
}
```
But we still haven’t gained the ability to update `the_data` with a shared reference, `&SomeData`, which is all we’ll have once it’s shared across threads. This is because `Mutex` actually plays two roles on behalf of its guarded data: it provides synchronization, yes, but it also provides interior mutability, giving the ability to write the data through a shared reference. In other words, it is both a lock and a container like `Cell` or `RefCell`.
However, neither `Cell` nor `RefCell` is `Sync` (because they lack the thread-safe locking part of `Mutex`), so you can’t use one of those types to wrap `the_data` – otherwise we’d lose the ability to share it across threads at all.
Instead, you have to drop down a level and use the type that `Cell`, `RefCell`, and `Mutex` all use under the hood: `UnsafeCell`. As its name implies, we’re about to grow more unsafe code.
```rust
// Assume we are in a module separate from any client code.
use core::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Struct is pub; fields are not.
pub struct SomeData {
    locked: AtomicBool,
    the_data: UnsafeCell<i32>,
}

// Declare to the compiler that we're sure this can now be
// shared across threads.
unsafe impl Sync for SomeData {}

impl SomeData {
    pub fn increment(&self) {
        // Spin until we take the lock.
        while self.locked.swap(true, Ordering::Acquire) {}
        // Safety: holding the lock means no other access is in progress.
        unsafe {
            *self.the_data.get() += 1;
        }
        self.locked.store(false, Ordering::Release);
    }
}
```
We’ve had to add an `unsafe impl` of `Sync`. This asserts to the compiler that we meet the criteria to be treated as `Sync`… without checks. This is the only way to implement `Sync` manually, because all the checked ways of implementing `Sync` happen automatically.
With that change, we can now update our shared data across threads. We’re getting closer to what we wanted.
However, we’ve also reimplemented most of `Mutex`… poorly. What we’ve got here is an equivalent to `Mutex` that
- Only supports one kind of guarded data – so if you need a second one you’ll be writing all this again.
- Can’t give you a reference to guarded data, so all updates have to be implemented in this module, and done by-copy.
- Doesn’t support blocking, because blocking mutexes typically require OS support, and we’ve chosen to write our own instead.
Let’s try and fix the top two points there by adding a `try_lock` operation that produces a reference, and making the type generic:

```rust
use core::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Note: the type and method names here are illustrative.
pub struct LockedData<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

unsafe impl<T> Sync for LockedData<T> where T: Send {}

impl<T> LockedData<T> {
    pub fn try_lock(&self) -> Option<&mut T> {
        if self.locked.swap(true, Ordering::Acquire) {
            // Already locked.
            None
        } else {
            // Safety: we just took the lock, so this is the only
            // outstanding reference to the data.
            Some(unsafe { &mut *self.data.get() })
        }
    }
}
```
This is looking more like the standard `Mutex` type, only with fewer features. In particular, as written, there’s no way to `unlock`.
A `Mutex`-like thing that cannot be unlocked can still be useful – it’s the basis for what I call the First-Mover Allocator Pattern, which uses almost exactly the code above. However, it’s not much of a `Mutex`. At this point you’ve got two options. You can implement unlock using a guard type, at which point you really have recreated `std::sync::Mutex`, or you can fall into the trap described in the next section.
Unlock is unsafe.
What if we removed `MutexGuard` from the standard `Mutex` and instead provided an `unlock` operation, as in C?
Here’s a sketch of how that might look, if we leave the guarded data inside the mutex (and thus avoid the issues described in the previous section):
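A minimal version of that design, with `lock` handing back a bare `&mut T` (the `Mutex2` name matches the text below; the spin-lock implementation details are my own):

```rust
use core::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

pub struct Mutex2<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

unsafe impl<T> Sync for Mutex2<T> where T: Send {}

impl<T> Mutex2<T> {
    pub fn new(data: T) -> Self {
        Mutex2 {
            locked: AtomicBool::new(false),
            data: UnsafeCell::new(data),
        }
    }

    // Locks the mutex, spinning if necessary, and returns a plain
    // exclusive reference to the guarded data -- no guard object.
    pub fn lock(&self) -> &mut T {
        while self.locked.swap(true, Ordering::Acquire) {}
        // Safety: we hold the lock... but nothing stops the caller
        // from keeping this reference after calling unlock.
        unsafe { &mut *self.data.get() }
    }

    // Unlocks the mutex. Nothing ties this call to the reference
    // handed out by lock(), which is exactly the problem.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```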
Instead of returning some fancy resource-managing `MutexGuard` type, this just returns an exclusive reference to the guarded data, `&mut T`.
In this API, as in C, it’s legal to call `unlock` any time you have access to the `Mutex2`. This, in turn, means that there is no way to ensure that you only use the reference to the guarded data before you unlock:
```rust
let guarded_data = mutex.lock();
guarded_data.do_stuff();
mutex.unlock();
guarded_data.do_stuff(); // uh oh, still in scope
```
It also means we’ve built a tool for manufacturing `&mut` references that alias, which is another way of violating memory safety:
```rust
let guarded_data = mutex.lock();
mutex.unlock();
let guarded_data2 = mutex.lock();
*guarded_data = *guarded_data2; // uh oh, they alias
```
Basically, uncontrolled `unlock` loses the ability to reason about whether any references to guarded data remain available, and gives safe code the ability to provoke arbitrary data races. That’s exactly what the mutex was trying to prevent.
You can provide a C-style `unlock` operation on a mutex in Rust, but it needs to be `unsafe` – because the caller needs to ensure things the compiler can’t, like calls to `unlock` pairing one-to-one with calls to `lock`, and references to guarded data not escaping beyond the `unlock`.
However, for the `Mutex2` type I sketched above, that basically means the mutex is useless for safe code – most code using a mutex probably wants to be able to unlock it! We’ve run back into the issue from the previous section.
To fix this, we need to make `unlock` safe, and for it to be safe, we need to have some way of preventing access to `unlock` except for exactly one `unlock` call after the mutex has been locked, and after any references to guarded data have been disposed of. The easiest way of ensuring that one operation is only available after another operation is to have the earlier operation return some kind of token, which needs to be passed to the later operation. So calling `lock` would somehow generate a token that the caller could exchange for the ability to call `unlock`, at most once. In Rust, we can do that by creating a type that can’t be copied or cloned, something like:
```rust
// Note that this is not Copy or Clone, so it can't be duplicated.
pub struct UnlockToken;

impl<T> Mutex2<T> {
    // Locks the mutex, returning the data reference plus a token.
    pub fn lock(&self) -> (&mut T, UnlockToken) { /* ... */ }

    // Unlocks the mutex, consuming the token.
    pub fn unlock(&self, token: UnlockToken) { /* ... */ }
}
```
This can work, though it only solves part of the problem – because the code that called `lock` can still, deliberately or accidentally, hang on to that `&mut T` after turning in their `UnlockToken`.
It also creates a new problem: what if we hand an `UnlockToken` generated by one mutex to another mutex? That would let us unlock a mutex at an unexpected time, and we’re back to having data races. We could include information inside the `UnlockToken` indicating which mutex it came from – maybe a pointer? – and then panic if the user confuses their tokens. That would prevent data races, but it moves the error to runtime (a panic) which is… unfortunate.
Once we have a pointer to the mutex inside the `UnlockToken`, we could remove the chance of runtime errors by moving the `unlock` operation. If we put the `unlock` operation on the token, we have:
```rust
// Note that this is not Copy or Clone, so it can't be duplicated.
pub struct UnlockToken<'a, T>(&'a Mutex2<T>);

impl<T> Mutex2<T> {
    pub fn lock(&self) -> (&mut T, UnlockToken<'_, T>) { /* ... */ }
}

impl<'a, T> UnlockToken<'a, T> {
    // Takes self by value, consuming the token.
    pub fn unlock(self) { /* ... */ }
}
```
Note that `UnlockToken::unlock` takes `self` by-value, meaning it will consume `self` – this satisfies the requirement that you can only unlock once per token. Because the identity of the mutex being unlocked is now implied by the token, it’s impossible to try to use one mutex’s token to unlock another. That satisfies the other requirement.
We’ve developed a new issue though: now that `unlock` can only be called on an `UnlockToken`, what happens if the user just drops the token? The naive implementation would leave the mutex locked forever. This doesn’t violate safety in the Rust sense by producing data races etc., but it would create bugs. We probably want to implement `Drop` for `UnlockToken` so that it can detect this case. There are two obvious ways to do this:
- Write a `Drop` impl that panics.
- Write a `Drop` impl that unlocks the mutex.
The `Drop` impl that panics creates a new possible runtime error. This raises the question of whether accidentally dropping the token is likely to indicate a bug. If it’s a bug, panicking is reasonable to protect the program from the bug’s effects. If it’s not, panicking is just installing a trap for the user to run into.
With the current API sketch, what would accidentally dropping the token look like? The most compact way of doing it is this:
```rust
let (guarded_data, _) = mutex.lock();
guarded_data.do_stuff();
```
Assigning the token to the `_` wildcard pattern causes it to be dropped immediately, so the access to guarded data on the second line occurs with the mutex unlocked. Panicking if the token is dropped would prevent the access (and the race condition) from happening… in this case.
But not in this case:
```rust
let (guarded_data, token) = mutex.lock();
token.unlock();
guarded_data.do_stuff(); // uh oh, still in scope
```
This doesn’t panic, and does produce a data race.
The point I’m trying to make here is that I think the question of whether to panic when the token is dropped is a distraction – either solution can work (though I personally dislike introducing unnecessary panics and would opt for the unlock-on-drop option). But neither solution is sufficient to make `unlock` safe!
To fix this, we need to ensure that the lifetime of the unlock token, and the lifetime of the reference to guarded data, match exactly – that the reference cannot outlive the unlock token. The simplest way to do this is to stop treating them like separate values, and merge them together. Something like…
```rust
// Note that this is not Copy or Clone, so it can't be duplicated.
pub struct MutexGuard<'a, T>(&'a Mutex2<T>);

// Deref allows access to the guarded data while the MutexGuard lives.
impl<'a, T> core::ops::Deref for MutexGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T { /* ... */ }
}
// ... you'll also want DerefMut, omitted here for brevity.

impl<'a, T> Drop for MutexGuard<'a, T> {
    fn drop(&mut self) {
        // Unlock the mutex.
    }
}
```
At this point, we have recreated the `std::sync::Mutex` API. This neatly fixes all of the problems we’ve hit in this section:

- It is not possible to unlock the mutex without locking it first, since you need to be holding a `MutexGuard` to unlock.
- Locking the mutex gives you the right to unlock it only once, because the `MutexGuard` cannot be duplicated.
- As soon as the mutex is unlocked, it becomes impossible to access guarded data, preventing data races – because unlocking the mutex requires the `MutexGuard` to go out of scope, and the `MutexGuard` was how we were accessing guarded data.
As far as explicit calls to `unlock` vs. relying on `Drop` – either solution can work if you are very careful about how you write `unlock`. For instance, there is an `unlock` operation proposed for addition to the standard library. It looks like this:
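Roughly like this – a sketch of the proposed shape, written here as a standalone function over the real `std::sync::MutexGuard` rather than the exact standard-library source:

```rust
use std::sync::{Mutex, MutexGuard};

// Moving the guard in by value ends its life at the closing brace;
// its Drop impl is what unlocks the mutex. The body is empty.
pub fn unlock<T>(_guard: MutexGuard<'_, T>) {}

fn main() {
    let mutex = Mutex::new(1);
    let mut guard = mutex.lock().unwrap();
    *guard += 1;
    unlock(guard);
    // The mutex is free again; a second lock succeeds.
    assert_eq!(*mutex.lock().unwrap(), 2);
}
```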
Yup, that’s an empty function. It just moves the `MutexGuard` into the function, by value, and then drops it. (This is the same way `std::mem::drop` is implemented, if you’re curious.) The reason this is safe is that it still relies on the `MutexGuard` to manage access to guarded data, and the mutex being unlocked is still implicit in the `MutexGuard`. Notice that the function has no `&self` parameter specifying a `Mutex`; this means it’s called like this:
```rust
let guard = mutex.lock();
guard.do_stuff();
unlock(guard);
```
As I hope this section has explained, any explicit unlock operation in safe Rust needs to look essentially like this. (And is probably a synonym for `drop`.)
Personally, I prefer this pattern for making the scope of mutex access explicit where required:

```rust
{
    let guard = mutex.lock();
    guard.do_stuff();
}
// guard is no longer accessible outside the scope.
```
Conclusions
The short version is: you can certainly create a C-style mutex API in Rust, but it gives up most of Rust’s safety guarantees, because it can be used to trivially create data race bugs and/or aliasing exclusive references, and so the API needs to be almost entirely `unsafe`. And then used very, very carefully. Presumably with a lot of comments.
However: Comments are not a concurrency strategy.
Relying on the programmer to always read, comprehend, and remember the documentation – and then do everything right, every time – is how we get bugs.
One of the indicators I use when doing a security audit of code is looking for large documentation blocks or coding standards with detailed documentation patterns, like the one I highlighted in Chromium’s guide. They’re almost always an indicator that a nearby API is deeply flawed and will be used to make mistakes.
Now that we understand why the Rust API is structured as it is, it’s worth asking – why is the C mutex API structured in a way that is hard to use and trivial to misuse, requiring elaborate comments or even static analysis to get right? This, despite the standard API being designed circa 2010, well into the era of commodity multicore processors.
The question is simultaneously fair and unfair. There are important language features missing from C (and C++) that make it impossible to implement a Rust-style mutex API with the same guarantees – lack of explicit lifetimes, absence of an equivalent to `Sync`, lack of well-defined “move semantics” for ensuring that values end their lives at controlled moments (like with `MutexGuard`). So, it’s unreasonable to expect the C standard to define a safe mutex API.
But it is not unreasonable to use better tools.