Why Rust mutexes look like they do
Comments are not a concurrency strategy.
One of the common complaints I hear from systems programmers who try Rust is about mutexes, and specifically about the Rust Mutex API. The complaints usually go something like this:

- They don’t want the mutex to contain data, just a lock.
- They don’t want to have to manage a “guard” value that unlocks the mutex on drop – often, more specifically, they just want to call an `unlock` operation because they feel like that’s more explicit.
These changes would make the Rust mutex API equivalent to the C/POSIX mutex API. In one case I’ve seen someone try to use `Mutex<()>` and trickery to fake it.

There’s a problem with this, though: these two aspects of `Mutex`’s design are inextricably linked to one another, and to Rust’s broader safety guarantees – changing either or both of them will open the door to subtle bugs and corruption due to data races.
A C-style mutex API consisting of some bundle of implicitly guarded data, plus `lock` and `unlock` functions, isn’t wise in Rust because it allows safe code to easily commit errors that break memory safety and create data races.
Perhaps controversially, I’d argue that this is also true in C. It’s just more obvious in Rust, because Rust rigorously distinguishes between the notion of “safe” code that cannot commit such errors, and “unsafe” code that can commit such errors if it wishes. C does not make this distinction, and as a result, any code using a mutex in C can trivially produce serious, potentially exploitable, bugs.
In the rest of this post I’ll walk through a typical C mutex API, compare with a typical Rust mutex API, and look at what happens if we change the Rust API to resemble C in various ways.
Mutexes in C
(Note: as usual, when I say “C,” my comments should be taken to also apply to C variants such as C++, which use essentially the same mutex design.)
There are a wide variety of mutex APIs in C, largely because the language didn’t specify a standard one until 2011. I’ll use the C11 standard mutex for this post because it’s simple and universally available, but this description applies just as well to (for example) pthreads.
The C mutex API, for the purposes of this post, consists of two primary operations: `lock` and `unlock`¹.

```c
// Locks a mutex, blocking if necessary until it becomes free.
int mtx_lock(mtx_t *mutex);

// Unlocks a mutex.
int mtx_unlock(mtx_t *mutex);
```
¹ For this purpose I’m ignoring: mutex creation and destruction, things like `trylock`, mutex attributes, the distinction between recursive and nonrecursive mutexes, etc. None of these have any bearing on the point I’m making.
These functions follow the normal C convention of returning `int`, with 0 indicating success and anything else indicating failure.
When code wants to safely access data that might be shared across threads, it calls `mtx_lock`. It then accesses the data, before finally calling `mtx_unlock`. Here is a simple example, where these operations are used to maintain a global counter that can be incremented from multiple threads:
```c
mtx_t *the_mutex;
int the_counter;

// Code to initialize the_mutex omitted.

int increment_counter(void) {
    int r = mtx_lock(the_mutex);
    if (r != thrd_success) return r;

    the_counter++;

    return mtx_unlock(the_mutex);
}

// Note that this function reads into an "out-parameter", *value_out,
// because our return value is used to indicate success/failure.
int read_counter(int *value_out) {
    int r = mtx_lock(the_mutex);
    if (r != thrd_success) return r;

    *value_out = the_counter;

    return mtx_unlock(the_mutex);
}
```
In fancier cases, a system might use more granular mutexes that are stored in data structures alongside the data they protect, as in:

```c
// A mutex stored next to the data it's intended to guard.
struct counter {
    mtx_t mutex;
    int count;  // guarded by mutex
};
```
To determine which data is intended to be protected by which mutex, C programmers typically use documentation conventions, like this one from the Chromium mutex docs:
Every shared variable/field should have a comment indicating which mutex protects it:
int accesses_; // count of accesses (guarded by mu_)
or a comment explaining why no mutex is needed:
int table_size_; // no. of elements in table (readonly after init)
Every mutex should have a comment indicating which variables and also any non-obvious invariants it protects:
Lock mu_; // protects accesses_, list_, count_
          // invariant: count_ == number of elements in linked-list list_
Think of the matching comments on variables and mutexes as analogous to matching types on procedure arguments and parameters; the redundancy can be very helpful to later maintainers of the code.
While I’m trying to present this section without judgment, I can’t quite skip past that last paragraph with my mouth closed. There’s an important distinction between procedure argument types and mutex scope comments, which is that procedure argument types are checked by the compiler. This is closer to declaring all your parameters `void *`, stating their real types in the comments, and expecting your users to always get the casts and order right.
But I digress. Let’s go look at Rust for comparison.
Mutexes in Rust
Rust provides a `Mutex` type in the standard library’s `std::sync` module. The API differs from C in three ways:
- `Mutex` contains the data it guards: the full name of the type is `Mutex<T>`, for some guarded type `T` that you choose.
- The `lock` operation returns a “guard” value.
- The `unlock` operation is only available on the guard value, not on `Mutex` itself. (The `unlock` operation also happens to be `drop`, which we’ll consider in more detail later.)
In the bare-metal `no_std` environments where I typically work, we have our own `Mutex` types, but they look pretty much the same as the standard library `Mutex` – for good reasons, which should become apparent over the course of this post.
Concretely, a simplified version of the Rust API looks like this:

```rust
// A mutex guarding some data of type T.
pub struct Mutex<T> { /* ... */ }

impl<T> Mutex<T> {
    // Locks the mutex, blocking if necessary, and returns a guard.
    pub fn lock(&self) -> MutexGuard<'_, T> { /* ... */ }
}

// The result of locking a mutex with lifetime 'a, guarding
// data of type T. Note that this does not implement Copy or
// Clone, so it cannot be duplicated.
pub struct MutexGuard<'a, T> { /* ... */ }
```
(I am ignoring a concept called “lock poisoning” for this simplified API, because it’s not relevant to my point.)
And an example of our counter increment API rewritten using a Rust mutex:

```rust
// The Rust version doesn't use global variables because doing so would
// distract from my point by requiring some unsafe.
fn increment_counter(counter: &Mutex<i32>) {
    let mut guard = counter.lock();
    *guard += 1;
}

fn read_counter(counter: &Mutex<i32>) -> i32 {
    *counter.lock()
}
```
The way this kind of API is usually described, the `MutexGuard` type is a smart pointer that allows access to the mutex contents of type `T`, but only while the guard itself exists. When it is dropped explicitly, or goes out of scope, access ends and the mutex unlocks.

But another way of looking at it is: a `MutexGuard` is a token that proves that the mutex has been locked.
- Because you cannot² create a `MutexGuard` except by the `Mutex::lock` operation, holding a `MutexGuard` demonstrates that `lock` has been called.
- Because the `Mutex`, by definition, will not hand out more than one `MutexGuard` – a second call to `lock` while a `MutexGuard` exists will block until the first one is destroyed, and `MutexGuard` itself cannot be duplicated – holding a `MutexGuard` demonstrates unique access to the data guarded by the mutex. Which, in Rust terms, means that you can get a `&mut T` out of it.
- Because the lifetime parameter `'a` in the definition of `MutexGuard` gets tied to the lifetime of the `Mutex` itself when you call `lock`, the compiler won’t let you drop the `Mutex` while still holding the `MutexGuard`, keeping it from turning into a dangling pointer.
² By which I mean, you cannot do so at all in safe Rust, and you can’t easily do so accidentally in unsafe Rust. You can, of course, go out of your way to break any language invariant in unsafe Rust. I am attempting to make software that is robust against mistakes by well-intentioned programmers. If you expect to have evil tricky programmers working in your codebase, you’ll want to disable unsafe Rust using the `#![forbid(unsafe_code)]` attribute. And then possibly review your hiring practices.
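To make the token framing concrete, here’s a small example of my own using the real `std::sync::Mutex` (whose `lock` returns a `Result` because of the lock poisoning set aside earlier):

```rust
use std::sync::Mutex;
use std::thread;

fn main() {
    let counter = Mutex::new(0_i32);

    // Four threads each take the lock 1000 times. The guard returned
    // by lock() is the only way to reach the i32 inside.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                    // guard dropped here, unlocking the mutex.
                }
            });
        }
    });

    assert_eq!(*counter.lock().unwrap(), 4000);
}
```

Note that no thread ever names an unlock operation; dropping the guard at the end of each loop iteration is the unlock.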
Variations of the Rust mutex API and their problems
As I summarized at the top of this post, there are two main objections I hear to the Rust mutex API.

- I don’t want the data guarded by the mutex to live inside the mutex.
- I don’t want to use a guard value to track the mutex being locked (usually with the implication that an `unlock` function should be available).
Let’s try these variations!
Moving guarded data outside the mutex
You can try this one today! Just put nothing inside the mutex, and instead store it alongside some guarded data, like we would in C:
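A sketch of that shape – the struct and field names here are my guesses:

```rust
use std::sync::Mutex;

// A C-style arrangement: the lock and the data it is supposed to
// guard live side by side, with no structural connection between them.
pub struct SomeData {
    pub the_mutex: Mutex<()>,
    pub the_data: i32,
}
```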
(If you’ve spent enough time in Rust to be familiar with the term “interior mutability” you may see a problem with this definition – shhh, no spoilers.)
When we do this, we immediately give up one thing: any mutex help from the compiler. We can now freely poke `the_data` without locking the mutex. Presumably at that point we’d add comments like Chromium’s, explaining how to use the `SomeData` struct correctly.
But that means that anyone using this API who fails to read the comment (or misunderstands it) will be able to introduce data races, just like in C – right?
Surprisingly, the answer is: no, this struct still can’t be used to produce data races in safe Rust, even if you write code like this:
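Something along these lines – the exact snippet is my sketch, reusing the `SomeData` layout from above:

```rust
use std::sync::Mutex;
use std::thread;

// The C-style layout from above: lock and data side by side.
pub struct SomeData {
    pub the_mutex: Mutex<()>,
    pub the_data: i32,
}

fn main() {
    let shared = SomeData {
        the_mutex: Mutex::new(()),
        the_data: 10,
    };

    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                // Look ma, no lock!
                let value = shared.the_data;
                assert_eq!(value, 10);
            });
        }
    });
}
```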
That sure looks like a textbook data race, and the code compiles without issue – but it turns out this isn’t a data race, because we’ve lost something with this change: the ability to update `SomeData` from multiple threads.
We are able to share a `SomeData` across threads, because `SomeData` automatically implements `Sync`. `Sync` is the standard Rust trait that indicates that something can be safely shared across threads – its name implies that the shared-thing does some sort of sync-hronization. `Sync` is automatically inferred for types that meet some basic criteria, one of which is that their contents must all also be `Sync`, which in this case is true.
But sharing across threads doesn’t mean mutating, and if `the_data` is effectively constant, there’s no longer a data race implied by reading it without locking³.
³ Assuming, for the purposes of this post, that you’ve done appropriate barriers if you’re on a weak-memory-model multicore processor. Chances are, you only have to worry about that if you know what it means.
Now, an `i32` is a simple machine type that happens to have an atomic counterpart, `AtomicI32` (from `std::sync::atomic`). That type is `Sync` and provides atomic operations for updating it from many threads – though at that point, you probably don’t need the mutex! In other words, `AtomicI32` provides interior mutability – its API allows its contents to change even if you only have a shared reference to it.
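For example, here is a small sketch of my own showing `AtomicI32` updating a shared counter with no mutex in sight:

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

fn main() {
    // No Mutex and no &mut anywhere: fetch_add works through a
    // shared reference, because AtomicI32 has interior mutability.
    let counter = AtomicI32::new(0);

    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    assert_eq!(counter.load(Ordering::Relaxed), 4000);
}
```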
If the guarded data is more complex than an integer – say, it’s a collection of integers and pointers, and you want to keep them internally consistent – then it won’t have an atomic counterpart, so if we want to be able to mutate it through a shared reference, we need to put it inside some sort of container that manages access rigorously enough to be `Sync` even though its contents are mutable.
Like, say, `std::sync::Mutex`.

But this section is about not putting shared data inside thread-safe containers like `Mutex`, so, let’s keep thinking.
There is, in fact, another way.
You could carefully encapsulate `SomeData` in a module, keeping its fields private so that code outside the module can’t reference `the_data` directly. You could then provide functions for operating on `SomeData` that are careful to manage the mutex correctly. In fact, to make the point, you could stop (ab)using `Mutex<()>` and switch to `AtomicBool`.
```rust
// Assume we are in a module separate from any client code.
use std::sync::atomic::{AtomicBool, Ordering};

// Struct is pub; fields are not.
pub struct SomeData {
    locked: AtomicBool,
    the_data: i32,
}

impl SomeData {
    pub fn read_data(&self) -> i32 {
        // Spin until we take the lock.
        while self.locked.swap(true, Ordering::Acquire) {}
        let value = self.the_data;
        self.locked.store(false, Ordering::Release);
        value
    }
}
```
But we still haven’t gained the ability to update `the_data` with a shared reference, `&SomeData`, which is all we’ll have once it’s shared across threads. This is because `Mutex` actually plays two roles on behalf of its guarded data: it provides synchronization, yes, but it also provides interior mutability, giving the ability to write the data through a shared reference. In other words, it is both a lock and a container like `Cell` or `RefCell`.
However, neither `Cell` nor `RefCell` is `Sync` (because they lack the thread-safe locking part of `Mutex`), so you can’t use one of those types to wrap `the_data` – otherwise we’d lose the ability to share it across threads at all.
Instead, you have to drop down a level and use the type that `Cell`, `RefCell`, and `Mutex` all use under the hood: `UnsafeCell`. As its name implies, we’re about to grow more unsafe code.
```rust
// Assume we are in a module separate from any client code.
use core::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Struct is pub; fields are not.
pub struct SomeData {
    locked: AtomicBool,
    the_data: UnsafeCell<i32>,
}

// Declare to the compiler that we're sure this can now be
// shared across threads.
unsafe impl Sync for SomeData {}

impl SomeData {
    pub fn increment(&self) {
        // Spin until we take the lock.
        while self.locked.swap(true, Ordering::Acquire) {}
        // Safety: holding the lock means no other access is in progress.
        unsafe {
            *self.the_data.get() += 1;
        }
        self.locked.store(false, Ordering::Release);
    }
}
```
We’ve had to add an `unsafe impl` of `Sync`. This asserts to the compiler that we meet the criteria to be treated as `Sync`… without checks. This is the only way to implement `Sync` manually, because all the checked ways of implementing `Sync` happen automatically.
With that change, we can now update our shared data across threads. We’re getting closer to what we wanted.
However, we’ve also reimplemented most of `Mutex`… poorly. What we’ve got here is an equivalent to `Mutex` that
- Only supports one kind of guarded data – so if you need a second one you’ll be writing all this again.
- Can’t give you a reference to guarded data, so all updates have to be implemented in this module, and done by-copy.
- Doesn’t support blocking, because blocking mutexes typically require OS support, and we’ve chosen to write our own instead.
Let’s try and fix the top two points there by adding a `try_lock` operation that produces a reference, and making the type generic:

```rust
use core::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Note: the type and method names here are illustrative.
pub struct LockedData<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

unsafe impl<T> Sync for LockedData<T> where T: Send {}

impl<T> LockedData<T> {
    pub fn try_lock(&self) -> Option<&mut T> {
        if self.locked.swap(true, Ordering::Acquire) {
            // Already locked.
            None
        } else {
            // Safety: we just took the lock, so this is the only
            // outstanding reference to the data.
            Some(unsafe { &mut *self.data.get() })
        }
    }
}
```
This is looking more like the standard `Mutex` type, only with fewer features. In particular, as written, there’s no way to `unlock`.
A `Mutex`-like thing that cannot be unlocked can still be useful – it’s the basis for what I call the First-Mover Allocator Pattern, which uses almost exactly the code above. However, it’s not much of a `Mutex`. At this point you’ve got two options. You can implement unlock using a guard type, at which point you really have recreated `std::sync::Mutex`, or you can fall into the trap described in the next section.
Unlock is unsafe.
What if we removed `MutexGuard` from the standard `Mutex` and instead provided an `unlock` operation, as in C?
Here’s a sketch of how that might look, if we leave the guarded data inside the mutex (and thus avoid the issues described in the previous section):
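A minimal version of that design, with `lock` handing back a bare `&mut T` (the `Mutex2` name matches the text below; the spin-lock implementation details are my own):

```rust
use core::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

pub struct Mutex2<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

unsafe impl<T> Sync for Mutex2<T> where T: Send {}

impl<T> Mutex2<T> {
    pub fn new(data: T) -> Self {
        Mutex2 {
            locked: AtomicBool::new(false),
            data: UnsafeCell::new(data),
        }
    }

    // Locks the mutex, spinning if necessary, and returns a plain
    // exclusive reference to the guarded data -- no guard object.
    pub fn lock(&self) -> &mut T {
        while self.locked.swap(true, Ordering::Acquire) {}
        // Safety: we hold the lock... but nothing stops the caller
        // from keeping this reference after calling unlock.
        unsafe { &mut *self.data.get() }
    }

    // Unlocks the mutex. Nothing ties this call to the reference
    // handed out by lock(), which is exactly the problem.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```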
Instead of returning some fancy resource-managing `MutexGuard` type, this just returns an exclusive reference to the guarded data, `&mut T`.
In this API, as in C, it’s legal to call `unlock` any time you have access to the `Mutex2`. This, in turn, means that there is no way to ensure that you only use the reference to the guarded data before you unlock:
```rust
let guarded_data = mutex.lock();
guarded_data.do_stuff();
mutex.unlock();
guarded_data.do_stuff(); // uh oh, still in scope
```
It also means we’ve built a tool for manufacturing `&mut` references that alias, which is another way of violating memory safety:
```rust
let guarded_data = mutex.lock();
mutex.unlock();
let guarded_data2 = mutex.lock();
*guarded_data = *guarded_data2; // uh oh, they alias
```
Basically, uncontrolled `unlock` loses the ability to reason about whether any references to guarded data remain available, and gives safe code the ability to provoke arbitrary data races. That’s exactly what the mutex was trying to prevent.
You can provide a C-style `unlock` operation on a mutex in Rust, but it needs to be `unsafe` – because the caller needs to ensure things the compiler can’t, like calls to `unlock` pairing one-to-one with calls to `lock`, and references to guarded data not escaping beyond the `unlock`.
However, for the `Mutex2` type I sketched above, that basically means the mutex is useless for safe code – most code using a mutex probably wants to be able to unlock it! We’ve run back into the issue from the previous section.
To fix this, we need to make `unlock` safe, and for it to be safe, we need to have some way of preventing access to `unlock` except for exactly one `unlock` call after the mutex has been locked, and after any references to guarded data have been disposed of. The easiest way of ensuring that one operation is only available after another operation is to have the earlier operation return some kind of token, which needs to be passed to the later operation. So calling `lock` would somehow generate a token that the caller could exchange for the ability to call `unlock`, at most once. In Rust, we can do that by creating a type that can’t be copied or cloned, something like:
```rust
// Note that this is not Copy or Clone, so it can't be duplicated.
pub struct UnlockToken;

impl<T> Mutex2<T> {
    // Locks the mutex, returning the data reference plus a token.
    pub fn lock(&self) -> (&mut T, UnlockToken) { /* ... */ }

    // Unlocks the mutex, consuming the token.
    pub fn unlock(&self, token: UnlockToken) { /* ... */ }
}
```
This can work, though it only solves part of the problem – because the code that called `lock` can still, deliberately or accidentally, hang on to that `&mut T` after turning in their `UnlockToken`.
It also creates a new problem: what if we hand an `UnlockToken` generated by one mutex to another mutex? That would let us unlock a mutex at an unexpected time, and we’re back to having data races. We could include information inside the `UnlockToken` indicating which mutex it came from – maybe a pointer? – and then panic if the user confuses their tokens. That would prevent data races, but it moves the error to runtime (a panic) which is… unfortunate.
Once we have a pointer to the mutex inside the `UnlockToken`, we could remove the chance of runtime errors by moving the `unlock` operation. If we put the `unlock` operation on the token, we have:
```rust
// Note that this is not Copy or Clone, so it can't be duplicated.
pub struct UnlockToken<'a, T>(&'a Mutex2<T>);

impl<T> Mutex2<T> {
    pub fn lock(&self) -> (&mut T, UnlockToken<'_, T>) { /* ... */ }
}

impl<'a, T> UnlockToken<'a, T> {
    // Takes self by value, consuming the token.
    pub fn unlock(self) { /* ... */ }
}
```
Note that `UnlockToken::unlock` takes `self` by-value, meaning it will consume `self` – this satisfies the requirement that you can only unlock once per token. Because the identity of the mutex being unlocked is now implied by the token, it’s impossible to try to use one mutex’s token to unlock another. That satisfies the other requirement.
We’ve developed a new issue though: now that `unlock` can only be called on an `UnlockToken`, what happens if the user just drops the token? The naive implementation would leave the mutex locked forever. This doesn’t violate safety in the Rust sense by producing data races etc., but it would create bugs. We probably want to implement `Drop` for `UnlockToken` so that it can detect this case. There are two obvious ways to do this:
- Write a `Drop` impl that panics.
- Write a `Drop` impl that unlocks the mutex.
The `Drop` impl that panics creates a new possible runtime error. This raises the question of whether accidentally dropping the token is likely to indicate a bug. If it’s a bug, panicking is reasonable to protect the program from the bug’s effects. If it’s not, panicking is just installing a trap for the user to run into.
With the current API sketch, what would accidentally dropping the token look like? The most compact way of doing it is this:
```rust
let (guarded_data, _) = mutex.lock();
guarded_data.do_stuff();
```
Assigning the token to the `_` wildcard pattern causes it to be dropped immediately, so the access to guarded data on the second line occurs with the mutex unlocked. Panicking if the token is dropped would prevent the access (and the race condition) from happening… in this case.
But not in this case:
```rust
let (guarded_data, token) = mutex.lock();
token.unlock();
guarded_data.do_stuff(); // uh oh, still in scope
```
This doesn’t panic, and does produce a data race.
The point I’m trying to make here is that I think the question of whether to panic when the token is dropped is a distraction – either solution can work (though I personally dislike introducing unnecessary panics and would opt for the unlock-on-drop option). But neither solution is sufficient to make `unlock` safe!
To fix this, we need to ensure that the lifetime of the unlock token, and the lifetime of the reference to guarded data, match exactly – that the reference cannot outlive the unlock token. The simplest way to do this is to stop treating them like separate values, and merge them together. Something like…
```rust
// Note that this is not Copy or Clone, so it can't be duplicated.
pub struct MutexGuard<'a, T>(&'a Mutex2<T>);

// Deref allows access to the guarded data while the MutexGuard lives.
impl<'a, T> core::ops::Deref for MutexGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T { /* ... */ }
}
// ... you'll also want DerefMut, omitted here for brevity.

impl<'a, T> Drop for MutexGuard<'a, T> {
    fn drop(&mut self) {
        // Unlock the mutex.
    }
}
```
At this point, we have recreated the `std::sync::Mutex` API. This neatly fixes all of the problems we’ve hit in this section:

- It is not possible to unlock the mutex without locking it first, since you need to be holding a `MutexGuard` to unlock.
- Locking the mutex gives you the right to unlock it only once, because the `MutexGuard` cannot be duplicated.
- As soon as the mutex is unlocked, it becomes impossible to access guarded data, preventing data races – because unlocking the mutex requires the `MutexGuard` to go out of scope, and the `MutexGuard` was how we were accessing guarded data.
As far as explicit calls to `unlock` vs. relying on `Drop` – either solution can work if you are very careful about how you write `unlock`. For instance, there is an `unlock` operation proposed for addition to the standard library. It looks like this:
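Roughly like this – a sketch of the proposed shape, written here as a standalone function over the real `std::sync::MutexGuard` rather than the exact standard-library source:

```rust
use std::sync::{Mutex, MutexGuard};

// Moving the guard in by value ends its life at the closing brace;
// its Drop impl is what unlocks the mutex. The body is empty.
pub fn unlock<T>(_guard: MutexGuard<'_, T>) {}

fn main() {
    let mutex = Mutex::new(1);
    let mut guard = mutex.lock().unwrap();
    *guard += 1;
    unlock(guard);
    // The mutex is free again; a second lock succeeds.
    assert_eq!(*mutex.lock().unwrap(), 2);
}
```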
Yup, that’s an empty function. It just moves the `MutexGuard` into the function, by value, and then drops it. (This is the same way `std::mem::drop` is implemented, if you’re curious.) The reason this is safe is that it still relies on the `MutexGuard` to manage access to guarded data, and the mutex being unlocked is still implicit in the `MutexGuard`. Notice that the function has no `&self` parameter specifying a `Mutex`; this means it’s called like this:
```rust
let guard = mutex.lock();
guard.do_stuff();
unlock(guard);
```
As I hope this section has explained, any explicit unlock operation in safe Rust needs to look essentially like this. (And is probably a synonym for `drop`.)
Personally, I prefer this pattern for making the scope of mutex access explicit where required:

```rust
{
    let guard = mutex.lock();
    guard.do_stuff();
}
// guard is no longer accessible outside the scope.
```
Conclusions
The short version is: you can certainly create a C-style mutex API in Rust, but it gives up most of Rust’s safety guarantees, because it can be used to trivially create data race bugs and/or aliasing exclusive references, and so the API needs to be almost entirely `unsafe`. And then used very, very carefully. Presumably with a lot of comments.
However: Comments are not a concurrency strategy.
Relying on the programmer to always read, comprehend, and remember the documentation – and then do everything right, every time – is how we get bugs.
One of the indicators I use when doing a security audit of code is looking for large documentation blocks or coding standards with detailed documentation patterns, like the one I highlighted in Chromium’s guide. They’re almost always an indicator that a nearby API is deeply flawed and will be used to make mistakes.
Now that we understand why the Rust API is structured as it is, it’s worth asking – why is the C mutex API structured in a way that is hard to use and trivial to misuse, requiring elaborate comments or even static analysis to get right? This, despite the standard API being designed circa 2010, well into the era of commodity multicore processors.
The question is simultaneously fair and unfair. There are important language features missing from C (and C++) that make it impossible to implement a Rust-style mutex API with the same guarantees – lack of explicit lifetimes, absence of an equivalent to `Sync`, lack of well-defined “move semantics” for ensuring that values end their lives at controlled moments (like with `MutexGuard`). So, it’s unreasonable to expect the C standard to define a safe mutex API.
But it is not unreasonable to use better tools.