Safely writing code that isn't thread-safe

An under-appreciated Rust feature

2022-11-22

One of the nice things about the Rust programming language is that it makes it easier to write correct concurrent (e.g. threaded) programs – to the degree that Rust’s slogan has been, at times, “fearless concurrency.”

But I’d like to tell you about the other side of Rust, which I think is under-appreciated. Rust enables you to write programs that are not concurrent. This feature is missing from most other languages, and is a source of much complexity and bugs.

“But wait,” you might be saying, “of course I can write code that isn’t concurrent in Java or Python or C!”

Can you, though? You can certainly write code that ignores concurrency, and would malfunction if (say) used from multiple threads simultaneously. But that’s not the same thing as writing code that isn’t concurrent – code that simply can’t be used concurrently, by compiler guarantee.

In Rust, you can. Let’s look at why you can do it, and why it’s awesome.

The bank account example

Here’s a very common example of a data structure that trips people up in the presence of threads. It’s a model of a bank account.

I’ve written this in C for reasons that will become apparent later. Please never use a float variable to store an amount of money, this is just an example.

/*
 * Stores information about a bank account.
 */
struct bank_account {
    /* How much money is in the account. */
    float balance;
};

/*
 * Deposits 'amt' into the 'account' and returns the new balance.
 */
float deposit(struct bank_account *account, float amt) {
    account->balance += amt;
    return account->balance;
}

/*
 * Tries to withdraw 'amt' from the account. Returns 'true' if it
 * succeeded, 'false' if the account didn't contain enough.
 */
bool withdraw(struct bank_account *account, float amt) {
    if (account->balance >= amt) {
        account->balance -= amt;
        return true;
    } else {
        return false;
    }
}

This is just about the simplest way you could write this in C – it would look very similar in Java. Those of you with thread-related scars have probably already noticed the issue: in the presence of concurrency, the bank account update operations deposit and withdraw are not necessarily atomic.

This means that, if two threads are accessing the same account, you could end up with the following interleaving of operations:

Thread A: begins depositing 100.
Thread B: begins withdrawing 100.
Thread A: reads current account balance to increment it, let’s say it started out at 100.
Thread B: also reads current account balance, and decrements it. Why did Thread B get to do a larger chunk of work here? Who can say? Maybe the computer is mad at you.
Thread A: stores the balance after the deposit: 200.

Thread B thinks it successfully withdrew 100 moneys, while Thread A thinks it successfully deposited 100 moneys, which should leave the account balance unchanged… but it actually increased by 100 moneys. This is certainly not what the bank wants – and it’s also not what we, the programmers, intended!

When threads attack!

The bank account code was not written to be used concurrently. If it had been, it would have been written differently. But nothing prevents someone from using that code in a larger program with threads. This means that the code can be the victim of nonconsensual concurrency.

Nonconsensual concurrency is when your code has concurrency forced upon it. Nevermind that the code was not written with threads or other concurrency in mind – you’re getting threads whether you like it or not!

This sounds unpleasant, but unfortunately it’s very common. In my experience, it’s the main way programmers learn about data races and other concurrency-related bugs in the first place: the hard way. By noticing (or, worse, having someone else notice!) that their code is misbehaving. The past several decades have been a story of one layer after another being used from threads without being designed for it, getting bitten by the bugs, making the repairs, and repeating – starting from the fundamental libc and friends and working up.

The awful thing about nonconsensual concurrency – other than the part where it removes your agency as programmer, of course – is that it tends to be fixed in a hurry, after the fact. If you’ve got an existing chunk of code that worked fine before those darned threads showed up, you’re not likely to sit down and redesign all your data structures and algorithms from scratch to use fancy lock-free high-concurrency alternatives. No, if you’re anything like most programmers (myself included) you’re more likely to just slap some locks on things and hope it starts working again.

And in a lot of ways, this is a reasonable response! We’ve all got deadlines and other stuff to do, and it’s not like we invited that bozo over there to start using threads in our code. We know “slap a big lock on it” will work because it’s been done a gazillion times.

Right?

When locks attack!

As attractive as “hit it with a lock” may be, there are some issues.

Scope. It’s very important to make sure that all fields being guarded by a lock are only accessed with the lock held. If any accesses happen without coordinating through the lock, then you haven’t fixed the bug, only made it harder to find¹. This mistake is very easy to make when retrofitting locks onto library code, like when defending against nonconsensual concurrency.
Granularity. Locking and unlocking are not free. For best performance using locks, it’s important that the code do as much work as possible with the lock held, rather than repeatedly locking and unlocking it. This also means that nesting components that each have their own lock won’t be great either, since you’ll have to lock a series of locks on the way in, and then unlock them all on the way out.
Liveness. While a lock is locked, no other thread can get to the chunks of code/data it protects – that’s what it’s for. However, this can also bottleneck a program. Imagine that we protect all accounts at the bank with a single lock. Now, no matter how many CPUs you put in your fancy bank computer, you’re only ever going to be processing one account operation at a time, leaving most of those CPUs idle! (Does it sound like this is in direct conflict with the Granularity point above? That’s because it is!)
Built-in overhead. Because locks are not free, your code is going to take some performance hit for using them. If your users are not using your code from multiple threads, this is simply a waste.

In Rust, locks (like Mutex or, in some cases, RefCell) contain the data they protect, which ensures that all accesses happen through the lock. This is important enough that I wrote a separate article on it.. I’ll return to the topic below.

Those points hold across basically all mainstream programming languages, but there’s another one that is more subtle: assuming a concurrency scheme.

Mutexes (a common option for the type of lock used here) tend to require operating system support. Using an OS-specific version like pthread_mutex will prevent your code from running on other operating systems. And using any flavor will restrict your code to running on an operating system that provides mutexes – meaning that your code likely can’t be used in embedded contexts or inside the implementation of an OS kernel. (For our bank account example, that may be less of a concern, unless you’re writing some kind of BankOS I guess.)
It is entirely possible to have concurrency without threads. You might have noticed that I’m using both of those terms in this article; threads are the most common form of concurrency that programmers encounter these days, but there are other options. Basically all languages except C(++) have an async mode that transforms code into state machines that can be interleaved on a single thread of execution. In async mode, you generally need to be using a different kind of lock to keep from blocking execution of other async routines on the same thread. If a lock is unavailable, you want to switch to a different async routine, not stop and wait!
Finally, while rare nowadays, it’s possible that your user doesn’t have threads, or the equipment that comes along with threads, like mutexes. This is most common in bare-metal embedded contexts, but is also true when developing on Hubris, where each task is isolated and deliberately single-threaded. If you use a mutex in your code, you’ve prevented your code from being used in these contexts.

Head-in-the-sand concurrency

There’s another option, of course. If we’re concerned about the performance cost of managing locks or using atomic data types, we could just not.

This approach is quite common, particularly (in my experience) in C code. The C standard library has a bunch of examples of this pattern for historical reasons. Some of these functions are not only unsafe to use from threads, they may not even work if you call them at two levels in a nested for loop, which sure stretches the definition of “concurrency!” (strtok is a good example of this.)

On the one hand, we might be tempted to scoff at such API as a relic of our pre-thread past. But I think that’s being too quick to judge – there are often very good reasons for writing API that isn’t thread safe, starting with simplicity. Our bank account example is so simple that a reader can easily convince themselves that it’s correct…in the absence of concurrency.

No, I would argue that the problem with non-reentrant code in C is not that it’s not reentrant – it’s that there is no way to prevent it from being misused, in a recursive function or concurrent program.

No way other than this, of course:

/*
 * WARNING: THIS API IS NOT THREAD SAFE
 *
 * Do not call it in a threaded context.
 *
 * Please add this function to the mental list of hundreds
 * of things that you have to manually look for in code
 * review to prevent your software from shipping with bugs.
 *
 * If you forget about this, probably nothing will happen
 * right away, but in about six months you will suddenly be
 * swallowed by a sinkhole that opens in the middle of the
 * street, or something to that effect.
 *
 * Memento mori.
 */

Because of course everyone will read this comment and remember it!

As I put it in my article on the Rust mutex type: comments are not a concurrency strategy.

If only there was a way to keep our code simple, but make it hard to misuse…

The bank account example, in Rust.

Let’s revisit that bank account example, but this time, let’s do it in Rust:

// Remember, this is merely an example, don't use f32 for currency.

/// Stores information about a bank account.
struct BankAccount {
    /// How much money is in the account.
    balance: f32,
}

/// Deposits 'amt' into the 'account' and returns the new balance.
fn deposit(account: &mut BankAccount, amt: f32) -> f32 {
    account.balance += amt;
    account.balance
}

/// Tries to withdraw 'amt' from the account. Returns 'true' if it
/// succeeded, 'false' if the account didn't contain enough.
fn withdraw(account: &mut BankAccount, amt: f32) -> bool {
    if account.balance >= amt {
        account.balance -= amt;
        true
    } else {
        false
    }
}

This is the exact same code with the syntax changed to appease a different compiler. It sure looks like it has the same bugs as the C version, doesn’t it? For instance, withdraw is checking account.balance and then decrementing it, which seems like it has the potential for a race!

However, note that making any changes to the account (as in deposit or withdraw) requires an exclusive reference (&mut BankAccount). Exclusive references are unique in Rust. This means that a BankAccount can’t be altered by two concurrent routines, because they can’t both hold an exclusive reference.

In other words, when written the simple way, the bank account example in Rust does not have a concurrency bug.

But let’s say that we want account modifications through a shared reference (&BankAccount) to work. This can be useful even without concurrency – maybe we’ve found ourselves in a case where we need to share BankAccounts between different data structures. The easiest way to make this work is with a Cell to enable mutation through a shared reference:

use std::cell::Cell;

struct BankAccount {
    balance: Cell<f32>,
}

fn deposit(account: &BankAccount, amt: f32) -> f32 {
    let new_balance = account.balance.get() + amt;
    account.balance.set(new_balance);
    new_balance
}

fn withdraw(account: &mut BankAccount, amt: f32) -> bool {
    let bal = account.balance.get();
    if bal >= amt {
        account.balance.set(bal - amt);
        true
    } else {
        false
    }
}

Okay, surely this has a concurrency bug, right? In a way, Cell makes it more obvious that these operations are not atomic! Look at deposit, where we get() the old balance, do some work, and then set() it – what if two threads do this simultaneously? Bug, right?

Let’s see if we can demonstrate the problem by spinning up two threads that fight over the account (using scoped threads to do this conveniently):

use std::thread;

fn demonstrate_race_condition() {
    let account = BankAccount { balance: Cell::new(100.) };
    
    // Start two threads, each of which hammers the account
    // with deposit/withdraw activity:
    thread::scope(|s| {
        s.spawn(|_| {
            for _ in 0..1000 {
                deposit(&account, 100.);
                withdraw(&account, 100.);
            }
        });
        s.spawn(|_| {
            for _ in 0..1000 {
                deposit(&account, 100.);
                withdraw(&account, 100.);
            }
        });
    }).unwrap();

    println!("final balance is: {}", account.balance.get());
}

That ought to do it! Now when we run it, we will…

…huh. We can’t actually run it, because the compiler is upset.

error[E0277]: `Cell<f32>` cannot be shared between threads safely
  --> src/lib.rs:33:11
   |
33 |         s.spawn(|_| {
   |           ^^^^^ `Cell<f32>` cannot be shared between
   |                  threads safely
   |
   = help: within `BankAccount`, the trait `Sync` is not implemented
     for `Cell<f32>`
     note: required because it appears within the type `BankAccount`

  --> src/lib.rs:5:8
   |
5  | struct BankAccount {
   |        ^^^^^^^^^^^
   = note: required because of the requirements on the impl of `Send`
     for `&BankAccount`
   = note: required because it appears within the type
     `[closure@src/lib.rs:33:17: 38:10]`

What the compiler is saying is that, while we have successfully altered deposit and withdraw to operate on shared references, we still can’t actually use the BankAccount type from multiple threads (cannot be shared between threads safely). It seems to be related to our use of Cell.

This is important.

The engineering marvel of `Sync` (and `Send`)

This is the under-appreciated Rust feature I’ve been alluding to.

Unlike most programming languages, where any data structure is assumed to be safe to share across threads by default, Rust makes this part of the type system. A value is safe to share across threads only if its type implements the Sync trait.

Sync is a marker trait, meaning that unlike traits like Clone, it doesn’t define any operations you can perform on the type. It’s empty. Its declaration would resemble (simplified):

pub trait Sync {}

It exists to serve as a marker for types that can be shared across threads. Any type that does not carry this marker can’t be shared across threads.

This seems like a small thing, but it has big implications.

Rust assumes that any data structure whose contents are entirely Sync is safe to share across threads, and automatically implements Sync for the type. If anything inside the data structure is not Sync, neither is its container. (There is also a way to opt-out of this behavior which I’ll discuss below.)

The Cell type we used to implement the bank account above is not Sync, so it cannot be shared across threads. The compiler will prevent us from sending a reference to a Cell to a different thread, and will also stop us from stashing one in a place where another thread might have implicit access to it, like a static variable.

So while our code was “correct” in the absence of concurrency in both C and Rust, the code is also correct (no caveats) in Rust, because we have written it in a way that prevents it from being used in situations where race conditions can happen.

The joys of not being thread-safe

By avoiding Sync we can write a whole family of related types that can’t be shared across threads. This does not prevent them from being used in threaded programs! It just prevents the concurrency from leaking into our code. Our code can be read top-to-bottom as though threads don’t exist, which makes it much easier to understand and check for mistakes. We can write simpler and more concise code by using basic (but not thread-safe!) types like u8 and Vec, instead of AtomicU8 and some kind of fancy concurrent vector with lock-striping or something.

In addition to less complexity for the reader to deal with, we also create less complexity for the machine to deal with. Atomic operations are almost always more expensive (in terms of execution time and code space) than their simpler non-atomic equivalents.

If we (or our clients) want to use this code in a concurrent program, that program gets to choose our concurrency strategy, instead of dictating it from our library code. It might use a big mutex on everything, or mutexes on each account, or have threads own a subset of data and talk to each other over messages on channels. It might even use concurrency without threads using async – because our code does not await, we can be confident that each deposit or withdraw operation will execute atomically to completion even in an async environment.

Don’t get me wrong: there’s still a place for concurrent data structures, particularly in highly parallel applications with unpredictable data access patterns. But sometimes it’s nice to be able to write the simple code, and not worry about threads ruining your day.

Explicitly opting out of `Sync`

As I mentioned above, Rust automatically infers Sync for types whose contents are entirely Sync themselves. This is often what you want, but it can also be a bad thing, for two reasons:

It’s implicit. Someone reading the code cannot easily tell if a type is inferred to be Sync. More importantly, they can’t tell if it’s inferred to be Sync because the author intended it, or by accident! Why would it be bad to infer Sync by accident? Because…
A structure may consist of all Sync members, but not be thread-safe!

That second reason is rare, but it does happen. It’s usually because you have several members in (for example) a struct, each of which is individually Sync, but you want to be able to update them all without possibly exposing intermediate states before the update is done.

In both of these cases, you can explicitly opt your trait out of Sync. There’s a proposed syntax for making this really convenient called “negative impls,” but as of right now (2022 November) it is still considered unstable. So instead, we can use a trick:

We sneak a non-Sync type into our data structure. Any one will do. You could simply stuff a Cell or a raw pointer in there. However, we’d really like to not spend the extra space for the Cell. We want the effect of putting a non-Sync member in our struct, say, but not the physical effects of doing so… like it’s a phantom.

Sounds a lot like [PhantomData] from the standard library!

I have a type like this in my codebase that I use to opt types out of automatic Sync:

#[derive(Copy, Clone, Debug, Default)]
struct NotSyncMarker(PhantomData<Cell<u8>>);

I can then stuff it into some structure like so:

struct ThingThatShouldNotBeSync {
    first_data: AtomicU8,
    second_data: AtomicU8,

    _marker: NotSyncMarker,
}

This makes my struct act like it contains a Cell, which (as we saw above) is not Sync, but doesn’t spend any memory to actually store a Cell because it’s a phantom.

I’m a little surprised that Rust doesn’t ship a zero-sized marker type² for this in std::marker, like they do for PhantomPinned, but fortunately it’s straightforward to write your own if you need it.

Rust uses the term “marker” for both traits and types. The idea is this: both marker traits and marker types mark something as having a particular property, without adding anything – operations or data. A marker trait affects a type, while a marker type affects a value.

The other concurrency trait: `Send` and when to avoid it

I’ve focused on Sync above because Sync embodies most languages’ notion of what “thread safe” means: data that can be accessed and updated from multiple threads without corrupting things.

However, there’s another kind of safety in the presence of threads, a kind of safety that is more subtle and less common, but critical in the rare cases where you need it.

Consider this: many operating systems keep track of which thread (if any) has locked a mutex. If the data guarded by the mutex is accessed by any other thread, even if it’s correctly locked throughout the access, trouble can occur. For instance, many operating systems use mutex ownership information to implement thread priority management strategies to avoid priority inversion, or to keep track of which mutexes are locked to avoid deadlock. If you confuse the operating system about which thread is responsible for accessing the data guarded by the mutex, you defeat these mechanisms, and you’re headed to Bug Town.

In Rust, you can only refer to the data guarded by a Mutex through the mutex guard returned by the Mutex::lock operation. (If this is unfamiliar to you, I recommend reading my article on it; this section will make more sense.) This means that, to produce a program where one thread locks a mutex, and then a different thread accesses the guarded data, we would need to arrange for that thread to have access to the mutex guard. If we can do that, we can produce programs that defeat those operating system protections I described above, and bad things will happen.

Let’s make bad things happen, shall we?

use std::sync::{Mutex, mpsc::channel};
use std::thread;

fn main() {
    // Create a mutex that guards a basic u32.    
    let mutex = Mutex::new(0_u32);
   
    // Make a channel for passing guard around.
    let (guard_in, guard_out) = channel();

    // Create two scoped threads that share the locals above.
    thread::scope(|s| {
        s.spawn(move || {
            // On the first thread, we will become the owner
            // of the mutex.
            let g = mutex.lock().unwrap();
            // Send the guard to our collaborator.
            guard_in.send(g).unwrap();
        });
        s.spawn(move || {
            // Receive the mutex guard sent to us by the
            // other thread.
            let g = guard_out.recv().unwrap();
            // Update the guarded data FROM THE WRONG THREAD!
            *g = 42;
            // unlock the mutex implicitly
        });
    });
}

That should do it! Now, when we run this, we see…

…oh, the compiler is saying something:

error[E0277]: `MutexGuard<'_, Foobar>` cannot be sent between threads safely
  --> src/main.rs:12:17
   |
12 |           s.spawn(move || {
   |  ___________-----_^
   | |           |
   | |           required by a bound introduced by this call
13 | |             // On the first thread, we will become the owner of the mutex.
14 | |             let g = mutex.lock().unwrap();
15 | |             // Send it to our collaborator.
16 | |             guard_in.send(g).unwrap();
17 | |         });
   | |_________^ `MutexGuard<'_, Foobar>` cannot be sent between threads safely
   |
   = help: the trait `Send` is not implemented for `MutexGuard<'_, Foobar>`

That’s very similar to the warning we got before when we tried to share data between threads, but the words are slightly different: instead of “cannot be shared across threads safely,” it says “cannot be sent between threads safely.”

We’ve hit the other concurrency-related trait in the Rust standard library: Send. A type implements Send if values of that type are safe to send between threads. It so happens that, for the reasons described above, mutex guards do not implement Send, and so our bug is avoided.

Like Sync, Send is inferred automatically by the compiler. In my experience, it is far less common to opt out of Send than to opt out of Sync. Here are some cases where you might want to opt out:

If you’re implementing mutexes.
If your type acts as a handle to some sort of thread-local data, which would be meaningless if accessed from another thread.
If you’re doing some sort of logging or tracing that includes thread names, and you’d like your trace spans to indicate consistent thread names.

Rust does not ship a marker type for “not Send” in the standard library as of this writing, so until negative_impls are stabilized, you can opt out of Send explicitly the same way we opted out of Sync: by pretending to include some basic type that is not Send.

Raw pointers happen to not implement Send, so we can do this:

#[derive(Copy, Clone, Debug, Default)]
struct NotSendMarker(PhantomData<*mut u8>);

Conclusion

Rust is the only language where I regularly write non-thread-safe data structures without caveats or fear. In cases where they apply, such data structures can be simpler and often faster than concurrent thread-safe data structures – with the drawback, of course, that you can’t use threads to speed up operations on them.

If you’re coming to Rust from another language that emphasizes concurrency, such as Java or Go or (increasingly) C, keep this in mind:

You don’t necessarily need to make every data structure thread-safe, and
If you get it wrong, you’ll get a compiler error, not a weird runtime bug.

Appendix: the relationship between `Sync` and `Send`

Sync and Send interact in ways that make things much more convenient for us programmers. None of this is language magic – it’s just trait impls in the standard library.

If T is Sync, then &T is Send. This makes sense: to share a value across threads, each thread needs a reference to it, so we need to be able to send that reference.
If T is Send, then &mut T is also Send – meaning if you can transfer ownership across threads, you can also lend so long as you lose access for the duration. (Most cases where this works will also require the reference to be 'static.)

Appendix: opting back in

If you’ve written a data structure that is carefully engineered to be thread safe (in either the Sync or Send sense), it may still contain types that are themselves not Sync or Send, and so the compiler will not automatically infer an impl. This is somewhat rare, but can happen if you’re building a shared mutex-like data structure using std::cell::UnsafeCell. In these cases you may need to explicitly implement Sync and/or Send.

Because implementing these traits incorrectly would open up your program to data races and ghosts, implementing them explicitly is unsafe. So, you would do it like this:

unsafe impl Sync for MyType {}

Think very carefully before doing this.

#api-design #concurrency #rust

Cliffle