The First-Mover Allocator Pattern

2020-09-07 (updated 2024-03-02)

(I’ve updated this pattern, since a lot has changed since 2020. The recommendations here should be ready for the Rust 2024 edition, and are closer to correct in a post-pointer-provenance world.)

Here’s another useful Rust pattern. Like the Typestate Pattern before it, I wrote this because I haven’t seen the sort of obsessively nerdy writeup that I wanted to read. And, as with the Typestate Pattern, I didn’t invent this — I’m merely documenting and generalizing it.

What problem are we trying to solve?

Your program uses a buffer or other chunk of memory. You would prefer not to allocate this memory from the heap — perhaps because your application doesn’t have a heap, or because you want to get any out-of-memory errors at link time, instead of at run-time. You would also prefer to interact with this memory from safe code.

First attempt: as we would write it in C

You can declare a block of memory, fixed in location at link time and independent from the heap, in Rust using essentially the same technique you’d use in C:

static mut BUFFER: [u8; 1024] = [0; 1024];

If you’re targeting an embedded device with limited RAM, and you try to declare more of these than will fit in memory, you’ll get a failure at link time. This is good! Better to discover you’re out of memory where your customers can’t see.

However, access to static mut is unsafe in Rust. This means every time you want to interact with this buffer, you’re stuck doing something like this:

unsafe {
    // Do not do this, this is bad, keep reading
    BUFFER[0] = 0xCB;
}

The simplest approach will scatter little unsafe blocks all over the codebase, making them very difficult to reason about! unsafe is problematic in part because it destroys your ability to reason locally about code, so it’s best to keep all the unsafe related to a thing in one place.

“Well,” you might say, “no problem, I will extract a function and not repeat the unsafe block.”

fn write_buffer(offset: usize, value: u8) {
    // Please don't do this, read the note below.
    unsafe {
        BUFFER[offset] = value;
    }
}

This has made things worse: you’ve replaced code that is unsafe with code that is unsound! Off the top of my head, this function can be used to break two of Rust’s safety rules:

You can get a reference to &BUFFER and, while holding it, call write_buffer — altering the contents of the reference that was supposed to be immutable. This is some serious undefined behavior.
Every call to write_buffer is a potential data race; nothing prevents you from hammering the buffer from multiple threads (or from prioritized interrupt service routines). Also undefined behavior.

There might be others, but that’s enough for me to be wary!

Second attempt: using module visibility

You can bound these risks by isolating BUFFER in its own module with accessor routines, so that only the accessors can see BUFFER. This way, you can reason about how many sites in the code might create a &mut BUFFER or &BUFFER, and ensure that they never overlap. (Note that this does not fix the data races. See below.)

// Slightly safer if this is the ENTIRE MODULE'S CODE. Adding more code to the
// module can break soundness. Do not copy this example, keep reading.

static mut BUFFER: [u8; 1024] = [u8; 1024];

pub fn read(offset: usize) -> u8 {
    unsafe {
        BUFFER[offset]
    }
}

pub fn write(offset: usize, value: u8) {
    unsafe {
        BUFFER[offset] = value
    }
}

// End of module, so no other code can directly see `BUFFER`

This approach mostly works for a buffer of simple u8, as long as you’re happy with a limited set of encapsulated operations. Each function references the static mut in a small scope, and references don’t escape. It looks like you can’t possibly have aliasing references… until you consider threads or ISRs. If it’s possible for write to be executed concurrently, we’re back in data race land. Because nothing in the code prevents this, we’re still unsound: we have created a bug that is waiting to happen later¹.

If you’re still happy with a limited accessor API, you can actually fix the data races by using AtomicU8 instead of u8 — but that winds up being a different solution. You can declare an array of AtomicU8 as a static (no mut) and avoid some headaches — but you can never treat it as a [u8], and that can be really limiting.

Breaking the second attempt: “but I miss references”

Reading and writing single bytes is technically enough I guess but is cramping your style. You want to be able to get pointers to parts of the buffer, to process stuff incrementally without keeping track of everything as a usize. So you add this function:

pub fn get_write_slice(offset: usize, len: usize) -> &'static mut [u8] {
    unsafe {
        &mut BUFFER[offset .. offset + len]
    }
}

Or in Rust 1.51 and later, you could do the same thing using the core::ptr::addr_of_mut! macro – which is required to silence a warning in Rust 1.77 and later:

pub fn get_write_slice(offset: usize, len: usize) -> &'static mut [u8] {
    unsafe {
        // Note: just because this silences the warning doesn't mean
        // it's correct! Keep reading.
        &mut (*core::ptr::addr_of_mut!(BUFFER))[offset .. offset + len]
    }
}

This compiles, but only because you told the compiler not to think too hard about it. You have created a vending machine that will hand out &mut references to a data structure willy-nilly. It’s incredibly easy to violate the aliasing rules with this API:

let slice1 = get_write_slice(0, 10);
let slice2 = get_write_slice(0, 20);
// We are now in undefined behavior land

Even a get_read_slice API that returns &[u8] (no mut) is unsound, because it doesn’t stop another part of the code from sneaking behind your back and calling write.

We need a better approach, something that lets us manipulate references however we want without creating subtle and potentially catastrophic bugs.

This is the problem the First-Mover Allocator Pattern addresses.

The pattern

The basic problem here, and part of the reason why static mut is unsafe to access in the first place, is that we can’t reason at compile-time about how it will be accessed. It’s a global! Will all operations come from one thread, or many? Are any of the operations reentrant, deliberately or accidentally?

The idea behind the First-Mover Allocator is to spend a small amount at run-time — one byte of RAM and a few instructions — to let us ignore the problem at compile-time. If that sounds like the opposite of what Rust usually tries to do, you’re right. Getting stuff right at compile time is great, but if you can’t, run-time is the second best place to get it right².

The pattern works like this:

Ensure there is a single point in the program, both textually in code, and at run-time, where the static structure gets used directly. Manipulate it through references everywhere else.
Use a runtime check to detect hitting that point in code more than once, which would mean it isn’t really a single point.

Here is the simplest case, for the 1kiB buffer we saw in the previous section:

use core::sync::atomic::{AtomicBool, Ordering};

/// Returns an exclusive reference to the static buffer to the first
/// caller. There should not be a second caller, but if there is, they
/// will panic.
fn get_buffer() -> &'static mut [u8; 1024] {
    // We use a global flag to indicate whether any caller has begun
    // executing this function.
    static TAKEN: AtomicBool = AtomicBool::new(false);

    // In the normal case where this is only called once, it sets the
    // TAKEN flag and never checks it again.
    //
    // If this is called again later, we will detect it.
    //
    // If multiple callers get here simultaneously, one of them will
    // "win" (whichever executes the `swap` first).
    if TAKEN.swap(true, Ordering::AcqRel) {
        panic!();
    }

    // Here's the actual static item. Because it's declared in the
    // function body, we know that every access to it by name will be
    // in this function.
    static mut BUFFER: [u8; 1024] = [0; 1024];

    // Safety: because BUFFER is limited in scope, this is the only
    // line of code in the whole program that can &mut it. This,
    // combined with the TAKEN check above, ensures that the returned
    // &mut is truly exclusive.
    //
    // Our use of the `addr_of_mut!` macro here effectively asserts
    // to the compiler that we've done all this homework.
    unsafe { &mut *core::ptr::addr_of_mut!(BUFFER) }
}

This returns a &mut to a static-scope buffer. The recipient of that &mut can do with it as they wish — anything they can do in safe code is sound:

get_buffer returns exactly one &mut to BUFFER. No other references to BUFFER exist, because the caller simply can’t see it — it’s confined to the function body. Were they to call get_buffer again, they would not get a new &mut. (They would panic.) Thus we avoid aliasing of the buffer.
The &mut itself can’t be shared across threads (though it can be sent between threads if the buffer type is Send), so we’re not concerned about data races.
The caller could re-borrow all or part of the buffer as non-exclusive references, which could then be shared across threads if the buffer’s type is Sync — but being Sync means that it prevents data races internally, so we’re still good.

It’s implied by some of the points above, but worth calling out explicitly: why are the static declarations inside the function body? They don’t strictly have to be, but it comes back to that drum I can’t stop beating: local reasoning. If the static items were at module scope, and the module were longer than this one function, then the possible places that could touch BUFFER and violate our invariants gets larger. Putting them inside the function means that any access has to appear in the function body, so if you want to convince yourself that any accesses are done correctly, you don’t have to read any other code.

There is actually another pattern that can be used to prove properties of this at compile time. I need to devote a separate article to it, but the basic idea is handing out tokens to threads that are not Send, and using possession of such a token (i.e. passing it as an argument) as evidence that the caller is confined to the associated thread. There is a simple example of using this to detect whether API is being called from interrupt handlers in my m4vga source.

Performance

If you’re writing Rust, and worrying about things like static memory allocation, you’re probably concerned about code size, performance, or both. The First-Mover Allocator moves a check from compile-time to run-time. How much does this hurt?

It depends.

As a worst case: consider what we’re literally asking the machine to do.

In addition to our static buffer, we’ve allocated a single byte for bookkeeping. Assuming your buffer is of nontrivial size, that’s probably affordable.
Each get_buffer-style function is called exactly once in a working program. (It could also be called zero times, but in that case, the performance is fantastic, so I won’t consider it further.)
Calling get_buffer performs one atomic swap of a byte in memory, followed by a conditional branch. This is a handful of instructions.
Because it happens exactly once, the execution time of get_buffer isn’t a huge concern — but the “atomic swap and conditional branch” code will also be reasonably quick on a normal CPU.
Note that the return value of get_buffer is an array reference, not a slice. If it returned a slice (&[u8]) then the caller would lose information about the length of the array, and would likely add a bunch of bounds checks that would be trivially removed when accessing the static directly. However, an array reference (&[u8; 1024]) is just a pointer and carries length information, so at worst the caller only loses the link-time knowledge of which absolute address it will be accessing. This can cause lower performance code to be generated, particularly on register-poor architectures like x86, but hasn’t shown up in any of my measurements in practice.

As for the practical case: in my tests, the overhead is easy for the compiler to eliminate.

The allocator function get_buffer is called in only one place in the code (in a working program), so there’s little reason not to inline it.
Once inlined, the caller is now directly accessing the static — only in the ways get_buffer did — enabling the compiler to optimize access code as though it had been directly accessing the static all along.

So we’re left with an atomic swap and conditional branch. In my applications, I can afford this in exchange for greater safety and improved local reasoning.

Variation: uninitialized memory

When you declare a static like we did above:

static mut BUFFER: [u8; 1024] = [0; 1024];

you are declaring an initialized chunk of memory — here, initialized to zero, though you could initialize it to any value you like.

Static initialization isn’t magic, and on embedded systems, particularly bare metal embedded systems, it’s a process you will need to think about. The normal way for a static like this to get initialized is during the runtime startup phase of execution, which happens before the Rust main is called. The runtime (often the r0 crate, but you could provide your own) is responsible for filling out all initialized memory with predefined patterns, like the zeros we requested here.

Normally, this is what you want.

Not all RAM is initialized by this process. Stack is typically left uninitialized, as are heaps used by “real” allocators, and — critically for embedded systems — initialization typically only applies to a single contiguous RAM address range.

But what if your system has other kinds of RAM? In particular, consider the STM32 series, which (like many other microcontrollers) sport a “USB SRAM” that is essentially dedicated to the USB controller. This SRAM is in an entirely different section of address space, and has weird access rules, so we can’t expect the runtime to initialize it. If we want to place a buffer there, we’re on our own.

In the examples that follow, I will assume that the linker script you’re using places anything assigned link_section = ".usbsram" to the USB SRAM instead of normal RAM.

The first option is to pretend the problem doesn’t exist.

// la la la everything is fine    PLEASE DON'T DO THIS
#[link_section = ".usbsram"]
static mut BUFFER: [u8; 1024] = [u8; 1024];

// I know that BUFFER isn't *really* initialized
// so I am going to do it here! (note: this is wrong.)
unsafe {
    for byte in &mut BUFFER {
        *byte = 0;
    }
}

// Now everything is OK right? ...right?

The problem here is that you’ve just lied to the compiler — you’ve declared BUFFER as initialized even though it isn’t really. The compiler works hard on your behalf, and makes use of all the information you give it to produce better code. If some of that information is dishonest, the compiler will do the wrong thing. In this case, the optimizer is very likely to notice that you’re re-zeroing the buffer and remove your loop. Surprise!

We need to come clean, and admit to the compiler that anything we put in .usbsram doesn’t really get initialized. It would be nice to be able to express this at a link_section level, but we can’t, at least not right now. We can express it for each static item, though:

#[link_section = ".usbsram"]
static mut BUFFER: MaybeUninit<[u8; 1024]> = MaybeUninit::uninit();

There — an explicitly uninitialized static buffer. If you’ve read my Learn Rust The Dangerous Way series, you’ll be no stranger to the MaybeUninit type; when the solution to your problem involves MaybeUninit, you know you’re doing the fun stuff.

To use this uninitialized buffer, we need a slightly more verbose version of get_buffer.

use core::mem::MaybeUninit;
use core::sync::atomic::{AtomicBool, Ordering};

/// Returns an exclusive reference to the static buffer to the first
/// caller. There should not be a second caller, but if there is, they
/// will panic.
fn get_buffer() -> &'static mut [u8; 1024] {
    // This section is unchanged from the original.
    static TAKEN: AtomicBool = AtomicBool::new(false);
    if TAKEN.swap(true, Ordering::AcqRel) {
        panic!();
    }

    // Here's our uninitialized buffer!
    const N: usize = 1024;
    let buffer = unsafe {
        static mut BUFFER: MaybeUninit<[u8; N]> = MaybeUninit::uninit();
        &mut *core::ptr::addr_of_mut!(BUFFER)
    };

    // We now have a &'static mut MaybeUninit<[u8; 1024]> ... which
    // isn't really what we want.

    // We'd like to initialize it before returning it. This is a really
    // good idea: it doesn't super matter for u8, where any combination
    // of 8 bits is valid, but if you had an array of bool and one of
    // the bools was 3 ... things will get weird for you real fast. Do
    // it.
    //
    // It's also increasingly relied upon by the compiler.
  
    // Convert an uninitialized array into an array of uninitialized.
    //
    // (Note: once MaybeUninit::transpose is stable, use that instead.)
    let buf: &mut [MaybeUninit<u8>; N] = unsafe {
        core::mem::transmute(buffer)
    };
    // Loop over the array (now that we can) and initialize all the
    // things.
    for slot in buf.iter_mut() {
        // MaybeUninit gives us a convenient "write" operation:
        slot.write(0);
    }

    // Safety: all the properties cited in the original get_buffer,
    // plus: we have just initialized the memory, so it is safe to
    // treat it as initialized.
    unsafe { buf.assume_init_mut() }
}

The first thing you probably noticed, particularly if you haven’t played with uninitialized arrays before, is that we’ve got some transmute happening. transmute is like a reciprocating saw: it’s always the wrong tool for the job, except when it isn’t. The number of transmute calls required here has decreased over time as the APIs on MaybeUninit have grown and stabilized, but today we still need one use of it: to go from &mut MaybeUninit<[u8; 1024]> to &mut [MaybeUninit<u8>; 1024] — that is, to treat an uninitialized array as an array of uninitialized elements. The memory is still uninitialized, but now we can iterate over its uninitialized parts. (Someday soon MaybeUninit::transpose may let us do this in a safer fashion.)

The method used to initialize the memory is important: MaybeUninit::write is almost always the right choice. There are other ways to do it, but it’s easy to accidentally produce a reference to uninitialized memory while using them. Just use write.

In practice, initialize your buffers using a constant expression, instead of this method, whenever possible. The technique shown here is not a performance win — letting the runtime initialize memory will basically always be faster than doing it yourself. This is more code and more complexity and can introduce bugs if done incorrectly. Use this technique for uninitialized buffers only when your buffer exists in some RAM that cannot be initialized by the runtime (as in this example). This basically only happens in embedded or bare metal environments.

FAQ / Appendix

Why are you calling this an allocator? Because it’s a mechanism that allocates memory. It happens to only manage one block of memory, and it will only allocate it once, and cannot free, but it’s still an allocator.

Why “First-Mover”? This pattern is one possible way to safely manage access to static buffers in Rust. This particular way privileges the first caller at the expense of any later callers. This isn’t the only way to do it, so I think it’s important to put something in the name that distinguishes this from other approaches. The name is an analogy to the phrase “first-mover advantage.”

Shouldn’t this just be a macro? Sure; the cortex_m crate provides a singleton! macro that implements one version of this. Assuming you don’t need control over what happens on the error case, and you don’t need any special initialization behavior, it will work fine. But it’s easy to paint yourself into a corner where you really do need to write the code yourself (which is how I got here). Given that the manual implementation is less than 10 lines of code, it’s worth understanding how it works — even if you usually use it through a macro.

Can you just use Option::take? Only in a single-threaded context, because replacing an enum like Option with another variant of the same enum isn’t done with an atomic swap. This means it can’t replace the atomic TAKEN flag.

I don’t have to do this in C, why is it necessary in Rust? The fact that a C compiler doesn’t encourage you to do this doesn’t mean you shouldn’t. It’s dramatically simpler to declare and use static buffers in C. It’s also usually a bad idea. It makes code non-reentrant and makes thread safety difficult. It has been a common source of bugs and unexpected behaviors in C programs dating back to the language’s origins (there are several cases of this in the early Unix kernel and standard libraries). Rust is specifically concerned with checking thread safety and data races at compile time, so this pattern sets off alarm bells in the compiler, requiring us to put in extra work to reassure it — and protect ourselves from our own mistakes!

Okay, so, can I use this pattern in C to make static buffers less error-prone? Kind of. The basic pattern, where a static is kept in an isolated scope and only handed to the first caller who asks, works in C, and can be valuable. (I’ve used it.) However, because C compilers don’t reason about pointer aliasing, once the pointer has been returned, it’s easy to break things by accident. This is why I present the pattern as a Rust pattern — it works in other languages but is particularly useful in Rust.

#design-patterns #memory-management #rust

Cliffle