The First-Mover Allocator Pattern
- What problem are we trying to solve?
- The pattern
- Performance
- Variation: uninitialized memory
- FAQ / Appendix
(I’ve updated this pattern, since a lot has changed since 2020. The recommendations here should be ready for the Rust 2024 edition, and are closer to correct in a post-pointer-provenance world.)
Here’s another useful Rust pattern. Like the Typestate Pattern before it, I wrote this because I haven’t seen the sort of obsessively nerdy writeup that I wanted to read. And, as with the Typestate Pattern, I didn’t invent this — I’m merely documenting and generalizing it.
What problem are we trying to solve?
Your program uses a buffer or other chunk of memory. You would prefer not to allocate this memory from the heap — perhaps because your application doesn’t have a heap, or because you want to get any out-of-memory errors at link time, instead of at run-time. You would also prefer to interact with this memory from safe code.
First attempt: as we would write it in C
You can declare a block of memory, fixed in location at link time and independent from the heap, in Rust using essentially the same technique you’d use in C:
static mut BUFFER: = ;
If you’re targeting an embedded device with limited RAM, and you try to declare more of these than will fit in memory, you’ll get a failure at link time. This is good! Better to discover you’re out of memory where your customers can’t see.
However, access to static mut
is unsafe in Rust. This means every time you
want to interact with this buffer, you’re stuck doing something like this:
unsafe
The simplest approach will scatter little unsafe
blocks all over the codebase,
making them very difficult to reason about! unsafe
is problematic in part
because it destroys your ability to reason locally about code, so it’s best to
keep all the unsafe
related to a thing in one place.
“Well,” you might say, “no problem, I will extract a function and not repeat the
unsafe
block.”
This has made things worse: you’ve replaced code that is unsafe
with code that
is unsound! Off the top of my head, this function can be used to break two of
Rust’s safety rules:
-
You can get a reference to
&BUFFER
and, while holding it, callwrite_buffer
— altering the contents of the reference that was supposed to be immutable. This is some serious undefined behavior. -
Every call to
write_buffer
is a potential data race; nothing prevents you from hammering the buffer from multiple threads (or from prioritized interrupt service routines). Also undefined behavior.
There might be others, but that’s enough for me to be wary!
Second attempt: using module visibility
You can bound these risks by isolating BUFFER
in its own module with accessor
routines, so that only the accessors can see BUFFER
. This way, you can reason
about how many sites in the code might create a &mut BUFFER
or &BUFFER
, and
ensure that they never overlap. (Note that this does not fix the data races.
See below.)
// Slightly safer if this is the ENTIRE MODULE'S CODE. Adding more code to the
// module can break soundness. Do not copy this example, keep reading.
static mut BUFFER: = ;
// End of module, so no other code can directly see `BUFFER`
This approach mostly works for a buffer of simple u8
, as long as you’re happy
with a limited set of encapsulated operations. Each function references the
static mut
in a small scope, and references don’t escape. It looks like you
can’t possibly have aliasing references… until you consider threads or ISRs.
If it’s possible for write
to be executed concurrently, we’re back in data
race land. Because nothing in the code prevents this, we’re still unsound: we
have created a bug that is waiting to happen later1.
If you’re still happy with a limited accessor API, you can actually
fix the data races by using AtomicU8
instead of u8
— but that winds
up being a different solution. You can declare an array of AtomicU8
as a
static
(no mut
) and avoid some headaches — but you can never treat
it as a [u8]
, and that can be really limiting.
Breaking the second attempt: “but I miss references”
Reading and writing single bytes is technically enough I guess but is cramping
your style. You want to be able to get pointers to parts of the buffer, to
process stuff incrementally without keeping track of everything as a usize
. So
you add this function:
Or in Rust 1.51 and later, you could do the same thing using the
core::ptr::addr_of_mut!
macro – which is required to silence a warning in
Rust 1.77 and later:
This compiles, but only because you told the compiler not to think too hard
about it. You have created a vending machine that will hand out &mut
references to a data structure willy-nilly. It’s incredibly easy to violate the
aliasing rules with this API:
let slice1 = get_write_slice;
let slice2 = get_write_slice;
// We are now in undefined behavior land
Even a get_read_slice
API that returns &[u8]
(no mut
) is unsound, because
it doesn’t stop another part of the code from sneaking behind your back and
calling write
.
We need a better approach, something that lets us manipulate references however we want without creating subtle and potentially catastrophic bugs.
This is the problem the First-Mover Allocator Pattern addresses.
The pattern
The basic problem here, and part of the reason why static mut
is unsafe to
access in the first place, is that we can’t reason at compile-time about how
it will be accessed. It’s a global! Will all operations come from one thread, or
many? Are any of the operations reentrant, deliberately or accidentally?
The idea behind the First-Mover Allocator is to spend a small amount at run-time — one byte of RAM and a few instructions — to let us ignore the problem at compile-time. If that sounds like the opposite of what Rust usually tries to do, you’re right. Getting stuff right at compile time is great, but if you can’t, run-time is the second best place to get it right2.
The pattern works like this:
-
Ensure there is a single point in the program, both textually in code, and at run-time, where the static structure gets used directly. Manipulate it through references everywhere else.
-
Use a runtime check to detect hitting that point in code more than once, which would mean it isn’t really a single point.
Here is the simplest case, for the 1kiB buffer we saw in the previous section:
use ;
/// Returns an exclusive reference to the static buffer to the first
/// caller. There should not be a second caller, but if there is, they
/// will panic.
This returns a &mut
to a static-scope buffer. The recipient of that &mut
can
do with it as they wish — anything they can do in safe code is sound:
-
get_buffer
returns exactly one&mut
toBUFFER
. No other references toBUFFER
exist, because the caller simply can’t see it — it’s confined to the function body. Were they to callget_buffer
again, they would not get a new&mut
. (They wouldpanic
.) Thus we avoid aliasing of the buffer. -
The
&mut
itself can’t be shared across threads (though it can be sent between threads if the buffer type isSend
), so we’re not concerned about data races. -
The caller could re-borrow all or part of the buffer as non-exclusive references, which could then be shared across threads if the buffer’s type is
Sync
— but beingSync
means that it prevents data races internally, so we’re still good.
It’s implied by some of the points above, but worth calling out explicitly:
why are the static
declarations inside the function body? They don’t
strictly have to be, but it comes back to that drum I can’t stop beating:
local reasoning. If the static
items were at module scope, and the module
were longer than this one function, then the possible places that could touch
BUFFER
and violate our invariants gets larger. Putting them inside the
function means that any access has to appear in the function body, so if you
want to convince yourself that any accesses are done correctly, you don’t have
to read any other code.
There is actually another pattern that can be used to prove
properties of this at compile time. I need to devote a separate article to it,
but the basic idea is handing out tokens to threads that are not Send
, and
using possession of such a token (i.e. passing it as an argument) as evidence
that the caller is confined to the associated thread. There is a simple
example of using this to detect whether API is being called from interrupt
handlers in my m4vga source.
Performance
If you’re writing Rust, and worrying about things like static memory allocation, you’re probably concerned about code size, performance, or both. The First-Mover Allocator moves a check from compile-time to run-time. How much does this hurt?
It depends.
As a worst case: consider what we’re literally asking the machine to do.
-
In addition to our static buffer, we’ve allocated a single byte for bookkeeping. Assuming your buffer is of nontrivial size, that’s probably affordable.
-
Each
get_buffer
-style function is called exactly once in a working program. (It could also be called zero times, but in that case, the performance is fantastic, so I won’t consider it further.) -
Calling
get_buffer
performs one atomic swap of a byte in memory, followed by a conditional branch. This is a handful of instructions. -
Because it happens exactly once, the execution time of
get_buffer
isn’t a huge concern — but the “atomic swap and conditional branch” code will also be reasonably quick on a normal CPU. -
Note that the return value of
get_buffer
is an array reference, not a slice. If it returned a slice (&[u8]
) then the caller would lose information about the length of the array, and would likely add a bunch of bounds checks that would be trivially removed when accessing thestatic
directly. However, an array reference (&[u8; 1024]
) is just a pointer and carries length information, so at worst the caller only loses the link-time knowledge of which absolute address it will be accessing. This can cause lower performance code to be generated, particularly on register-poor architectures like x86, but hasn’t shown up in any of my measurements in practice.
As for the practical case: in my tests, the overhead is easy for the compiler to eliminate.
-
The allocator function
get_buffer
is called in only one place in the code (in a working program), so there’s little reason not to inline it. -
Once inlined, the caller is now directly accessing the
static
— only in the waysget_buffer
did — enabling the compiler to optimize access code as though it had been directly accessing thestatic
all along.
So we’re left with an atomic swap and conditional branch. In my applications, I can afford this in exchange for greater safety and improved local reasoning.
Variation: uninitialized memory
When you declare a static
like we did above:
static mut BUFFER: = ;
you are declaring an initialized chunk of memory — here, initialized to zero, though you could initialize it to any value you like.
Static initialization isn’t magic, and on embedded systems, particularly bare
metal embedded systems, it’s a process you will need to think about. The normal
way for a static
like this to get initialized is during the runtime startup
phase of execution, which happens before the Rust main
is called. The runtime
(often the r0
crate, but you could provide your own) is responsible for
filling out all initialized memory with predefined patterns, like the zeros we
requested here.
Normally, this is what you want.
Not all RAM is initialized by this process. Stack is typically left uninitialized, as are heaps used by “real” allocators, and — critically for embedded systems — initialization typically only applies to a single contiguous RAM address range.
But what if your system has other kinds of RAM? In particular, consider the STM32 series, which (like many other microcontrollers) sport a “USB SRAM” that is essentially dedicated to the USB controller. This SRAM is in an entirely different section of address space, and has weird access rules, so we can’t expect the runtime to initialize it. If we want to place a buffer there, we’re on our own.
In the examples that follow, I will assume that the linker script you’re using
places anything assigned link_section = ".usbsram"
to the USB SRAM instead of
normal RAM.
The first option is to pretend the problem doesn’t exist.
// la la la everything is fine PLEASE DON'T DO THIS
static mut BUFFER: = ;
// I know that BUFFER isn't *really* initialized
// so I am going to do it here! (note: this is wrong.)
unsafe
// Now everything is OK right? ...right?
The problem here is that you’ve just lied to the compiler — you’ve declared
BUFFER
as initialized even though it isn’t really. The compiler works hard on
your behalf, and makes use of all the information you give it to produce better
code. If some of that information is dishonest, the compiler will do the wrong
thing. In this case, the optimizer is very likely to notice that you’re
re-zeroing the buffer and remove your loop. Surprise!
We need to come clean, and admit to the compiler that anything we put in
.usbsram
doesn’t really get initialized. It would be nice to be able to
express this at a link_section
level, but we can’t, at least not right now. We
can express it for each static item, though:
static mut BUFFER: = uninit;
There — an explicitly uninitialized static buffer. If you’ve read my Learn
Rust The Dangerous Way series, you’ll be no stranger to the
MaybeUninit
type; when the solution to your problem involves MaybeUninit
,
you know you’re doing the fun stuff.
To use this uninitialized buffer, we need a slightly more verbose version of
get_buffer
.
use MaybeUninit;
use ;
/// Returns an exclusive reference to the static buffer to the first
/// caller. There should not be a second caller, but if there is, they
/// will panic.
The first thing you probably noticed, particularly if you haven’t played with
uninitialized arrays before, is that we’ve got some transmute
happening.
transmute
is like a reciprocating saw: it’s always the wrong tool for the job,
except when it isn’t. The number of transmute
calls required here has
decreased over time as the APIs on MaybeUninit
have grown and stabilized, but
today we still need one use of it: to go from &mut MaybeUninit<[u8; 1024]>
to
&mut [MaybeUninit<u8>; 1024]
— that is, to treat an uninitialized array
as an array of uninitialized elements. The memory is still uninitialized, but
now we can iterate over its uninitialized parts. (Someday soon
MaybeUninit::transpose
may let us do this in a safer fashion.)
The method used to initialize the memory is important: MaybeUninit::write
is
almost always the right choice. There are other ways to do it, but it’s easy to
accidentally produce a reference to uninitialized memory while using them. Just
use write
.
In practice, initialize your buffers using a constant expression, instead of this method, whenever possible. The technique shown here is not a performance win — letting the runtime initialize memory will basically always be faster than doing it yourself. This is more code and more complexity and can introduce bugs if done incorrectly. Use this technique for uninitialized buffers only when your buffer exists in some RAM that cannot be initialized by the runtime (as in this example). This basically only happens in embedded or bare metal environments.
FAQ / Appendix
Why are you calling this an allocator? Because it’s a mechanism that
allocates memory. It happens to only manage one block of memory, and it will
only allocate it once, and cannot free
, but it’s still an allocator.
Why “First-Mover”? This pattern is one possible way to safely manage access to static buffers in Rust. This particular way privileges the first caller at the expense of any later callers. This isn’t the only way to do it, so I think it’s important to put something in the name that distinguishes this from other approaches. The name is an analogy to the phrase “first-mover advantage.”
Shouldn’t this just be a macro? Sure; the cortex_m
crate provides a
singleton!
macro that implements one version of this. Assuming you don’t need
control over what happens on the error case, and you don’t need any special
initialization behavior, it will work fine. But it’s easy to paint yourself into
a corner where you really do need to write the code yourself (which is how I got
here). Given that the manual implementation is less than 10 lines of code, it’s
worth understanding how it works — even if you usually use it through a macro.
Can you just use Option::take
? Only in a single-threaded context, because
replacing an enum
like Option
with another variant of the same enum
isn’t
done with an atomic swap. This means it can’t replace the atomic TAKEN
flag.
I don’t have to do this in C, why is it necessary in Rust? The fact that a C
compiler doesn’t encourage you to do this doesn’t mean you shouldn’t. It’s
dramatically simpler to declare and use static
buffers in C. It’s also usually
a bad idea. It makes code non-reentrant and makes thread safety difficult. It
has been a common source of bugs and unexpected behaviors in C programs dating
back to the language’s origins (there are several cases of this in the early
Unix kernel and standard libraries). Rust is specifically concerned with
checking thread safety and data races at compile time, so this pattern sets off
alarm bells in the compiler, requiring us to put in extra work to reassure it
— and protect ourselves from our own mistakes!
Okay, so, can I use this pattern in C to make static buffers less
error-prone? Kind of. The basic pattern, where a static
is kept in an
isolated scope and only handed to the first caller who asks, works in C, and can
be valuable. (I’ve used it.) However, because C compilers don’t reason about
pointer aliasing, once the pointer has been returned, it’s easy to break things
by accident. This is why I present the pattern as a Rust pattern — it
works in other languages but is particularly useful in Rust.