I’m trying to do something kind of unusual with lilos: in addition to almost
all the APIs being safe in the Rust sense, I’m also attempting to create an
entire system API that is cancel-safe. I’ve written a lot about Rust’s async
feature and its notion of cancellation recently, such as my suggestion for
reframing how we think about async/await.
My thoughts on this actually stem from my early work on lilos, where I started
beating the drum of cancel-safety back in 2020. My notion
of what it means to be cancel-safe has gotten more nuanced since then, and I’ve
recently made the latest batch of changes to try to help applications built on
lilos be more robust by default.
So, wanna nerd out about async API design and robustness? I know you do.
I recently posted about my debugger for async Rust, which can
generate what I call “await-traces” for async code that’s suspended and not
currently running. I mentioned at the time that it appeared possible to get the
source code file name and line number corresponding to the await points, but
left that for future work.
(This is a section of the lilos intro guide that people seemed to like, so
to increase its visibility, I’m lifting it up into its own post and expanding it
a bit. I hope this is a useful companion piece to the post on async
debugging I posted this morning.)
Some documentation of Rust async and await has presented it as a seamless
alternative to threads. Just sprinkle these keywords through your code and get
concurrency that scales better! I think this is very misleading. An async fn
is a different thing from a normal Rust fn, and you need to think about
different things to write correct code in each case.
This post presents a different way of looking at async that I think is more
useful, and less likely to lead to cancellation-related bugs.
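
To make “cancellation” concrete, here’s a minimal sketch using only the standard library. The names `NoopWaker`, `Never`, `task`, and `poll_once_then_cancel` are made up for illustration and are not lilos APIs. It polls an async fn once by hand and then drops it, which is all that cancelling a future amounts to: the code after the `await` never runs.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that is never ready, standing in for a slow operation.
struct Never;

impl Future for Never {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Hypothetical task: sets `started`, awaits, then sets `finished`.
async fn task(started: &AtomicBool, finished: &AtomicBool) {
    started.store(true, Ordering::SeqCst);
    Never.await; // the caller may drop us while we are parked here
    finished.store(true, Ordering::SeqCst);
}

/// Polls `task` once, then drops it. Returns `(started, finished)`.
fn poll_once_then_cancel() -> (bool, bool) {
    let started = AtomicBool::new(false);
    let finished = AtomicBool::new(false);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    {
        let mut fut = pin!(task(&started, &finished));
        assert!(fut.as_mut().poll(&mut cx).is_pending());
        // `fut` is dropped at the end of this block: that is cancellation.
    }
    (started.load(Ordering::SeqCst), finished.load(Ordering::SeqCst))
}

fn main() {
    let (started, finished) = poll_once_then_cancel();
    println!("started: {started}, finished: {finished}");
}
```

A normal fn that has started running will reach its end unless it panics or the thread dies. An async fn gives no such promise at any `await`, which is why each await point deserves the question “what state do I leave behind if I stop here?”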
I’m a big fan of Rust’s async feature, which lets you write explicit state
machines like straight-line code. One of the operating systems I maintain,
lilos, is almost entirely based on async, and I think it’s a killer
feature for embedded development.
async is also popular when writing webservers and other network services. My
colleagues at Oxide use it quite a bit. Watching them work has underscored one
of the current issues with async, however: the debugging story is not great.
In particular, answering the question “why isn’t my program currently doing
anything” is very hard.
I’ve been quietly tinkering on some tools to improve the situation since 2021,
and I’ve recently released a prototype debugger for lilos: lildb. lildb
can print await traces for uninstrumented lilos programs, which are like
stack traces, but for suspended futures. I wrote this to help me debug my own
programs, but I’m publishing it to try and move the discussion on async
debugging forward. To that end, this post will walk through what it does, how it
derives the information it uses, and areas where we could improve things.
One of the nice things about the Rust programming language is that it
makes it easier to write correct concurrent (e.g. threaded) programs – to the
degree that Rust’s slogan has been, at times, “fearless concurrency.”
But I’d like to tell you about the other side of Rust, which I think is
under-appreciated. Rust enables you to write programs that are not concurrent.
This feature is missing from most other languages, and its absence is a source
of much complexity and many bugs.
“But wait,” you might be saying, “of course I can write code that isn’t
concurrent in Java or Python or C!”
Can you, though? You can certainly write code that ignores concurrency, and
would malfunction if (say) used from multiple threads simultaneously. But that’s
not the same thing as writing code that isn’t concurrent – code that simply
can’t be used concurrently, by compiler guarantee.
In Rust, you can. Let’s look at why you can do it, and why it’s awesome.
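
As a rough illustration (the `LocalCounter` type is hypothetical, invented for this sketch), here is a shared counter with no locks and no atomics. Because `Rc` is not `Send` and `Cell` is not `Sync`, the compiler refuses to let it cross a thread boundary, so the unsynchronized code is safe by construction rather than by convention.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// A counter with no locks and no atomics. `Rc` is not `Send` and `Cell` is
/// not `Sync`, so this type cannot be shared with or moved to another thread.
#[derive(Clone)]
struct LocalCounter(Rc<Cell<u32>>);

impl LocalCounter {
    fn new() -> Self {
        LocalCounter(Rc::new(Cell::new(0)))
    }

    /// Increments the shared count and returns the new value.
    fn bump(&self) -> u32 {
        let next = self.0.get() + 1;
        self.0.set(next);
        next
    }
}

fn main() {
    let a = LocalCounter::new();
    let b = a.clone(); // second handle to the same count
    a.bump();
    b.bump();
    println!("count = {}", a.bump()); // prints "count = 3"

    // Uncommenting the next line is a compile error, not a latent race:
    // `Rc<Cell<u32>>` cannot be sent between threads safely.
    // std::thread::spawn(move || b.bump());
}
```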
One of the common complaints I hear from systems programmers who try Rust is
about mutexes, and specifically about the Rust Mutex API. The complaints
usually go something like this:
They don’t want the mutex to contain data, just a lock.
They don’t want to have to manage a “guard” value that unlocks the mutex on
drop – often, more specifically, they just want to call an unlock operation
because they feel like that’s more explicit.
These changes would make the Rust mutex API equivalent to the C/Posix mutex API.
In one case I’ve seen someone try to use Mutex<()> and trickery to fake it.
There’s a problem with this, though: these two aspects of Mutex’s design are
inextricably linked to one another, and to Rust’s broader safety guarantees –
changing either or both of them will open the door to subtle bugs and
corruption due to data races.
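
For reference, here’s a small sketch of the standard `std::sync::Mutex` in use; the `parallel_count` function is a made-up example. The data lives inside the mutex, the only way to reach it is through the guard returned by `lock`, and the lock is released when the guard is dropped.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical example: several threads increment one shared counter.
fn parallel_count(threads: u32, per_thread: u32) -> u32 {
    // The counter is stored *inside* the mutex, not next to it.
    let total = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // `lock` returns a guard; the data is only reachable
                    // through it.
                    let mut guard = total.lock().unwrap();
                    *guard += 1;
                    // The guard drops here, which unlocks the mutex.
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("total = {}", parallel_count(4, 1000));
}
```

There is no way to write `*guard += 1` without first holding the lock, and no way to keep using `guard` after it has been dropped. A bare lock with separate `lock` and `unlock` calls can’t offer either property.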
A C-style mutex API consisting of some bundle of implicitly guarded data, plus
lock and unlock functions, isn’t wise in Rust because it allows safe code to
easily commit errors that break memory safety and create data races.
Perhaps controversially, I’d argue that this is also true in C. It’s just more
obvious in Rust, because Rust rigorously distinguishes between the notion of
“safe” code that cannot commit such errors, and “unsafe” code that can commit
such errors if it wishes. C does not make this distinction, and as a result, any
code using a mutex in C can trivially produce serious, potentially exploitable,
In the rest of this post I’ll walk through a typical C mutex API, compare with a
typical Rust mutex API, and look at what happens if we change the Rust API to
resemble C in various ways.
Last week I gave a talk at the Open Source Firmware Conference about some of the
work I’m doing at Oxide Computer, entitled On Hubris and Humility. There is a
video of the talk if you’d like to
watch it. It came out pretty alright!
The conference version of the talk has a constantly animated background that
makes the video hard for some people to watch. OSFC doesn’t appear to be
bothering with either captions or transcripts, so my friends who don’t hear as
well as I do (or just don’t want to turn their speakers on!) are kind of out of
And so, here’s a transcript with my slides inlined. The words may not exactly
match the audio because this is written from my speaker’s notes. And, yes, my
slides are all character art. The browser rendering is imperfect.
I’ve also written an epilogue at the end after the initial response to the talk.
Here’s another useful Rust pattern. Like the Typestate Pattern
before it, I wrote this because I haven’t seen the sort of obsessively nerdy
writeup that I wanted to read. And, as with the Typestate Pattern, I didn’t
invent this — I’m merely documenting and generalizing it.
In this series so far, we’ve taken a C program and converted it into a faster,
smaller, and reasonably robust Rust program. The Rust program is a recognizable
descendant of the C program, and that was deliberate: my goal was to compare and
contrast the two languages for optimized code.
In this bonus section, I’ll walk through how we’d write the program from scratch
in Rust. In particular, I’m going to rely on compiler auto-vectorization to
produce a program that is shorter, simpler, portable, and significantly
faster… and without any unsafe.
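
To give a flavor of what “relying on auto-vectorization” can look like, here’s a generic sketch, not code from the series; the `axpy` function is a made-up example. It’s a plain safe loop over slices, written so the optimizer is free to use SIMD instructions. Whether it actually does depends on the target and optimization level.

```rust
/// Adds `scale * b[i]` to `a[i]` for each pair of elements.
///
/// Iterating with `zip` avoids per-element bounds checks, which tends to
/// make loops like this easier for the compiler to vectorize.
fn axpy(a: &mut [f64], b: &[f64], scale: f64) {
    for (x, y) in a.iter_mut().zip(b) {
        *x += scale * *y;
    }
}

fn main() {
    let mut a = [1.0, 2.0, 3.0, 4.0];
    let b = [10.0, 20.0, 30.0, 40.0];
    axpy(&mut a, &b, 0.5);
    println!("{a:?}");
}
```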
In part 3 we found that our use of uninitialized memory was a premature
optimization that didn’t actually improve performance. This left us with only
one remaining unsafe function, but, boy, is it a doozy.
In this part, I’ll begin the process of corralling its unsafe optimizations
into more clearly safe code, by replacing arbitrary pointer casting with a
In the first part of this tutorial we took an optimized C program and
translated it to an equivalent Rust program, complete with all the unsafe
weirdness of the original: uninitialized variables, pointer casting and
In this section, we’ll begin using Rust’s features to make the program
incrementally more robust, while keeping performance unchanged.
Specifically, we’ll begin by introducing references.
In this part of the series, we’ll take a grungy optimized C program and
translate it, fairly literally, into a grungy optimized unsafe Rust program.
It’ll get the same results, with the same performance, as the original.
And that’s great! But I want to know how things actually work, and those tools
put a lot of code between me and the machine.
In this post, I’ll show how to create a simple web graphics demo using none of
no libraries between our code and the platform. It’s the web equivalent of bare
The resulting WebAssembly module will be less than 300 bytes. That’s about
the same size as the previous paragraph.
The typestate pattern is an API design pattern that encodes information about
an object’s run-time state in its compile-time type. In particular, an API
using the typestate pattern will have:
1. Operations on an object (such as methods or functions) that are only available
   when the object is in certain states,
2. A way of encoding these states at the type level, such that attempts to use
   the operations in the wrong state fail to compile,
3. State transition operations (methods or functions) that change the
   type-level state of objects in addition to, or instead of, changing run-time
   dynamic state, such that the operations in the previous state are no longer
This is useful because:
It moves certain types of errors from run-time to compile-time, giving
programmers faster feedback.
It interacts nicely with IDEs, which can avoid suggesting operations that are
illegal in a certain state.
It can eliminate run-time checks, making code faster/smaller.
This pattern is so easy in Rust that it’s almost obvious, to the point that
you may have already written code that uses it, perhaps without realizing it.
Interestingly, it’s very difficult to implement in most other programming
languages — most of them fail to satisfy items number 2 and/or 3 above.
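
Here’s a minimal sketch of the three ingredients listed above, using a hypothetical `Connection` type invented for illustration. The state is a type parameter, `send` exists only on `Connection<Open>`, and the transitions take `self` by value so the old state can’t be used afterwards.

```rust
/// Type-level states. `Open` also carries run-time data.
struct Closed;
struct Open {
    sent: usize,
}

/// Hypothetical connection whose state is part of its type.
struct Connection<State> {
    name: String,
    state: State,
}

impl<State> Connection<State> {
    /// Available in every state.
    fn name(&self) -> &str {
        &self.name
    }
}

impl Connection<Closed> {
    fn new(name: &str) -> Self {
        Connection { name: name.to_string(), state: Closed }
    }

    /// State transition: consumes the closed connection, returns an open one.
    fn open(self) -> Connection<Open> {
        Connection { name: self.name, state: Open { sent: 0 } }
    }
}

impl Connection<Open> {
    /// Only available while open. Returns the total bytes sent so far.
    fn send(&mut self, msg: &str) -> usize {
        self.state.sent += msg.len();
        self.state.sent
    }

    /// State transition back to closed; the open value is gone afterwards.
    fn close(self) -> Connection<Closed> {
        Connection { name: self.name, state: Closed }
    }
}

fn main() {
    let conn = Connection::new("demo");
    // conn.send("hi"); // compile error: no `send` on `Connection<Closed>`
    let mut conn = conn.open();
    let total = conn.send("hello");
    println!("{} sent {} bytes", conn.name(), total);
    let _conn = conn.close();
}
```

Taking `self` by value in `open` and `close` is what satisfies the third item: after the call, the compiler treats the old binding as moved, so nothing can call the previous state’s operations on it.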
I haven’t seen a detailed examination of the nuances of this pattern, so here’s
Over the past few months, I’ve been rewriting it — in Rust.
This is an interesting test case for Rust, because we’re very much in C/C++’s
home court here: the demo runs on the bare metal, without an operating system,
and is very sensitive to both CPU timing and memory usage.
The results so far? The Rust implementation is simpler, shorter (in lines of
code), faster, and smaller (in bytes of Flash) than my heavily-optimized C++
version — and because it’s almost entirely safe code, several types of
bugs that I fought regularly, such as race conditions and dangling pointers, are
now caught by the compiler.
It’s fantastic. Read on for my notes on the process.
This is a position paper that I originally circulated inside the firmware
community at X. I’ve gotten requests for a public link, so I’ve cleaned it up
and posted it here. This is, obviously, my personal opinion. Please read the
whole thing before sending me angry emails.
tl;dr: C/C++ have enough design flaws, and the alternative tools are in good
enough shape, that I do not recommend using C/C++ for new development except in
extenuating circumstances. In situations where you actually need the power of
C/C++, use Rust instead. In other situations, you shouldn’t have been using
C/C++ anyway — use nearly anything else.
Now that Hubris has gotten some attention, people sometimes ask me if my
personal projects are powered by Hubris.
The answer is: no, in general, they are not. My personal projects use my other
operating system, lilos, which predates Hubris and takes a fundamentally
different approach. It has dramatically lower resource requirements and allows
more styles of concurrency.
LRtDW is a series of articles putting Rust features in context for low-level C
programmers who maybe don’t have a formal CS background — the sort of
people who work on firmware, game engines, OS kernels, and the like. Basically,
people like me.
I’ve added Rust to my toolbelt, and I hope to get you excited enough to do the