The server chose violence
Hubris's oddest syscall
- A brief overview of Hubris IPC
- New and exciting failure modes
- None of this happens in normal, correct programs
- The kernel is not having any of your nonsense.
- The server isn’t having any of your nonsense, either.
- The joy of panicking other programs
I’m continuing to reflect on the past four years with Hubris — April Fool’s Day was, appropriately enough, the fourth anniversary of the first Hubris user program, and today is the fourth anniversary of the first kernel code. (I wrote the user program first to help me understand what the kernel’s API wanted to look like.)
Of all of Hubris’s design decisions, there’s one that gets a “wait what”
response more often than any other. It’s also proving to be a critical part of
the system’s overall robustness. In this post, I’ll take a look at our 13th and
oddest syscall, REPLY_FAULT.
A brief overview of Hubris IPC
Hubris uses a small, application-independent kernel, and puts most of the code — drivers, application logic, network stack, etc. — in separately compiled isolated tasks. These tasks can communicate with each other using a cross-task messaging system (inter-process communication, or IPC). (This section will do a sort of “Hubris in a nutshell” — if you’d like to learn more I recommend the Reference Manual.)
IPC in Hubris consists of three core operations, implemented in the kernel, which tasks can request using syscalls:
- RECV collects the highest priority incoming message, or blocks until one arrives.
- SEND stops the caller and transfers a message — and control! — to the receiving task. The caller is parked until it gets a response.
- REPLY delivers a response to a task that had previously used SEND, allowing it to continue.
The Hubris IPC scheme is deliberately designed to work a lot like a function call, at least from the perspective of the client.
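To make that shape concrete, here's a minimal sketch of how a client call and a server loop line up. The names and signatures below are simplified stand-ins, not Hubris's actual userlib API (which also deals in response codes, generations, and memory leases):

```rust
// Illustrative stand-ins for the real syscall stubs -- simplified, not
// the actual Hubris userlib signatures.
type TaskId = u16;
const OP_READ_TEMP: u16 = 1;

struct RecvInfo {
    sender: TaskId,
    operation: u16,
    len: usize,
}

fn sys_send(_target: TaskId, _op: u16, _outgoing: &[u8], _incoming: &mut [u8]) {
    unimplemented!() // stand-in for SEND
}
fn sys_recv(_buffer: &mut [u8]) -> RecvInfo {
    unimplemented!() // stand-in for RECV
}
fn sys_reply(_client: TaskId, _message: &[u8]) {
    unimplemented!() // stand-in for REPLY
}

// Client role: SEND parks this task until the server replies, so the
// whole exchange reads like a blocking function call.
fn read_temperature(sensor: TaskId) -> i32 {
    let mut response = [0u8; 4];
    sys_send(sensor, OP_READ_TEMP, &[], &mut response);
    i32::from_le_bytes(response)
}

// Server role: loop on RECV, dispatch on the operation code, and REPLY
// to unblock the waiting client.
fn serve_forever() -> ! {
    let mut msg = [0u8; 16];
    loop {
        let info = sys_recv(&mut msg);
        match info.operation {
            OP_READ_TEMP => {
                let temp: i32 = 20; // pretend we read a sensor here
                sys_reply(info.sender, &temp.to_le_bytes());
            }
            _ => { /* unknown operation -- the subject of this post */ }
        }
    }
}
```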
We often talk about “clients” and “servers” in Hubris, and it’s worth noting
that these are roles tasks play. A client is just a task using SEND, and a
server is a task using RECV and REPLY – but they’re not mutually exclusive.
A task may be a server to some other tasks, and simultaneously a client to
different tasks. For instance, an “LED Blinker” task may call (client) into a
“GPIO driver” task (server), which itself may call (client) into a supervisory
task (server).
To underscore this point, here’s a graph of the IPC flow (green arrows) between tasks (rectangles) in Oxide’s production server firmware. Notice that almost all tasks have arrows both coming out (client) and coming in (server).
New and exciting failure modes
When writing a function or procedure in almost any programming language, you
make some assumptions about your callers’ behavior. This creates preconditions
for calling the function. Depending on the language, some are explicit, and some
are implicit. In Rust, for instance, if your function takes an argument of
type String, it’s reasonable to assume your caller passes in a String and not a bool.
Your function has the backing of the compiler here: the caller has to pass a compatible type for all arguments, or the compiler won’t let them attempt to call the function. It’s possible to subvert this if you work at it, of course, but it’s hard to subvert it by accident.
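For example, with no effort on your part:

```rust
fn greet(name: String) {
    println!("hello, {name}");
}

fn main() {
    greet(String::from("cliff"));
    // greet(true); // rejected at compile time: expected `String`, found `bool`
}
```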
The compiler and linker conspire behind the scenes to make sure that your
program calls the function you intended. This ensures that you won’t be
surprised by code that attempts to call pet_cat and winds up calling fire_missiles instead, except in very rare circumstances.
Because IPC crosses task boundaries, and tasks in Hubris are separately compiled programs, you have to be careful making these same assumptions with IPC. If a client is compiled against the wrong interface, or confuses one task for another, the compiler won’t have any idea, since it sees only a single program at a time. In this respect, IPC acts more like communication over a network.
Every task on Hubris that acts as an IPC server has to deal with the following potential errors:
- Getting a message with an operation code that isn’t even appropriate for your interface, like “operation number 48” in a two-operation interface.
- Receiving an uninterpretable bag of bytes instead of the message type you were expecting — or a message that is much too short or long.
- Not getting the sort of loaned memory you require (e.g. you need it writable but you receive it read-only, or don’t receive it at all).
But I describe those as potential errors because, in practice…
None of this happens in normal, correct programs
In a normal Hubris program, none of these things happen.
Tasks are connected to each other by configuration in the build system, so it’s hard to confuse one for the other. Clients use generated Rust code to construct and send IPCs to servers, which use different generated Rust code to handle the result. This lets us squint and pretend that the type system works across task boundaries — it doesn’t, really, but our tools produce a pretty good illusion.
I always hate to penalize the “good” programs for error cases that they can’t actually hit. All of the obvious ways of handling the potential but unlikely errors (described above) hurt good programs.
For example: making all IPC operations return a Result<T, IpcError> where the good programs can’t actually hit any case in IpcError means that, in practice, they’ll just unwrap() it. That’s a fairly large operation in terms of code size — especially when we know the code will never be used! — and costs time at runtime to check for errors that won’t happen.
To keep every client from needing to unwrap() a bazillion errors, we could put the unwrap() (or more generally a panic!) into the generated code. This might reduce the code size impact (by centralizing the panic! in one location) but won’t reduce the cost at runtime.
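To make that cost concrete, here's a sketch of the universal-error approach. The interface is hypothetical, and real Hubris client stubs are generated rather than hand-written:

```rust
// Hypothetical client stub under the universal-error design: every
// operation returns Result, even ones that cannot actually fail.
#[derive(Debug)]
pub enum IpcError {
    BadOperation,
    BadMessage,
    BadLease,
}

pub fn set_pin_high(_pin: u8) -> Result<(), IpcError> {
    // ... marshal the argument, SEND to the GPIO server, and decode the
    // response code into Ok or Err ...
    Ok(())
}

fn blink() {
    // A correct client can never reach the Err arm, yet every call site
    // still pays for the check and the panic machinery behind unwrap().
    set_pin_high(3).unwrap();
}
```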
There’s also a different kind of cost: a design cost. To be able to return a universal error from any operation, and have it be understood by a caller attempting any other operation, we have to make rules about the message encoding. Every operation must be capable of returning an error, every operation must have a way of encoding this particular error, and the encoding of this error by all operations must be identical.
This means you can’t express an operation that can’t fail, which is particularly annoying: as we’ve built our firmware infrastructure on Hubris, we keep finding operations that really can’t fail. Setting a GPIO pin, for example.
So we dearly needed an alternative to this “universal error code” approach. I drew inspiration from a weird design decision I made in the Hubris kernel API: the Hubris kernel is unusually aggressive.
The kernel is not having any of your nonsense.
In most operating systems, if you violate the preconditions for a system call, you get a polite error code back from the kernel — or, at worst, an exception handler or signal handler gets triggered. You have an opportunity to recover.
Take Unix for example. If you call close on a file descriptor you never opened, you get an error code back. If you call open and hand it a null pointer instead of a pathname? You get an error code back. Both of these are
violations of a system call’s preconditions, and both are handled through the
same error mechanism that handles “file not found” and other cases that can
happen in a correct program.
On Hubris, if you break a system call’s preconditions, your task is immediately destroyed with no opportunity to do anything else.
The application can choose to do something about it, because when any task takes a fault, the application’s supervisor task is notified. Typically the supervisor responds by wiping the task and restarting it. But the task has no opportunity to do anything else.
More specifically, the kernel delivers a synthetic fault. This is very similar to the hardware faults that a task receives if it, say, dereferences a null pointer, or divides by zero. Those are produced by the CPU for breaking the processor architecture’s rules. Synthetic faults, on the other hand, are produced by the kernel for breaking the kernel’s rules.
For example, when a task calls SEND, it passes the kernel the index of the
intended recipient task, and a pointer to some memory containing the message. If
the recipient task index is out of range for the application? Synthetic fault.
If the message pointer points to memory the task doesn’t actually have access
to? Synthetic fault.
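One way to picture the split (a sketch only; the kernel's real fault types carry more detail than this):

```rust
// Sketch of the hardware/synthetic distinction, not Hubris's actual
// kernel types.
enum Fault {
    // Raised by the CPU for breaking the architecture's rules:
    Hardware(HardwareFault),
    // Raised by the kernel for breaking the kernel's rules:
    Synthetic(SyntheticFault),
}

enum HardwareFault {
    IllegalMemoryAccess { address: u32 },
    DivideByZero,
}

enum SyntheticFault {
    // SEND named a task index that doesn't exist in this application.
    BadTaskIndex,
    // SEND passed a message pointer the task can't actually access.
    BadMessagePointer { address: u32 },
}
```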
Early in the system’s design, I decided not to permit recoverable/resumable faults. That is, when a program takes a fault — whether it’s hardware or synthetic — the task is dead. It can run no further instructions. There is no way to “fix” the problem and resume the task. This was a conscious choice to avoid some subtle failure modes and simplify reasoning about the system.
As I mentioned above, the application supervisor task can decide to leave the task dead, or to wipe it and restart it. I also decided to support one supervisor, not a supervisor tree like Erlang’s. This prevents a malicious pair of tasks from cooperating to reset one another by making one supervise the other. If you want the supervisor to participate in your scheme, you’ll have to exploit the central, well-tested supervisor task.
But combined with the kernel’s habit of faulting any task that looks at it funny, this makes the system’s behavior very unusual compared to most operating systems.
And it’s been great.
Initially I was concerned that I’d made the kernel too aggressive, but in practice, this has meant that errors are caught very early in development. A fault is hard to miss, and literally cannot be ignored the way an error code might be. Humility (our debugger) happily prints a detailed description of any fault it finds; in fact, one made an appearance in my last Hubris-related post, although in that case it was being reported in error:
mem fault (precise: 0x801bffd) in syscall (was: wait: reply from i2c_driver/gen0)
This is a synthetic fault that a task receives for handing the kernel a pointer
to some memory (at address 0x801bffd in this case) that the task can’t
actually access.
This behavior was so nice to use in practice, in fact, that it suggested a way to fix our IPC error reporting woes: generalize the same mechanism.
The server isn’t having any of your nonsense, either.
Once I realized that our unusually strict kernel was actually helping
developers instead of hindering them, I was inspired to implement Hubris’s 13th
and oddest syscall: REPLY_FAULT.
I mentioned REPLY earlier, the mechanism servers use to respond to their
clients. More specifically,
- When a client uses SEND, the kernel marks the client’s task as “waiting to send” to the recipient task.
- When the recipient uses RECV, one client task “waiting to send” to it is updated to “waiting for reply.” The client task will remain in that state until something changes — usually, the server using REPLY.
- REPLY only works on a task marked as “waiting for reply” from the specific server task that is attempting to reply. It switches the client task back into a “runnable” state.
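Seen from the client, that's a small state machine. Here's a sketch, using the TaskId stand-in from the earlier sketch (simplified; the kernel's real scheduling states have more cases):

```rust
// Simplified sketch of the client-visible IPC states; not the kernel's
// actual scheduler types.
enum ClientIpcState {
    Runnable,
    // Entered via SEND: blocked until the named server RECVs the message.
    WaitingToSend { server: TaskId },
    // Entered when the server RECVs: blocked until that same server
    // REPLYs (or, as described next, REPLY_FAULTs).
    WaitingForReply { server: TaskId },
}
```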
REPLY_FAULT is basically the same thing, except instead of delivering a message and making the task runnable, it delivers a fault and makes the task dead. With REPLY_FAULT, we can avoid having unnecessary error handling on IPC operations, because correct programs will just go on as if the problem can’t occur — and incorrect programs won’t get to handle the error at all!
Like REPLY, a server can only REPLY_FAULT a task that is waiting for its reply. You can’t use REPLY_FAULT to kill random tasks, only the set of tasks from which you have RECV’d a message and not yet REPLY’d.
Our system now uses REPLY_FAULT to handle the three cases I mentioned earlier:
a bogus operation code; or a corrupt, truncated, or otherwise nonsensical
message; or if the client doesn’t send the right kind of loaned memory.
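Reusing the stand-in stubs from the earlier sketch, a server's dispatch path ends up looking roughly like this. The sys_reply_fault stub and the FaultReason names here are hypothetical (the real syscall's reason codes live in the Hubris ABI under different names), and in our real system this logic lives in generated code:

```rust
// Hypothetical REPLY_FAULT stub and reason codes.
enum FaultReason {
    BadOperation,
    BadMessage,
    BadLease,
    AccessViolation,
}

fn sys_reply_fault(_client: TaskId, _reason: FaultReason) {
    unimplemented!() // stand-in: the client task is now dead
}

const OP_SET_PIN: u16 = 2;

fn serve() -> ! {
    let mut buffer = [0u8; 64];
    loop {
        let info = sys_recv(&mut buffer);
        match info.operation {
            OP_SET_PIN => match &buffer[..info.len] {
                // A one-byte payload is the only well-formed message here.
                &[pin] => {
                    // ... drive GPIO pin `pin` high ...
                    let _ = pin;
                    sys_reply(info.sender, &[]);
                }
                // Corrupt, truncated, or oversized message: fault the client.
                _ => sys_reply_fault(info.sender, FaultReason::BadMessage),
            },
            // Operation code outside this interface: fault the client.
            _ => sys_reply_fault(info.sender, FaultReason::BadOperation),
        }
    }
}
```

Note that the happy path contains no error handling at all: a well-behaved client just gets its reply.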
But REPLY_FAULT also provides a way to define and implement new kinds of
errors — application-specific errors — such as access control rules. For
instance, the Hubris IP stack assigns IP ports to tasks statically. If a task
tries to mess with another task’s IP port, the IP stack faults them. This gets
us the same sort of “fail fast” developer experience, with the smaller and
simpler code that results from not handling “theoretical” errors that can’t
occur in practice.
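As a sketch of that access-control flavor (illustrative only, not the actual Hubris netstack code, and continuing with the hypothetical stubs above):

```rust
// Illustrative only -- not the real netstack. Ports are assigned to
// tasks statically in the application config, so ownership is a simple
// table lookup.
fn port_owner(_port: u16) -> TaskId {
    unimplemented!() // stand-in for the static port-assignment table
}

fn handle_port_operation(info: &RecvInfo, port: u16) {
    if port_owner(port) != info.sender {
        // Touching another task's port is a policy violation, not an
        // error the client gets to observe and handle.
        sys_reply_fault(info.sender, FaultReason::AccessViolation);
        return;
    }
    // ... perform the operation and sys_reply as usual ...
}
```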
As with the kernel’s aggressive handling of errors in system calls, I was initially concerned that REPLY_FAULT would be too extreme. After I had the idea, I
delayed several months before starting implementation, basically trying to talk
myself out of it.
I was being too careful. REPLY_FAULT has been great. A new developer on the
system recently cited it as one of the more surprising and pleasant parts of
developing on Hubris, which is what inspired me to write this post.
The joy of panicking other programs
I mentioned earlier that Hubris IPC was explicitly designed to behave like a Rust function call from the perspective of the client.
Well, if you violate the preconditions on a Rust function call, the function
will normally respond with a panic!
REPLY_FAULT essentially provides a way for servers to generate cross-process panic! in their clients, without requiring clients to contain code to do it — or, perhaps more importantly, without requiring clients to cooperate in the process at all.
Overall, this combines with some other system features to make Hubris “aggressively hostile to malicious programs,” as Eliza Weissman recently described it. Since attempts at exploitation often manifest first as errors or misuse of APIs, a system that responds to any misbehavior by wiping the state of the misbehaving component ought to be harder to exploit. (This hypothesis has yet to be tested! Please reach out to me if you’re interested in trying to exploit Hubris. I will help!)
In practice, the only downside I’ve observed from these decisions is that the
system is really difficult to fuzz test. Because I like chaos
engineering, I’ve implemented a small “chaos” task that generates random
IPCs and system calls to test other components for bugs, and almost anything it
does causes it to get immediately reset. To be useful, it has to base all of its
decisions off the one piece of state that is observably different each time it
starts: the system uptime counter. (However, REPLY_FAULT does provide a way
for servers to force chaos upon their clients by randomly killing them, an
option I haven’t fully evaluated.)
But normal Hubris tasks don’t dynamically generate IPC messages, particularly
ones that are deliberately bogus. In practice, they can carry on without
realizing REPLY_FAULT even exists — because unless they do something really
unusual, they will never see the business end of it anyway.