The server chose violence

Hubris's oddest syscall

2024-04-08

I’m continuing to reflect on the past four years with Hubris. April Fools’ Day was, appropriately enough, the fourth anniversary of the first Hubris user program, and today is the fourth anniversary of the first kernel code. (I wrote the user program first to help me understand what the kernel’s API wanted to look like.)

Of all of Hubris’s design decisions, there’s one that gets a “wait what” response more often than any other. It’s also proving to be a critical part of the system’s overall robustness. In this post, I’ll take a look at our 13th and oddest syscall, REPLY_FAULT.

A brief overview of Hubris IPC

Hubris uses a small, application-independent kernel, and puts most of the code — drivers, application logic, network stack, etc. — in separately compiled isolated tasks. These tasks can communicate with each other using a cross-task messaging system (inter-process communication, or IPC). (This section will do a sort of “Hubris in a nutshell” — if you’d like to learn more I recommend the Reference Manual.)

IPC in Hubris consists of three core operations, implemented in the kernel, which tasks can request using syscalls:

RECV collects the highest priority incoming message, or blocks until one arrives.
SEND stops the caller and transfers a message — and control! — to the receiving task. The caller is parked until it gets a response.
REPLY delivers a response to a task that had previously used SEND, allowing it to continue.
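
To make the three operations concrete, here is a rough analogy built from ordinary Rust threads and channels. It is not how Hubris implements IPC (the real thing lives in the kernel and uses neither threads nor heap-allocated channels), and it ignores message priority. The names Message, send, and spawn_doubling_server are made up for this sketch.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message: an operation number plus a channel for the reply.
struct Message {
    operation: u32,
    reply_to: mpsc::Sender<u32>,
}

// SEND analog: hand the message to the server, then block until it replies.
fn send(server: &mpsc::Sender<Message>, operation: u32) -> u32 {
    let (reply_to, reply_from) = mpsc::channel();
    server
        .send(Message { operation, reply_to })
        .expect("server is gone");
    // The caller is parked here, like a task waiting for REPLY.
    reply_from.recv().expect("server dropped the request")
}

// A toy server that doubles whatever number it is sent.
fn spawn_doubling_server() -> mpsc::Sender<Message> {
    let (tx, rx) = mpsc::channel::<Message>();
    thread::spawn(move || {
        // RECV analog: block until a message arrives.
        for msg in rx {
            // REPLY analog: unblock the caller with a response.
            let _ = msg.reply_to.send(msg.operation * 2);
        }
    });
    tx
}
```

From the caller's side, send(&server, 21) reads like a plain function call that returns 42, which is the property the real IPC scheme is after.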

The Hubris IPC scheme is deliberately designed to work a lot like a function call, at least from the perspective of the client.

We often talk about “clients” and “servers” in Hubris, and it’s worth noting that these are roles tasks play. A client is just a task using SEND, and a server is a task using RECV and REPLY, but the roles are not mutually exclusive. A task may be a server to some other tasks, and simultaneously a client to different tasks. For instance, an “LED Blinker” task may call (client) into a “GPIO driver” task (server), which itself may call (client) into a supervisory task (server).

To underscore this point, here’s a graph of the IPC flow (green arrows) between tasks (rectangles) in Oxide’s production server firmware. Notice that almost all tasks have arrows both coming out (client) and coming in (server).

[Figure: a directed graph showing layers of tasks in our firmware with edges drawn between them, which is unfortunately difficult to explain entirely in text.]

New and exciting failure modes

When writing a function or procedure in almost any programming language, you make some assumptions about your callers’ behavior. This creates preconditions for calling the function. Depending on the language, some are explicit, and some are implicit. In Rust, for instance, if your function takes an argument of type String, it’s reasonable to assume your caller passes in a String and not a bool.

Your function has the backing of the compiler here: the caller has to pass a compatible type for all arguments, or the compiler won’t let them attempt to call the function. It’s possible to subvert this if you work at it, of course, but it’s hard to subvert it by accident.

The compiler and linker conspire behind the scenes to make sure that your program calls the function you intended. This ensures that you won’t be surprised by code that attempts to call pet_cat and winds up calling fire_missiles instead, except in very rare circumstances.

Because IPC crosses task boundaries, and tasks in Hubris are separately compiled programs, you have to be careful making these same assumptions with IPC. If a client is compiled against the wrong interface, or confuses one task for another, the compiler won’t have any idea, since it sees only a single program at a time. In this respect, IPC acts more like communication over a network.

Every task on Hubris that acts as an IPC server has to deal with the following potential errors:

Getting a message with an operation code that isn’t even appropriate for your interface, like “operation number 48” in a two-operation interface.
Receiving an uninterpretable bag of bytes instead of the message type you were expecting — or a message that is much too short or long.
Not getting the sort of loaned memory you require (e.g. you need it writable but you receive it read-only, or don’t receive it at all).
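
As an illustration of what checking for these three errors might look like on the server side, here is a minimal sketch for an invented two-operation interface. The Request, ClientError, and Lease types, the operation numbers, and the byte layout are all assumptions made for this example; they are not taken from Hubris or its generated code.

```rust
// Hypothetical two-operation interface, used only for illustration.
#[derive(Debug, PartialEq)]
enum Request {
    SetPin { pin: u8, high: bool },
    ReadPin { pin: u8 },
}

// The three ways a client can be wrong, mirroring the list above.
#[derive(Debug, PartialEq)]
enum ClientError {
    UnknownOperation,
    BadMessage,
    BadLease,
}

// What kind of memory the client loaned us, if any.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Lease {
    Absent,
    ReadOnly,
    Writable,
}

fn parse(opcode: u16, payload: &[u8], lease: Lease) -> Result<Request, ClientError> {
    match opcode {
        // Operation 1: set a pin. Expects exactly two bytes: pin, then 0 or 1.
        1 => match payload {
            [pin, 0] => Ok(Request::SetPin { pin: *pin, high: false }),
            [pin, 1] => Ok(Request::SetPin { pin: *pin, high: true }),
            _ => Err(ClientError::BadMessage),
        },
        // Operation 2: read a pin into loaned memory.
        // Expects one byte and a writable lease.
        2 => match payload {
            [pin] if lease == Lease::Writable => Ok(Request::ReadPin { pin: *pin }),
            [_] => Err(ClientError::BadLease),
            _ => Err(ClientError::BadMessage),
        },
        // Anything else, such as "operation number 48", is not ours.
        _ => Err(ClientError::UnknownOperation),
    }
}
```

The open question, which the rest of this post is about, is what the server should do with a ClientError once it has one.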

But I describe those as potential errors because, in practice…

None of this happens in normal, correct programs

In a normal Hubris program, none of these things happen.

Tasks are connected to each other by configuration in the build system, so it’s hard to confuse one for the other. Clients use generated Rust code to construct and send IPCs to servers, which use different generated Rust code to handle the result. This lets us squint and pretend that the type system works across task boundaries — it doesn’t, really, but our tools produce a pretty good illusion.

I always hate to penalize the “good” programs for error cases that they can’t actually hit. All of the obvious ways of handling the potential but unlikely errors (described above) hurt good programs.

For example: making all IPC operations return a Result<T, IpcError> where the good programs can’t actually hit any case in IpcError means that, in practice, they’ll just unwrap() it. That’s a fairly large operation in terms of code size — especially when we know the code will never be used! — and costs time at runtime to check for errors that won’t happen.

To keep every client from needing to unwrap() a bazillion errors, we could put the unwrap() (or more generally a panic!) into the generated code. This might reduce the code size impact (by centralizing the panic! in one location) but won’t reduce the cost at runtime.

There’s also a different kind of cost: a design cost. To be able to return a universal error from any operation, and have it be understood by a caller attempting any other operation, we have to make rules about the message encoding. Every operation must be capable of returning an error, every operation must have a way of encoding this particular error, and the encoding of this error by all operations must be identical.

This means you can’t express an operation that can’t fail, which is particularly annoying: as we’ve built our firmware infrastructure on Hubris, we keep finding operations that really can’t fail. Setting a GPIO pin, for example.
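
A small sketch of the difference from the caller's point of view. Everything here (IpcError's variants, Pin, Gpio, and both method names) is hypothetical and stands in for generated client stubs; the "driver" is just a bitmask so the example can run on a host machine.

```rust
// Hypothetical universal error that every operation would have to carry.
#[derive(Debug)]
enum IpcError {
    UnknownOperation,
    BadMessage,
    BadLease,
}

// A closed set of pins, so setting one really cannot fail.
#[derive(Clone, Copy)]
enum Pin {
    A0 = 0,
    A1 = 1,
    A2 = 2,
}

// Stand-in for a GPIO driver: one bit of state per pin.
#[derive(Default)]
struct Gpio {
    levels: u8,
}

impl Gpio {
    // The operation as we would like to express it: no error path at all.
    fn set_pin(&mut self, pin: Pin, high: bool) {
        let mask = 1u8 << (pin as u8);
        if high {
            self.levels |= mask;
        } else {
            self.levels &= !mask;
        }
    }

    // The same operation under a "universal error" design: it must return
    // a Result even though a correct client never sees Err.
    fn set_pin_checked(&mut self, pin: Pin, high: bool) -> Result<(), IpcError> {
        self.set_pin(pin, high);
        Ok(())
    }
}

// Callers of the checked version end up sprinkling unwrap() everywhere.
fn blink_with_universal_error(gpio: &mut Gpio) {
    gpio.set_pin_checked(Pin::A1, true).unwrap();
    gpio.set_pin_checked(Pin::A1, false).unwrap();
}

// Callers of the infallible version just call it.
fn blink_infallible(gpio: &mut Gpio) {
    gpio.set_pin(Pin::A1, true);
    gpio.set_pin(Pin::A1, false);
}
```

Both blink functions do the same work, but only the first carries panic paths and error checks for cases that a correct program cannot hit.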

So we dearly needed an alternative to this “universal error code” approach. I drew inspiration from a weird design decision I made in the Hubris kernel API: the Hubris kernel is unusually aggressive.

The kernel is not having any of your nonsense.

In most operating systems, if you violate the preconditions for a system call, you get a polite error code back from the kernel — or, at worst, an exception handler or signal handler gets triggered. You have an opportunity to recover.

Take Unix for example. If you call close on a file descriptor you never opened, you get an error code back. If you call open and hand it a null pointer instead of a pathname? You get an error code back. Both of these are violations of a system call’s preconditions, and both are handled through the same error mechanism that handles “file not found” and other cases that can happen in a correct program.

On Hubris, if you break a system call’s preconditions, your task is immediately destroyed with no opportunity to do anything else.¹

¹ The application can choose to do something about it, because when any task takes a fault, the application’s supervisor task is notified. Typically the supervisor responds by wiping the task and restarting it. But the task has no opportunity to do anything else.

More specifically, the kernel delivers a synthetic fault. This is very similar to the hardware faults that a task receives if it, say, dereferences a null pointer, or divides by zero. Those are produced by the CPU for breaking the processor architecture’s rules. Synthetic faults, on the other hand, are produced by the kernel for breaking the kernel’s rules.

For example, when a task calls SEND, it passes the kernel the index of the intended recipient task, and a pointer to some memory containing the message. If the recipient task index is out of range for the application? Synthetic fault. If the message pointer points to memory the task doesn’t actually have access to? Synthetic fault.
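
The two checks just described could be sketched roughly as follows. This is a simplified model, not the Hubris kernel's code: the Fault variants, the Region type, and the check_send signature are all invented for the example, and a real kernel would consult the task's memory protection configuration rather than a slice of ranges.

```rust
// Hypothetical fault kinds; the real kernel's names and payloads differ.
#[derive(Debug, PartialEq)]
enum Fault {
    BadTaskIndex,
    MemoryAccess { address: usize },
}

// A span of memory the calling task is allowed to touch.
struct Region {
    base: usize,
    len: usize,
}

impl Region {
    // True if [addr, addr + len) lies entirely inside this region.
    fn contains(&self, addr: usize, len: usize) -> bool {
        match addr.checked_add(len) {
            Some(end) => addr >= self.base && end <= self.base + self.len,
            None => false, // address range wraps around: never valid
        }
    }
}

// Validate the arguments to SEND. Any Err here means the caller is faulted,
// not handed an error code.
fn check_send(
    task_count: usize,
    caller_regions: &[Region],
    recipient: usize,
    msg_addr: usize,
    msg_len: usize,
) -> Result<(), Fault> {
    if recipient >= task_count {
        return Err(Fault::BadTaskIndex);
    }
    if !caller_regions.iter().any(|r| r.contains(msg_addr, msg_len)) {
        return Err(Fault::MemoryAccess { address: msg_addr });
    }
    Ok(())
}
```

The important design point is what happens to the Err: it never travels back to the calling task as a return value.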

Early in the system’s design, I decided not to permit recoverable/resumable faults. That is, when a program takes a fault — whether it’s hardware or synthetic — the task is dead. It can run no further instructions. There is no way to “fix” the problem and resume the task. This was a conscious choice to avoid some subtle failure modes and simplify reasoning about the system.²

² As I mentioned in the previous footnote, the application supervisor task can decide to leave the task dead, or to wipe it and restart it. I also decided to support one supervisor, not a supervisor tree like Erlang’s. This prevents a malicious pair of tasks from cooperating to reset one another by making one supervise the other. If you want the supervisor to participate in your scheme, you’ll have to exploit the central well-tested supervisor task.

But combined with the kernel’s habit of faulting any task that looks at it funny, this makes the system’s behavior very unusual compared to most operating systems.

And it’s been great.

Initially I was concerned that I’d made the kernel too aggressive, but in practice, this has meant that errors are caught very early in development. A fault is hard to miss, and literally cannot be ignored the way an error code might be. Humility (our debugger) happily prints a detailed description of any fault it finds; in fact, one made an appearance in my last Hubris-related post, although in that case it was being reported in error:

mem fault (precise: 0x801bffd) in syscall (was: wait: reply from i2c_driver/gen0)

This is a synthetic fault that a task receives for handing the kernel a pointer to some memory (at address 0x801bffd in this case) that the task can’t actually access.

This behavior was so nice to use in practice, in fact, that it suggested a way to fix our IPC error reporting woes: generalize the same mechanism.

The server isn’t having any of your nonsense, either.

Once I realized that our unusually strict kernel was actually helping developers instead of hindering them, I was inspired to implement Hubris’s 13th and oddest syscall: REPLY_FAULT.

I mentioned REPLY earlier, the mechanism servers use to respond to their clients. More specifically:

When a client uses SEND, the kernel marks the client’s task as “waiting to send” to the recipient task.
When the recipient uses RECV, one client task “waiting to send” to it is updated to “waiting for reply.” The client task will remain in that state until something changes — usually, the server using REPLY.
REPLY only works on a task marked as “waiting for reply” from the specific server task that is attempting to reply. It switches the client task back into a “runnable” state.

REPLY_FAULT is basically the same thing, except instead of delivering a message and making the task runnable, it delivers a fault and makes the task dead. With REPLY_FAULT, we can avoid having unnecessary error handling on IPC operations, because correct programs will just go on as if the problem can’t occur — and incorrect programs won’t get to handle the error at all!

Like REPLY, a server can only REPLY_FAULT a task that is waiting for its reply. You can’t use REPLY_FAULT to kill random tasks, only the set of tasks from which you have RECV’d a message and not yet REPLY’d.
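
The state transitions described above, including the restriction on who may fault whom, fit in a small model. This is a toy written for this explanation, with invented names (State, Kernel, and its methods); it picks the first waiting client rather than the highest-priority one, and reports a mismatched reply as false without claiming that is what the real kernel does.

```rust
type TaskId = usize;

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Runnable,
    WaitingToSend(TaskId),   // blocked in SEND; server has not RECV'd yet
    WaitingForReply(TaskId), // server holds the message; client awaits REPLY
    Faulted,                 // dead until the supervisor intervenes
}

struct Kernel {
    tasks: Vec<State>,
}

impl Kernel {
    fn new(task_count: usize) -> Self {
        Kernel { tasks: vec![State::Runnable; task_count] }
    }

    // SEND: park the client until the server picks up its message.
    fn send(&mut self, client: TaskId, server: TaskId) {
        self.tasks[client] = State::WaitingToSend(server);
    }

    // RECV: take one client that is waiting to send to this server.
    fn recv(&mut self, server: TaskId) -> Option<TaskId> {
        let client = self
            .tasks
            .iter()
            .position(|s| *s == State::WaitingToSend(server))?;
        self.tasks[client] = State::WaitingForReply(server);
        Some(client)
    }

    // Shared rule: only the server a client is waiting on may finish the call.
    fn finish(&mut self, server: TaskId, client: TaskId, outcome: State) -> bool {
        if self.tasks.get(client) == Some(&State::WaitingForReply(server)) {
            self.tasks[client] = outcome;
            true
        } else {
            false
        }
    }

    // REPLY: the client becomes runnable again.
    fn reply(&mut self, server: TaskId, client: TaskId) -> bool {
        self.finish(server, client, State::Runnable)
    }

    // REPLY_FAULT: same precondition, but the client ends up dead.
    fn reply_fault(&mut self, server: TaskId, client: TaskId) -> bool {
        self.finish(server, client, State::Faulted)
    }
}
```

In this model reply and reply_fault differ only in the state the client lands in, which is the sense in which REPLY_FAULT is "basically the same thing" as REPLY.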

Our system now uses REPLY_FAULT to handle the three cases I mentioned earlier: a bogus operation code; a corrupt, truncated, or otherwise nonsensical message; or a client that doesn’t send the right kind of loaned memory.

But REPLY_FAULT also provides a way to define and implement new kinds of errors — application-specific errors — such as access control rules. For instance, the Hubris IP stack assigns IP ports to tasks statically. If a task tries to mess with another task’s IP port, the IP stack faults them. This gets us the same sort of “fail fast” developer experience, with the smaller and simpler code that results from not handling “theoretical” errors that can’t occur in practice.
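
A sketch of what such an access control check might look like. The port table, task numbers, and the Outcome type are invented for illustration and do not reflect the real IP stack's data structures; treating a request for an unassigned port as a fault is also an assumption of this example.

```rust
type TaskId = u16;

// What the server decides to do with the request.
#[derive(Debug, PartialEq)]
enum Outcome {
    Reply,      // proceed and answer normally
    ReplyFault, // the client broke the rules; fault it
}

// Hypothetical static assignment of ports to owning tasks,
// fixed at build time: (port, owner).
const PORT_OWNERS: &[(u16, TaskId)] = &[(7, 3), (998, 5)];

fn handle_port_request(sender: TaskId, port: u16) -> Outcome {
    match PORT_OWNERS.iter().find(|(p, _)| *p == port) {
        // The sender owns this port: carry on.
        Some((_, owner)) if *owner == sender => Outcome::Reply,
        // Someone else's port, or a port nobody was assigned.
        _ => Outcome::ReplyFault,
    }
}
```

Note that there is no error variant for "permission denied" in the reply type, so a well-behaved client has nothing extra to check.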

As with the kernel’s aggressive handling of errors in system calls, I was initially concerned that REPLY_FAULT would be too extreme. After I had the idea, I delayed several months before starting implementation, basically trying to talk myself out of it.

I was being too careful. REPLY_FAULT has been great. A new developer on the system recently cited it as one of the more surprising and pleasant parts of developing on Hubris, which is what inspired me to write this post.

The joy of panicking other programs

I mentioned earlier that Hubris IPC was explicitly designed to behave like a Rust function call from the perspective of the client.

Well, if you violate the preconditions on a Rust function call, the function will normally respond with a panic!.

REPLY_FAULT essentially provides a way for servers to generate cross-process panic! in their clients, without requiring clients to contain code to do it — or, perhaps more importantly, without requiring clients to cooperate in the process at all.

Overall, this combines with some other system features to make Hubris “aggressively hostile to malicious programs,” as Eliza Weissman recently described it. Since attempts at exploitation often manifest first as errors or misuse of APIs, a system that responds to any misbehavior by wiping the state of the misbehaving component ought to be harder to exploit. (This hypothesis has yet to be tested! Please reach out to me if you’re interested in trying to exploit Hubris. I will help!)

In practice, the only downside I’ve observed from these decisions is that the system is really difficult to fuzz test. Because I like chaos engineering, I’ve implemented a small “chaos” task that generates random IPCs and system calls to test other components for bugs, and almost anything it does causes it to get immediately reset. To be useful, it has to base all of its decisions off the one piece of state that is observably different each time it starts: the system uptime counter. (However, REPLY_FAULT does provide a way for servers to force chaos upon their clients by randomly killing them, an option I haven’t fully evaluated.)
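
The idea of deriving every decision from the uptime counter can be sketched like this. It is an illustration only, not the actual chaos task: the xorshift generator, the ChaosIpc fields, and the 64-byte payload limit are all assumptions, and task_count must be nonzero.

```rust
// xorshift64: a tiny PRNG, fine for chaos testing, not for cryptography.
struct XorShift64(u64);

impl XorShift64 {
    fn seeded_from_uptime(uptime_ticks: u64) -> Self {
        // xorshift gets stuck at zero, so substitute a fixed nonzero seed.
        XorShift64(if uptime_ticks == 0 { 0x9E37_79B9_7F4A_7C15 } else { uptime_ticks })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// One randomly chosen (and probably bogus) IPC to attempt.
struct ChaosIpc {
    target_task: usize,
    opcode: u16,
    payload_len: usize,
}

// The task keeps no state across restarts, so the uptime counter is the
// only input: a different uptime gives a different seed after each reset.
fn pick_chaos(uptime_ticks: u64, task_count: usize) -> ChaosIpc {
    let mut rng = XorShift64::seeded_from_uptime(uptime_ticks);
    ChaosIpc {
        target_task: (rng.next_u64() % task_count as u64) as usize,
        opcode: rng.next_u64() as u16,
        payload_len: (rng.next_u64() % 64) as usize,
    }
}
```

Because the choice is a pure function of the uptime, a fault seen at a given tick count can in principle be reproduced by replaying the same seed.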

But normal Hubris tasks don’t dynamically generate IPC messages, particularly ones that are deliberately bogus. In practice, they can carry on without realizing REPLY_FAULT even exists — because unless they do something really unusual, they will never see the business end of it anyway.

#api-design #dayjob #embedded #rust #security

Cliffle