Crash recovery in 256 bytes
The exhubris supervisor.
- The role of the supervisor in Hubris
- Aside: what happens without a supervisor?
- A minimal supervisor
- How reinitialize_task prevents pile-ups
- Who supervises the supervisor?
- The cost of supervision
- Conclusions
(This continues my series of posts on the exhubris tools I’m building, to enable more people to use Hubris in their embedded systems.)
One of Hubris’s strongest features is its ability to handle crashes in drivers and other application logic. It leaves the specific crash handling behavior up to the application programmer through a mechanism called a supervisor.
In this post I’ll explain why I made this decision and how it works in practice, then walk through the exhubris supervisor reference implementation, minisuper. (Spoiler: it’s very small.)
The role of the supervisor in Hubris
Bugs happen, and programs crash. That’s more or less the single-sentence motivation for Hubris. But what happens when a program crashes?
On your phone or a desktop computer, you probably get a popup warning saying that an application has stopped. You can launch it again if you want. This approach requires human interaction: someone has to read the error message and decide whether to restart the program.
Servers in datacenters usually can’t expect a human operator to click dialog boxes, so programs on these machines are typically restarted automatically by some sort of service framework. There are usually tools to control how often the program can be restarted, and whether there are circumstances where the system should give up and stop trying.
Hubris is designed for systems that don’t require human interaction, and may not even have any sort of interface. The tiny microcontrollers inside a keyboard or power adapter need to just keep working without asking for help. So, in such systems, what should we do if a task crashes?
We could immediately restart it, but if we’re in a situation where a task may endlessly crash on startup, that could be very expensive, preventing the machine from meeting its other responsibilities.
We could leave it be, but what will cause it to start back up again? Does the user have to turn us off and back on again?
We could introduce some sort of backoff strategy, waiting longer periods of time each time a task crashes. But how do we cap the wait? If the wait begins at one second and doubles every time, after ten crashes we’re waiting 17 minutes, which is an awfully long time for a toaster to become unresponsive!
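To see how quickly those numbers grow, here’s a sketch of one such policy. The function `backoff_secs` and its `cap_secs` parameter are hypothetical, not part of Hubris: the wait starts at one second, doubles with each consecutive crash, and is clamped to a cap. With no cap, the wait after ten crashes is 2^10 = 1024 seconds, about 17 minutes.

```rust
/// Hypothetical exponential backoff policy (not part of Hubris).
/// Returns the wait, in seconds, before restarting a task that has
/// crashed `crashes` times in a row: 1 s, doubling per crash, capped.
fn backoff_secs(crashes: u32, cap_secs: u64) -> u64 {
    // 2^crashes; a shift of 64 or more would overflow, so treat it as u64::MAX.
    let uncapped = 1u64.checked_shl(crashes).unwrap_or(u64::MAX);
    // Clamp to the application's chosen maximum wait.
    uncapped.min(cap_secs)
}
```

Picking `cap_secs` is itself an application-specific choice, which is the crux of the problem.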
My conclusion is that there is no right answer to this question. The correct action to take when a program crashes depends on the context and the application’s requirements. So, Hubris leaves it up to you, the programmer.
Specifically, the decision is left up to the supervisor task. All Hubris applications must include one task playing the role of supervisor. The supervisor will be informed about any crashes, and have the opportunity to take any corrective action it likes.
When a task in a Hubris application crashes (whether because it abused a syscall, dereferenced a wild pointer, or explicitly called panic!), the kernel does only three things:
- It records information about the crash (called a fault) in the task’s control block in kernel memory.
- It posts a notification to the supervisor task.
- It invokes the scheduler to choose a different task to run, since the current task is dead. This almost always chooses the supervisor task.[1]
That’s it. The kernel does not alter the crashed task’s other state in any way. It also does not immediately unblock any other tasks that are trying to send messages to the crashed task (more on this later).
[1] This is because the supervisor task is normally the highest priority task in the system, able to preempt anything else, and normally hangs out waiting for the notification to arrive. While this is probably the behavior you want, you could choose to do things differently. I’m not the cops.
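Here’s a toy model of those three steps, purely to make the sequence concrete. The types (`Task`, `TaskState`, `FaultInfo`) and the `handle_fault` function are hypothetical and far simpler than the real kernel’s. The model assumes the usual setup: the supervisor is task 0, is the most important task, and is waiting for notification 0.

```rust
/// Toy model (hypothetical types, not the real Hubris kernel).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FaultInfo {
    Panic,
    MemoryAccess { address: u32 },
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TaskState {
    Runnable,
    /// Blocked until a notification bit in `mask` is pending.
    WaitingForNotification { mask: u32 },
    Faulted(FaultInfo),
}

#[derive(Debug)]
struct Task {
    /// In this model, a smaller number means a more important task.
    priority: u8,
    state: TaskState,
    pending_notifications: u32,
}

const SUPERVISOR: usize = 0;
/// Bit 0: the notification posted to the supervisor on a fault.
const FAULT_NOTIFICATION: u32 = 1 << 0;

/// Handles a fault in `culprit` and returns the task the scheduler picks next.
fn handle_fault(tasks: &mut [Task], culprit: usize, fault: FaultInfo) -> Option<usize> {
    // 1. Record the fault. Nothing else about the crashed task is touched.
    tasks[culprit].state = TaskState::Faulted(fault);

    // 2. Post the fault notification, waking the supervisor if it was
    //    waiting for that bit.
    let sup = &mut tasks[SUPERVISOR];
    sup.pending_notifications |= FAULT_NOTIFICATION;
    if let TaskState::WaitingForNotification { mask } = sup.state {
        if mask & sup.pending_notifications != 0 {
            sup.state = TaskState::Runnable;
        }
    }

    // 3. Schedule: pick the most important runnable task.
    tasks
        .iter()
        .enumerate()
        .filter(|(_, t)| t.state == TaskState::Runnable)
        .min_by_key(|(_, t)| t.priority)
        .map(|(i, _)| i)
}
```

Note that other tasks keep whatever state they had; only the culprit and the supervisor change.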
Aside: what happens without a supervisor?
You could choose to build an application that doesn’t contain a supervisor task. I don’t recommend doing this, but if you did, here’s what would happen.
When any task in your application crashes, the kernel will post notification 0 to task 0 (the first task in the config file) as if it were a supervisor. Your task, which is not a supervisor, would get this spurious notification and perhaps just ignore it, going on about its day.
The crashed task, which I’ll call Task A, will be left in a Faulted state, unable to run.
Now, what if Task A responds to messages from other tasks? If a client (Task B) tries to send to A while A is Faulted, B will block (as in any case where A is unavailable) … and will have no way to unblock.
In fact, if A crashes while processing a message from B, B will also sit blocked, waiting for the reply… which will never come.
It’s not unusual for a Hubris application to have several layers of tasks exchanging messages. If some task C goes to send to B while B is waiting for A (which is dead), C is now blocked too.
This will continue as a sort of “pile-up” with more and more tasks waiting (transitively) for a message that will never arrive. This is why it’s important to include at least a minimal supervisor implementation in your application if there’s any risk of a task crashing. (And I think it’s always best to assume that the risk exists!) Hubris’s design assumes the supervisor exists, and a lot of fundamental operations (like inter-task messaging) have defined behaviors that just don’t make sense without it.
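A pile-up is a chain of waits ending at a dead task. As a hypothetical sketch (not Hubris code), model each task as waiting on at most one other task, recorded in `waiting_on`, and mark the Faulted tasks in `faulted`. Any task whose chain of waits reaches a faulted task is stuck for good, since without a supervisor nothing will ever restart that task.

```rust
/// Hypothetical model: `waiting_on[t]` is the task that `t` is blocked on
/// (in SEND or awaiting a reply), if any; `faulted[t]` marks dead tasks.
/// Returns true if `task` waits, directly or transitively, on a faulted task.
fn is_stuck(task: usize, waiting_on: &[Option<usize>], faulted: &[bool]) -> bool {
    let mut current = task;
    // Bound the walk by the number of tasks so a cycle can't loop forever.
    for _ in 0..waiting_on.len() {
        match waiting_on[current] {
            Some(next) if faulted[next] => return true,
            Some(next) => current = next,
            None => return false,
        }
    }
    // A cycle of waits is a different problem; not counted as a pile-up here.
    false
}

/// Lists every live task that can never make progress.
fn stuck_tasks(waiting_on: &[Option<usize>], faulted: &[bool]) -> Vec<usize> {
    (0..waiting_on.len())
        .filter(|&t| !faulted[t] && is_stuck(t, waiting_on, faulted))
        .collect()
}
```

With A (task 1) faulted, B (task 2) waiting on A, and C (task 3) waiting on B, `stuck_tasks` reports both B and C, even though only A crashed.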
A minimal supervisor
Just because your Hubris application should include a supervisor doesn’t mean you have to write a supervisor. The exhubris repo contains a minimal supervisor (called minisuper) that you can use to get started, or use forever if it meets your needs.
Let’s take it apart!
I’ll be discussing the code at a specific commit (cae47f138) in the exhubris repo, if you want to follow along.
minisuper lives at the path task/minisuper in the repository. In that directory are three files:
task/minisuper
├── Cargo.toml
├── README.mkdn
└── src
    └── main.rs
This is the simplest possible project layout for a Rust executable, and there’s nothing all that interesting in Cargo.toml, so the rest of this section will focus on task/minisuper/src/main.rs. Here’s the entire program, with some comments removed because we’re going to walk through it in detail below.
const FAULT_NOTIFICATION: u32 = 1;
!
The notification response loop
minisuper, like all Hubris tasks, is written as an infinite loop. Once started, it runs until it crashes (which it shouldn’t).
At the top of the loop, minisuper calls the function sys_recv_notification, passing it a bitmask with bit 0 set. This is a specialized wrapper for the more general RECV syscall: sys_recv_notification blocks the current task until a notification arrives whose corresponding bit in the mask is set. So, this call blocks waiting for this task’s notification 0.
The Hubris kernel-to-supervisor interface specifies that notification 0 is the one the kernel will post on a crash, so this causes minisuper to wait for a crash.
Technically, minisuper might also wake without any tasks crashing, because other tasks can use the POST syscall to manually set the supervisor’s notification 0 to pending. The loop is designed to handle this gracefully: if no task has actually crashed, it simply goes back to sleep in sys_recv_notification.
(sys_recv_notification returns a bitmask indicating which notifications caused the task to wake up. We ignore it, since we only set one bit in the mask: if we wake, we know it’s notification 0!)
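The wait-and-mask behavior fits in a few lines. This is a toy model, not userlib’s implementation; `Notifications`, `post`, and `try_recv` are hypothetical names, and the model assumes (as a simplification) that receiving a notification clears the bits it reports.

```rust
/// Toy model (hypothetical, not userlib) of one task's notification bits.
struct Notifications {
    pending: u32,
}

impl Notifications {
    /// What a POST does: mark bits as pending. A bit set by the kernel and
    /// a bit set by another task look identical to the receiver.
    fn post(&mut self, bits: u32) {
        self.pending |= bits;
    }

    /// Models a sys_recv_notification-style wait. If any pending bit is in
    /// `mask`, clear those bits and return them (the task wakes); otherwise
    /// return None (the task would stay blocked).
    fn try_recv(&mut self, mask: u32) -> Option<u32> {
        let fired = self.pending & mask;
        if fired == 0 {
            None
        } else {
            self.pending &= !fired;
            Some(fired)
        }
    }
}
```

With a mask of just bit 0, the only possible return value is `1`, which is why minisuper can ignore it. Bits outside the mask stay pending and don’t wake the task.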
Kernel IPCs (KIPCs)
Supervisors need to be able to act on other tasks — to read out information about the crash, and to restart them. These are not powers that other tasks should have! But they require help from the kernel to actually implement.
Most Hubris kernel operations are exposed as system calls (like the RECV that sys_recv_notification used above). System calls are available to all tasks.
If the supervisor-specific actions were implemented as system calls, we’d now have a set of system calls that are only available to some tasks. They would need to check the identity of the caller, and treat calls from any task other than the supervisor as errors. Those checks would need to consistently happen in every protected syscall, and not in others. This seemed unfortunate, and I wanted to avoid it.
The Hubris system call interface is also relatively stable. We rarely add new things and have never removed anything. The supervisor interface, on the other hand, has been regularly adjusted over the years, as we learn from our own systems.
These two properties led me to expose supervisor-specific operations through a different mechanism, kernel IPCs or KIPCs. From the supervisor’s perspective, a KIPC looks just like using SEND to send a message to another task, but instead of a valid task ID, the message is sent to a special ID reserved for the kernel.
The kernel handles these messages internally, giving it a convenient single place to screen out tasks other than the supervisor. KIPCs can also send and return more complex serialized Rust types than the simpler system call ABI can.
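A sketch of the routing decision shows why a single check suffices. `TaskId`, the reserved `TaskId::KERNEL` value, `route_send`, and `SendError::KipcDenied` are all hypothetical; the real kernel’s types, ID values, and error handling differ.

```rust
/// Hypothetical task ID type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TaskId(u16);

impl TaskId {
    /// Reserved ID that no real task can have (an assumption of this sketch).
    const KERNEL: TaskId = TaskId(u16::MAX);
}

const SUPERVISOR: TaskId = TaskId(0);

#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Deliver to an ordinary task, as a normal SEND.
    Task(TaskId),
    /// Handle inside the kernel as a KIPC.
    Kipc,
}

#[derive(Debug, PartialEq, Eq)]
enum SendError {
    KipcDenied,
}

/// Decides where a SEND from `caller` to `target` goes.
fn route_send(caller: TaskId, target: TaskId) -> Result<Route, SendError> {
    if target == TaskId::KERNEL {
        // The one place where non-supervisor callers are screened out;
        // ordinary syscalls never need this check.
        if caller != SUPERVISOR {
            return Err(SendError::KipcDenied);
        }
        Ok(Route::Kipc)
    } else {
        Ok(Route::Task(target))
    }
}
```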
minisuper uses two KIPCs, specifically: find_faulted_task and reinitialize_task.
find_faulted_task asks the kernel to scan its task table starting at a given index, returning the index of the next task in the Faulted state, if there is one. This KIPC lets minisuper scan for faulted tasks efficiently; in the usual case of a single faulted task, only two KIPCs are required: one to find it, and a second to confirm that there are no others.
reinitialize_task resets a task to its default state and optionally sets it running. (The alternative is to leave it stopped.) “Default state” here means that the CPU registers are reloaded to their initial values, including moving the program counter back to the task’s entry point and resetting the task’s stack pointer. (It does not clear the task’s RAM; tasks are responsible for doing this themselves.) reinitialize_task has some other important effects that I’ll save for a section below.
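To pin down what “default state” means, here’s a simplified model of the effect on the target task alone. `TaskImage`, `TaskControl`, `RunState`, and `reinitialize` are hypothetical. `NewState` mirrors the argument minisuper passes, but the `Stopped` variant name is an assumption, and real tasks have more registers than the eight modeled here (reset to zero in this model).

```rust
/// Modeled on the NewState::Runnable argument; `Stopped` is an assumed name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NewState {
    Runnable,
    Stopped,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RunState {
    Running,
    Stopped,
    Faulted,
}

/// Fixed at build time: where the task starts and where its stack begins.
struct TaskImage {
    entry_point: u32,
    initial_stack_pointer: u32,
}

struct TaskControl {
    pc: u32,
    sp: u32,
    general_regs: [u32; 8],
    state: RunState,
    /// Stand-in for the task's RAM.
    ram: Vec<u8>,
}

/// Rewinds `task` to its starting register state, then runs or stops it.
fn reinitialize(task: &mut TaskControl, image: &TaskImage, new_state: NewState) {
    task.pc = image.entry_point;
    task.sp = image.initial_stack_pointer;
    task.general_regs = [0; 8];
    // task.ram is deliberately left alone: the task clears it on startup.
    task.state = match new_state {
        NewState::Runnable => RunState::Running,
        NewState::Stopped => RunState::Stopped,
    };
}
```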
With that explanation of the KIPC operations out of the way, let’s look at what minisuper is doing with them.
Crash handling policy
Because minisuper’s code has probably scrolled out of view by now, let me repeat the crash handling code here:
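let mut next_task = 1;
while let Some(fault_index) = find_faulted_task(next_task) {
    reinitialize_task(fault_index, NewState::Runnable);
    next_task = fault_index + 1;
}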
The search starts at task index 1; this is because the supervisor is task index 0, and if the supervisor has faulted, this code is no longer running!
Each time a faulted task is found, minisuper calls reinitialize_task to rewind it to its starting state, passing NewState::Runnable to ask the kernel to start it. It then resumes the search starting from the next task, until all faulted tasks have been reinitialized.
In practice, this restarts one task at a time:
- The supervisor is the highest priority task in an application, so other tasks can’t preempt the supervisor.
- Therefore, if any other task crashes, it happens while minisuper has yielded the CPU with sys_recv_notification.
- The supervisor is immediately notified on any crash, which causes minisuper to wake and enter this loop.
- Until minisuper yields again at the top of the loop, no further crashes can occur, so this loop only sees one crash in practice.
It is possible to construct a weird application where this property is not true, but don’t.
From stepping through this code, it’s clear that minisuper implements a policy of immediately reinitializing and running any task that crashes.
How reinitialize_task prevents pile-ups
I left two loose ends above:
- I mentioned that, without a supervisor, task pile-ups can occur, but never explained how they can be fixed.
- I also mentioned that reinitialize_task had some additional behavior that I wasn’t explaining yet.
These both lead to the same explanation!
The reinitialize_task operation is very carefully designed not to leave any task behind. In addition to restarting the target task, it may also affect other tasks:
- Any tasks that are waiting in SEND with a message for the restarted task are broken out of their waits with an error code. (Often, tasks will just try to send again to the new incarnation of the task, but that’s up to them!)
- If the crashed task was already processing one or more messages from other tasks (that is, the other tasks are blocked waiting for the reply), those other tasks are broken out of their waits with an error code. (This looks just like the first case to the sending tasks.)
- If any tasks are waiting in a closed RECV on the crashed task, they are broken out of their wait, too. (Closed RECV is a little-used Hubris syscall feature that I won’t explain here; if you’d like to know more, there’s a section on closed RECV in the reference manual.)
This all happens when the task is restarted, rather than when the task initially crashes. This is because the other tasks are somewhat likely to retry their sends; delaying that until the crashed task has been restarted means the sends will succeed right away, without hitting the crashed task and failing. This also makes it easier to view the state of all the tasks at the time of the crash, through a debugger or other tools.
Thus, when minisuper restarts a crashed task using reinitialize_task, it also ensures that all the tasks that depend on the crashed task are left in a valid state and are able to make progress.
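The three cases can be sketched as one pass over the other tasks’ blocked states, run at restart time rather than at crash time. All names here are hypothetical, and `PEER_RESTARTED` is a placeholder, not Hubris’s actual error code.

```rust
/// Hypothetical model of why a task might be blocked.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Blocked {
    /// In SEND, waiting for `target` to receive the message.
    Sending { target: usize },
    /// Message received by `target`; waiting for its reply.
    AwaitingReply { target: usize },
    /// In a closed RECV that only accepts messages from `from`.
    ClosedRecv { from: usize },
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Blocked(Blocked),
    /// Runnable again; `Some(code)` if it was woken with an error.
    Ready(Option<u32>),
}

/// Placeholder error code meaning "your peer was restarted".
const PEER_RESTARTED: u32 = 0xDEAD;

/// Wakes, with an error, every task blocked on `restarted`. Returns the count.
fn release_dependents(states: &mut [State], restarted: usize) -> usize {
    let mut released = 0;
    for state in states.iter_mut() {
        let depends = match *state {
            State::Blocked(Blocked::Sending { target })
            | State::Blocked(Blocked::AwaitingReply { target }) => target == restarted,
            State::Blocked(Blocked::ClosedRecv { from }) => from == restarted,
            State::Ready(_) => false,
        };
        if depends {
            *state = State::Ready(Some(PEER_RESTARTED));
            released += 1;
        }
    }
    released
}
```

Tasks blocked on some other, healthy task are left untouched.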
Who supervises the supervisor?
If the Hubris kernel notifies the supervisor when a task crashes, what happens if the supervisor crashes?
The answer: nothing. Not right away, anyway.
If the supervisor crashes, the kernel leaves it be. Any other tasks that crash will not be restarted, resulting in a potential pile-up. This is bad.
Ideally, the supervisor will not crash, and supervisor tasks should be written to avoid crashing. If the supervisor needs to do something complex where crashing is a possibility, however, you’ll want to use a hardware watchdog timer to ensure the system can restart if the supervisor dies. (How exactly to do this depends on the microcontroller model, and is out of scope for this post.)
The exhubris libraries provide a feature that can help you write a supervisor that cannot crash: the no-panic feature on userlib. All tasks depend on userlib; if a task enables the no-panic feature, any panics that are not optimized away by the compiler will cause a link failure with an error message.
The error message is currently not very good; it looks something like:
rust-lld: error: undefined symbol: you_have_introduced_a_panic_but_this_task_has_disabled_panics
>>> referenced by lib.rs:565 (sys/userlib/src/lib.rs:565)
>>> /home/cbiffle/proj/exhubris/play/.work/demo/build/super:(rust_begin_unwind)
This provides a way to ensure, at compile time, that a task cannot panic. If the task is written in safe Rust, it also should not be able to crash due to memory protection violations, illegal instructions, and the like. This makes the feature really helpful when writing a supervisor, and if you check task/minisuper/Cargo.toml, you’ll see that it’s enabled:
[dependencies]
userlib = { workspace = true, features = ["no-panic"] }
This is not a guarantee against bugs, only crashes. The code can still do wrong things, enter an infinite loop, etc. But it can’t crash, at least.
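As a rough illustration of a style that gives the compiler no panic paths to keep, here’s a hypothetical helper (`next_slot` is not from exhubris) that reports every failure as a return value instead of indexing, using unchecked arithmetic, or calling `unwrap()`. Whether a particular build links under `no-panic` still depends on what the optimizer proves, so treat this as a style sketch, not a guarantee.

```rust
/// Hypothetical panic-free helper: look up `table[start + step]`.
/// Every failure mode becomes `None` rather than a panic.
fn next_slot(table: &[u8], start: usize, step: usize) -> Option<(usize, u8)> {
    // `start + step` could panic on overflow in debug builds; checked_add can't.
    let index = start.checked_add(step)?;
    // `table[index]` would panic when out of bounds; get() returns None instead.
    let value = *table.get(index)?;
    Some((index, value))
}
```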
The cost of supervision
The fact that Hubris requires a supervisor task means that most applications have one more task than they would otherwise need. Since embedded applications are usually space-constrained (in either RAM or flash), is this an impractical cost?
Adding minisuper to an application typically requires:
- About 32 bytes of flash for the in-kernel task descriptor.
- About 100 bytes of RAM for the in-kernel task control block.
- 256 bytes of flash for the compiled minisuper program.
- 128 bytes of RAM for its stack (256 bytes on a part with an FPU).
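Totaling those approximate figures: this snippet simply encodes the list above (the constant names are illustrative). It comes to about 288 bytes of flash, and about 228 bytes of RAM, or 356 bytes on a part with an FPU.

```rust
// Approximate per-supervisor costs from the list above, in bytes.
const TASK_DESCRIPTOR_FLASH: u32 = 32;
const TASK_CONTROL_BLOCK_RAM: u32 = 100;
const MINISUPER_CODE_FLASH: u32 = 256;

/// minisuper's stack: 128 bytes, or 256 on parts with an FPU.
const fn stack_ram(has_fpu: bool) -> u32 {
    if has_fpu { 256 } else { 128 }
}

/// Flash: the kernel's task descriptor plus the compiled program.
const fn total_flash() -> u32 {
    TASK_DESCRIPTOR_FLASH + MINISUPER_CODE_FLASH
}

/// RAM: the kernel's task control block plus the task's stack.
const fn total_ram(has_fpu: bool) -> u32 {
    TASK_CONTROL_BLOCK_RAM + stack_ram(has_fpu)
}
```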
This is pretty small. I haven’t had any trouble fitting minisuper into even the smallest of supported microcontrollers.
Conclusions
With any luck, you now understand the role of the supervisor task in a Hubris application, and have a sense of what you would need to do to write your own. (Try it!)
The supervisor task is only one of several places where Hubris puts the application programmer in the driver’s seat, instead of hardcoding behavior into the kernel. I may touch on a few others (like peripheral and memory sharing) in future posts.
I feel that this design decision has held up well, and having a very compact reference implementation (minisuper) shows that crash recovery doesn’t have to be expensive and complex to be useful. The same “no questions asked” restart-on-crash policy is what we use in Oxide’s production firmware (in Oxide’s supervisor implementation), and it’s been working well.