#hubris

Crash recovery in 256 bytes

(This continues my series of posts on the exhubris tools I’m building, to enable more people to use Hubris in their embedded systems.)

One of Hubris’s strongest features is its ability to handle crashes in drivers and other application logic. It leaves the specific crash handling behavior up to the application programmer through a mechanism called a supervisor.

In this post I’ll look at why I made this decision and how it works in practice, then walk through the exhubris supervisor reference implementation, minisuper. (Spoiler: it’s very small.)

Revisiting Hubris appconfigs

So in my day-job over at Oxide we’ve built this nice embedded operating system called Hubris. If you follow my blog, you’re probably aware of it.

I also build a lot of embedded electronics outside my day-job, and people sometimes ask me (often excitedly!) if those projects are using Hubris.

The answer so far is “no.” This is for a variety of reasons, but probably the biggest: it’s actually quite difficult to use Hubris for anything if you don’t want your code to live in the Oxide Hubris repo!

I would like to fix this, to enable other teams to use Hubris without having to coordinate with Oxide (or even publish their source code!). I’m starting by trying to address the needs of a single friendly customer: me.

As of this week I have it working, in a set of tools I call exhubris. It’s not by any means done (or all that pleasant to use). I’m going to write some posts about it, to help me think through the design process, and (more importantly!) to solicit feedback from my readers on where they think things should go.

This first post starts with the part of Hubris most users encounter first: the application configuration file, or appconfig.

From Hubris To Bits

The embedded platform we’ve built for firmware at Oxide is called Hubris. It’s unusual for a microcontroller operating system, and probably the biggest thing that makes it unusual is its use of separately-compiled tasks.

Most firmware applications mash all of their task and OS code together in a common memory space, which is simple and efficient, but can lead to subtle bugs. Hubris instead places bits of code in their own isolated memory areas, where they can run (and crash!) separately. This requires that each bit be compiled as a standalone program.

The CPUs we target don’t have virtual memory, so each of these separate programs has to be laid out at a known place in the address space. This introduces some challenges, and has prevented us from “just” using an off-the-shelf build system.

This post will walk through the process of building a Hubris application from source, from the perspective of the build system, and examine some of these challenges and how we addressed them.

The server chose violence

I’m continuing to reflect on the past four years with Hubris — April Fool’s Day was, appropriately enough, the fourth anniversary of the first Hubris user program, and today is the fourth anniversary of the first kernel code. (I wrote the user program first to help me understand what the kernel’s API wanted to look like.)

Of all of Hubris’s design decisions, there’s one that gets a “wait what” response more often than any other. It’s also proving to be a critical part of the system’s overall robustness. In this post, I’ll take a look at our 13th and oddest syscall, REPLY_FAULT.

Who killed the network switch?

We found a neat bug in Hubris this week. Like many bugs, it wasn’t a bug when it was originally written — correct code became a bug as other things changed around it.

I thought the bug itself, and the process of finding and fixing it, provided an interesting window into our development process around Hubris. It’s very rare for us to find a bug in the Hubris kernel, mostly because it’s so small. So I jumped at the opportunity to write this one down.

This is a tale of how two features, each useful on its own, can combine to become a bug. Read on for details.

On Hubris And Humility

Last week I gave a talk at the Open Source Firmware Conference about some of the work I’m doing at Oxide Computer, entitled On Hubris and Humility. There is a video of the talk if you’d like to watch it. It came out pretty alright!

The conference version of the talk has a constantly animated background that makes the video hard for some people to watch. OSFC doesn’t appear to be bothering with either captions or transcripts, so my friends who don’t hear as well as I do (or just don’t want to turn their speakers on!) are kind of out of luck.

And so, here’s a transcript with my slides inlined. The words may not exactly match the audio because this is written from my speaker’s notes. And, yes, my slides are all character art. The browser rendering is imperfect.

I’ve also written an epilogue at the end after the initial response to the talk.