#security

The server chose violence

I’m continuing to reflect on the past four years with Hubris — April Fool’s Day was, appropriately enough, the fourth anniversary of the first Hubris user program, and today is the fourth anniversary of the first kernel code. (I wrote the user program first to help me understand what the kernel’s API wanted to look like.)

Of all of Hubris’s design decisions, there’s one that gets a “wait what” response more often than any other. It’s also proving to be a critical part of the system’s overall robustness. In this post, I’ll take a look at our 13th and oddest syscall, REPLY_FAULT.
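If you're wondering what "replying with a fault" even looks like, here's a minimal plain-Rust sketch of the semantics. Every name in it is invented for illustration; the real REPLY_FAULT is of course a kernel syscall, not a local enum.

```rust
// Toy model: a server answers well-formed requests normally, but a
// malformed request is treated as a bug in the *caller*. Instead of
// replying with an error code, the server has the kernel fault the
// sender, exactly as if the sender had dereferenced a bad pointer.
enum Disposition {
    Reply(u32),                // normal reply carrying a response code
    FaultSender(&'static str), // sender is faulted, with a reason
}

fn handle(operation: u16) -> Disposition {
    match operation {
        0 => Disposition::Reply(0), // the one operation this server speaks
        // Unknown operation: a protocol violation, not a recoverable
        // error, so the caller takes a fault rather than a reply.
        _ => Disposition::FaultSender("undefined operation"),
    }
}

fn main() {
    for op in [0u16, 42] {
        match handle(op) {
            Disposition::Reply(code) => println!("op {op}: replied {code}"),
            Disposition::FaultSender(why) => println!("op {op}: sender faulted ({why})"),
        }
    }
}
```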

Who killed the network switch?

We found a neat bug in Hubris this week. Like many bugs, it wasn’t a bug when it was originally written — correct code became a bug as other things changed around it.

I thought the bug itself, and the process of finding and fixing it, provided an interesting window into our development process around Hubris. It’s very rare for us to find a bug in the Hubris kernel, mostly because it’s so small. So I jumped at the opportunity to write this one down.

This is a tale of how two features, each useful on its own, can combine to become a bug. Read on for details.

Prefer Rust to C/C++ for new code.

This is a position paper that I originally circulated inside the firmware community at X. I’ve gotten requests for a public link, so I’ve cleaned it up and posted it here. This is, obviously, my personal opinion. Please read the whole thing before sending me angry emails.

tl;dr: C/C++ have enough design flaws, and the alternative tools are in good enough shape, that I do not recommend using C/C++ for new development except in extenuating circumstances. In situations where you actually need the power of C/C++, use Rust instead. In other situations, you shouldn’t have been using C/C++ anyway — use nearly anything else.
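For one concrete taste of the flaws in question, here's a minimal sketch of a pattern that C and C++ compile without complaint but that rustc rejects. The fatal line is commented out so the example compiles as shown.

```rust
fn main() {
    let v = vec![1, 2, 3];
    let first = &v[0]; // a borrow pointing into v's heap buffer
    // drop(v);        // in C terms this is free(); using `first` afterward
                       // would be a use-after-free. Uncomment it and rustc
                       // refuses to compile: error[E0505], `v` is borrowed.
    println!("{first}");
}
```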

Attacks on my Server: The Data

I try to maintain a reasonably secure webserver.

A webserver is a computer, connected to the public internet, that does things (serves pages, etc.) whenever anyone asks it to. This makes it an easy thing to attack: the first step toward attacking a computer is usually getting it to do your bidding, and a webserver does your bidding every time you click a link.

My system logs show that I get attacked several times a day, like (I imagine) most computers on the Internet. Fortunately, most attacks bounce off. That's not because I have some magic security-foo; it's because the software I'm using (specifically publicfile) doesn't work the way the attackers expect it to.

While I am not so naive or foolish as to say that my server is “secure” — I’m sure it has some exploitable hole, and it runs in a distant facility that probably forgets to lock the doors sometimes — these attacks are of mostly academic interest.

Here’s some data I’ve collected from the past month or so of attacks. I figure this might help someone else detect or prevent an attack in the future.
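If you want to run the same sort of triage on your own logs, here's a minimal sketch of the idea. The log path and the probe strings are hypothetical placeholders, not my actual setup:

```rust
use std::collections::HashMap;
use std::fs;

fn main() -> std::io::Result<()> {
    // Common probe signatures seen in drive-by attacks; adjust to taste.
    let probes = ["/wp-login.php", "/phpmyadmin", "/.env", "/cgi-bin/"];
    let mut hits: HashMap<&str, usize> = HashMap::new();

    // Hypothetical access-log location; point this at your own logs.
    for line in fs::read_to_string("/var/log/httpd/access_log")?.lines() {
        for p in probes {
            if line.contains(p) {
                *hits.entry(p).or_insert(0) += 1;
            }
        }
    }
    for (probe, count) in &hits {
        println!("{count:6}  {probe}");
    }
    Ok(())
}
```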

Language-Independent Sandboxing of Just-In-Time Compilation and Self-Modifying Code

This paper, presented at PLDI ’11, describes a key innovation behind Native Client, which is (as far as I’m aware) an industry first: the ability to verify, on the fly, the safety of a code-generating program (like a JIT or language runtime) and of the code it produces.

We can even support self-modifying code, with very little runtime overhead for verification. I firmly believe that active runtimes involving some degree of JIT code generation are the future, and this paper shows that we don’t have to sacrifice security or reliability to support them.
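The invariant at the heart of this is simple to state even without the machine-code details: bytes never become executable, and never change, without the verifier signing off. Here is a toy model of that protocol; all names are invented, and the stand-in verifier is a single-byte check where the real one validates instruction encodings, alignment, and control-flow targets.

```rust
#[derive(Debug)]
struct VerifyError;

struct CodeRegion {
    bytes: Vec<u8>,
    executable: bool,
}

// Stand-in verifier: one forbidden byte models the real checks on
// opcodes, bundle alignment, and jump targets.
fn verify(code: &[u8]) -> Result<(), VerifyError> {
    if code.contains(&0xCC) { Err(VerifyError) } else { Ok(()) }
}

impl CodeRegion {
    // Code becomes executable only after verification succeeds.
    fn install(&mut self, code: &[u8]) -> Result<(), VerifyError> {
        verify(code)?;
        self.bytes = code.to_vec();
        self.executable = true;
        Ok(())
    }

    // Self-modification re-runs the verifier before the patch lands.
    fn modify(&mut self, at: usize, patch: &[u8]) -> Result<(), VerifyError> {
        let mut candidate = self.bytes.clone();
        candidate[at..at + patch.len()].copy_from_slice(patch);
        verify(&candidate)?;
        self.bytes = candidate;
        Ok(())
    }
}

fn main() {
    let mut region = CodeRegion { bytes: Vec::new(), executable: false };
    region.install(&[0x90, 0x90, 0x90]).expect("benign code verifies");
    assert!(region.executable);
    assert!(region.modify(1, &[0xCC]).is_err()); // rejected patch never lands
    assert_eq!(region.bytes, [0x90, 0x90, 0x90]);
}
```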

I designed the mechanisms behind this technology with Bennet Yee and David Sehr, for x86, x86-64, and ARM processors. The rest of the authors did the hard part: implementing it in a portable way and shipping it to the masses. If you’re using Chrome, you’re already using this technology.

We received an internal Google award for this paper.

Adapting Software Fault Isolation to Contemporary CPU Architectures

Software Fault Isolation (SFI) is an effective approach to sandboxing binary code of questionable provenance, such as native plugins in a Web browser. We present software fault isolation schemes for ARM and x86-64 that provide control-flow and memory integrity with an average performance overhead of under 5% on ARM and 7% on x86-64. We believe these are the best known SFI implementations for these architectures, with significantly lower overhead than previous systems for similar architectures. Our experience suggests that these SFI implementations benefit from instruction-level parallelism and have a particularly small impact on workloads that are data-memory-bound, both properties that tend to further reduce SFI overhead on future CPU implementations.

Presented at the 19th USENIX Security Symposium.
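The data-sandboxing half of schemes like these rests on a trick that fits in one line: mask every address into a power-of-two-sized region before using it, so even a corrupted pointer cannot escape. Here's a toy Rust version; the real implementations emit the mask as an instruction or two before each store and handle control-flow integrity separately.

```rust
// Toy SFI data sandbox: the region is power-of-two sized, so a single
// AND confines any address into it. No bounds check, no branch.
struct Sandbox {
    memory: Vec<u8>,
    mask: usize,
}

impl Sandbox {
    fn new(size: usize) -> Self {
        assert!(size.is_power_of_two());
        Sandbox { memory: vec![0; size], mask: size - 1 }
    }

    // A store that cannot escape `memory`, no matter what `addr` is.
    fn store(&mut self, addr: usize, value: u8) {
        let confined = addr & self.mask; // the SFI masking step
        self.memory[confined] = value;
    }
}

fn main() {
    let mut sb = Sandbox::new(4096);
    sb.store(5, 0xAB);          // ordinary in-bounds store
    sb.store(1_000_000, 0xCD);  // wild address, silently confined
    assert_eq!(sb.memory[5], 0xAB);
    assert_eq!(sb.memory[1_000_000 & 4095], 0xCD);
}
```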