I’m continuing to reflect on the past four years with Hubris — April Fool’s
Day was, appropriately enough, the fourth anniversary of the first Hubris user
program, and today is the fourth anniversary of the first kernel code. (I wrote
the user program first to help me understand what the kernel’s API wanted to
look like.)
Of all of Hubris’s design decisions, there’s one that gets a “wait what”
response more often than any other. It’s also proving to be a critical part of
the system’s overall robustness. In this post, I’ll take a look at our 13th and
oddest syscall, REPLY_FAULT.
We found a neat bug in Hubris this week. Like many bugs, it wasn’t a bug when
it was originally written — correct code became a bug as other things
changed around it.
I thought the bug itself, and the process of finding and fixing it, provided an
interesting window into our development process around Hubris. It’s very rare
for us to find a bug in the Hubris kernel, mostly because it’s so small. So I
jumped at the opportunity to write this one down.
This is a tale of how two features, each useful on its own, can combine to
become a bug. Read on for details.
This is a position paper that I originally circulated inside the firmware
community at X. I’ve gotten requests for a public link, so I’ve cleaned it up
and posted it here. This is, obviously, my personal opinion. Please read the
whole thing before sending me angry emails.
tl;dr: C/C++ have enough design flaws, and the alternative tools are in good
enough shape, that I do not recommend using C/C++ for new development except in
extenuating circumstances. In situations where you actually need the power of
C/C++, use Rust instead. In other situations, you shouldn’t have been using
C/C++ anyway — use nearly anything else.
A webserver is a computer, connected to the public internet, that does things
(serves pages, etc.) whenever anyone asks it to. This makes it an easy thing
to attack: the first step toward attacking a computer is usually getting it to
do your bidding, and a webserver does your bidding every time you click a link.
My system logs show that I get attacked several times a day, like (I imagine)
most computers on the Internet. Fortunately, most attacks bounce off — not
because I have some magic security-foo, but rather because the software I’m
using — specifically publicfile — doesn’t work the way the attackers
expect it to.
While I am not so naive or foolish as to say that my server is “secure” —
I’m sure it has some exploitable hole, and it runs in a distant facility that
probably forgets to lock the doors sometimes — these attacks are of mostly
academic interest.
Here’s some data I’ve collected from the past month or so of attacks. I figure
this might help someone else detect or prevent an attack in the future.
Jason Ansel,
Petr Marchenko,
Úlfar Erlingsson,
Elijah Taylor,
Brad Chen,
Derek L. Schuff,
David Sehr,
Cliff L. Biffle and
Bennet Yee
2012-03-14
This paper, presented at PLDI ’11, describes a key innovation behind Native
Client, which is (as far as I’m aware) an industry first: the ability to verify
the safety of a code-generating program, like a JIT or language runtime, and
that of its output, on the fly.
We can even support self-modifying code, with very little runtime overhead for
verification. I firmly believe that active runtimes involving some degree of
JIT code generation are the future, and this paper shows that we don’t have to
sacrifice security or reliability to support them.
I designed the mechanisms behind this technology with Bennet Yee and David Sehr,
for x86, x86-64, and ARM processors. The rest of the authors did the hard part:
implementing it in a portable way and shipping it to the masses. If you’re
using Chrome, you’re already using this technology.
We received an internal Google award for this paper.
David Sehr,
Robert Muth,
Cliff L. Biffle,
Victor Khimenko,
Egor Pasko,
Bennet Yee,
Karl Schimpf and
Brad Chen
2010-08-11
Software Fault Isolation (SFI) is an effective approach to sandboxing binary
code of questionable provenance, an interesting use case being native plugins
in a Web browser. We present software fault isolation schemes for ARM and x86-64
that provide control-flow and memory integrity with average performance
overhead of under 5% on ARM and 7% on x86-64. We believe these are the best
known SFI implementations for these architectures, with significantly lower
overhead than previous systems for similar architectures. Our experience
suggests that these SFI implementations benefit from instruction-level
parallelism, and have particularly small impact for workloads that are data
memory-bound, both properties that tend to reduce the impact of our SFI
systems for future CPU implementations.