You Can't Write C in Just Any Ol' Language
Learn Rust the Dangerous Way, Part 1
In this part of the series, we’ll take a grungy optimized C program and translate it, fairly literally, into a grungy optimized unsafe Rust program. It’ll get the same results, with the same performance, as the original.
Our first program
To begin, we need a C program that is…
- Fairly short,
- Self-contained,
- Aggressively optimized using some C tricks,
- But not a complete toy.
I’ve chosen a solar system simulator (nbody-gcc-8) from the Benchmarks Game. The Game is a nice source of small programs that have been heavily optimized. This particular program uses SSE and memory layout/locality optimizations to crunch numbers really fast.
This is the sort of thing that C is awesome for. On the leaderboard for this task at the time of this writing, C-like languages1 dominate, with the next fastest language — Go — coming in fully 3.5x slower.
For this purpose, I’m counting languages without garbage collectors that give you significant control over memory locality and whatnot, so C, C++, Fortran, Ada, and of course Rust. All these languages have versions of this program that perform within 33% of the C version.
Just in case the Benchmarks Game URLs change, I’ve included a local copy:
- nbody.gcc-8.c - (license)
We’ll be going over that source code with a fine-toothed comb, and it might be useful to keep a copy open in a separate tab or editor. I will occasionally reformat the C code to fit better in a narrow column.
Initial translation to ugly Rust
Let’s begin by working through the C program and writing a Rust program that does the same thing. To understand what “the same thing” is, we’ll need to think carefully about what the C program is expressing. You might be surprised about which lines of code take most of our time!
(In terms from human languages, what we’re doing is more of a transliteration or gloss than a true translation, because the result won’t be fluent Rust. We’ll fix that later.)
Prelude
The very top of the C file is book-keeping, which we can translate quickly as a
warmup. Rust imports much of the standard library by default; we’ll bring in
additional memory management tools, SIMD intrinsics, and the constant PI
.
C:
Rust:
use mem;
use *;
use PI;
Types
The C program defines two types, intnative_t
and the struct body
. By
studying the program, we can see that intnative_t
is used only as a loop
index; to skip to the punchline, it winds up not being used at all in Rust,
where it gets replaced by usize
. struct body
, on the other hand, is used
all over.
C:
// intptr_t should be the native integer type on most
// sane systems.
typedef intptr_t intnative_t;
typedef structbody;
Rust:
// Note 1
// Note 2
That’s pretty straightforward, but we’ve already got two notes.
Note 1: What’s with that allow
nonsense?
Rust, like a lot of recent languages, has some opinions about naming
conventions. You’ll get lint warnings about things like lowercase struct
names. In the interest of not changing everything at once, this allow
attribute (similar to a #pragma
) informs the compiler that we wish to allow
deviations from the naming conventions.
Note 2: The #[repr(C)]
marker, which applies to the struct
that follows,
asks Rust to lay out the struct
exactly like C. Without this, Rust assumes
you don’t really care about the ordering of fields in a struct
, and it will
optimize them for best packing and alignment — something you’ve probably
done by hand before. It actually turns out that this program doesn’t assume a
particular field order, but we’ll turn off field reordering for now.
Data
The next portion of the program lays out data describing the solar system. This is an area where Rust and C are pretty similar, so I’ll use it as an opportunity to talk about the principles behind the places they differ. The overall theme is one of being explicit.
C:
static body solar_Bodies=;
Rust:
// Note 1
const SOLAR_MASS: f64 = 4. * PI * PI;
const DAYS_PER_YEAR: f64 = 365.24;
const BODIES_COUNT: usize = 5;
// ,------------------------------------- Note 2
// | ,--------- Note 3
// v v
static mut solar_Bodies: = ;
Note 1: These #define
s can be replaced by Rust const
, which declares a
constant. Constants have types, which avoids some embarrassing mistakes (don’t
ask me how I know this).
Note 2: In general, data declared in C can be overwritten unless
specifically marked const
. Rust takes the opposite approach: data cannot be
overwritten unless marked mut
(for “mutable”). (This is nice, because a
compiler typically can’t detect when you’re mutating something by accident in
C, because you forgot to const
it.)
Note 3: C uses an implied array size here. The most direct translation
into Rust requires that we state the array size. Since there are a number of
places where the program would become wrong if the size of this array deviated
from BODIES_COUNT
, I feel like this is acceptable.
Note 4: Writing a struct
literal in C uses naked curly braces { ... }
,
and the type must be inferred from the context. In Rust, you must state the type
explicitly: body { ... }
.
Note 5: In C, you can write a struct
literal and leave out some fields
— though every C shop I’ve worked at has discouraged this. You need to
provide some initializer for fields in Rust2.
Note 6: In C (after C99) you can optionally give the field names in a
struct
literal; our program does this for the Sun, but not the planets. In
Rust, you must always give the field names. (This means that struct
literals
don’t change meaning if you reorder the fields in a struct
!)
A common reason to leave out fields is when you only care about part
of a struct
, and don’t want to clutter your code with a long string of
zeros. Don’t worry, there is a way to do this in Rust. I’m not introducing it
here because I think it would be a distraction, but if you would like a
distraction, it’s called struct update syntax.
Our first function: offset_Momentum
From here on out we’ll be hitting the functions out-of-order, because it’s convenient for me, the author. The order of functions in a file doesn’t matter in Rust, so we can translate the program in the original order, or in the order I’m using.
Let’s start with the shortest function, offset_Momentum
. For a short function,
it’s going to produce a lot of commentary.
C:
static void
Rust:
// v------------------------------------ Note 1
unsafe
Note 1: the Rust version of this function is marked unsafe
. While that
sounds like a scary word, in our case, it simply means that we’re opting in to
the ability to arbitrary (and possibly dumb) things with pointers, and that
anyone calling us is risking corruption if they pass in bogus arguments. That’s
the same deal the C program offered.
Note 2: the C program takes an argument body bodies[]
. While this looks
like an array, it’s really just a pointer — it’s equivalent to body *bodies
due to array decay. As it’s non-const
, the Rust equivalent is *mut body
.
Note 3: a C for
loop that simply walks a single index through a range of
numbers — the common case — becomes a loop over a range in Rust.
It generates the same code, but is easier (for me) to read, because I don’t need
to hunt for “clever” index manipulation in the clauses of the loop.
Note 4: This is the elephant in the room: why did the C expression
bodies[0]
explode into *bodies.add(0)
?
Let me begin by assuring you that this is temporary. We’ll clean it up in the next part of this series.
Now: I did this because, in the C program,
- We’re using raw (C-style) pointers,
- We’re doing arithmetic on them (which is what indexing is in C), and
- We’re not checking array bounds or
NULL
, so what we’re doing might be wrong, if our caller has given us bogus arguments.
Rust won’t stop us from doing this, but it won’t go out of its way to help us either. In general, given a dangerous option and an equally good safe option, Rust will try to nudge you toward the safe option by making it easier to use. In keeping with the theme of being explicit, doing dangerous stuff requires more typing, so it’s harder to do by accident.
In C, bodies[i]
is exactly the same as *(bodies + i)
— it performs
pointer arithmetic and a dereference and nothing else. In particular, it assumes
that you have some reason to know that i
is a valid index for the array. This
is the common case in C and gets a shorthand using square brackets.
In Rust, the common case uses references instead of pointers3 and does
bounds-checking, and so that use case gets the shorthand bodies[i]
. Doing
unchecked pointer arithmetic is the exception rather than the rule, and so we
have to express it more deliberately.
Finally, we have to write *bodies.add(i)
instead of *(bodies + i)
because
Rust doesn’t overload arithmetic operators for pointers. Pointers instead
provide add
, sub
, and other operations that are easier to spot in code
review; here’s the full list if you’re curious.
To the machine, references are pointers, but to the compiler and programmer they have slightly different meanings. We’ll take a closer look at this in the next part of this series.
Note 5: Given all that, you might be surprised to see velocity[m]
using
square brackets. I chose to simplify this because I can see that it’s
equivalent:
velocity
is a 3-element inline array within thebody
struct.- Our indices are
0..3
(meaning 0, 1, 2), so every access is in bounds.
If we tried to read off the end of the velocity
array using square brackets,
the program would panic at run time, which would be a behavior change. Since we
don’t do this…we don’t have to worry about it.
Technically, writing velocity[m]
is asking the compiler to generate a bounds
check. However, rustc
is quite good at deducing when the bounds checks are
unnecessary, and simple loops with constant indices are the easiest case. We’ll
see more of this pattern later on.
Second: output_Energy
For the most part, output_Energy
applies the same techniques used in
offset_Momentum
, and I’ll present them without further commentary. The notes
below call attention to things we haven’t seen before.
C:
static void
Rust:
unsafe
Note 1: We’ve hit some implicit complexity in the C program.
It declares a local variable like so:
double position_Delta;
That’s an array of three double
s, sure — but the important part is
what’s missing: an initializer. C leaves local variables uninitialized unless
you give them an initializer. In this case, the elements of position_Delta
contain arbitrary, unpredictable values. That’s fine, because the program
immediately replaces them with good values in the following loop.
Why is this important? Sometimes leaving memory uninitialized is important for performance, so I’m using this as an opportunity to demonstrate the technique you’ll use. (In this particular case, I measured it and it doesn’t seem to affect performance.)
Uninitialized variables are a source of serious bugs in C, and many other languages, you can’t even have them. You can have uninitialized variables in Rust, but you have to type more. The equivalent line is:
let mut position_Delta = ;
// ...which is shorthand for...
let mut position_Delta: =
;
(That ::<f64>
syntax is called a “turbofish” if you haven’t seen it before.
This is the only place where I’ll use it.)
The standard library provides std::mem::MaybeUninit
for expressing storage
locations that might be uninitialized. A MaybeUninit<T>
is a T
that might
not be initialized. MaybeUninit::uninit()
is an uninitialized value. So this
line declares an array of three f64
s containing arbitrary uninitialized
memory, just like in C — we just have to write out the words to assure the
compiler that, no, really, this is what we wanted.
Accessing these variables is unsafe
, but that shouldn’t scare you — it
just means you need to be real sure that the variables are initialized before
you read them. The first thing the Rust code does, like C, is to write the array
elements with valid data. Writing the array elements is more verbose than C:
position_Delta.as_mut_ptr.write;
What?
MaybeUninit::as_mut_ptr()
produces a mutable pointer to the
possibly-uninitialized memory. write
is another raw pointer
operation, which writes data into the pointed-to memory without
reading it first.
Why would writing to memory involve reading it first? In Rust, it’s usually
because the type you’re storing has a destructor. f64
doesn’t, but I’m
trying to be pedantically correct here in case you copy this code into another
context.
Note 2: it’s our first transmute
! Here’s the line so you don’t have to
scroll up:
let position_Delta: = transmute;
std::mem::transmute
is a very versatile function in Rust. It converts between
any two types as long as they’re the same size, without running any code or
changing any data — it simply reinterprets the bits of one type as the
other. This is similar to a cast in C, but it’s more predictable and more
powerful — technically it’s the equivalent of doing something like
// we're in C here
float x = ;
int y = * &x;
Because transmute
will happily convert (say) a floating point number into a
pointer, or any three-word struct
into any other, it is firmly, gloriously
unsafe
. And incredibly useful.
Here, we’re transmuting from our original [MaybeUninit<f64>; 3]
(see Note 1)
to [f64; 3]
— we’re shedding the MaybeUninit
, because we, the
programmer, are convinced that the array is fully initialized at this point.
This means we can access the array elements freely in the rest of the function.
As long as we’ve done the initialization correctly, this use of transmute
is
safe.
Note that we’re consuming the old position_Delta
array on the right, and
creating a new position_Delta
variable on the left with the same name. This
is a common Rust idiom for changing the type (or mutability) of a variable.
Coming from a C background, this kind of “name shadowing” may seem scary to you,
and that’s fine — you don’t have to use it. But you’ll definitely
encounter it in Rust code.
Note 3: This one’s comparatively easy. Where C has sqrt
and sqrtf
, Rust
has f64::sqrt
and f32::sqrt
.
Third: advance
advance
is the actual simulation. It steps the solar system forward by a tiny
amount. Compared to the previous two functions we’ve visited, advance
is
pretty long, so I’ll present it in sections.
The original advance
function is heavily commented; I’ve removed those
comments for space reasons.
Section one: setup
This one’s dense enough that I’ll present the two languages separately.
C:
static void
Skipping right on by those #define
s, which we’ve done before, we arrive at
some subtle stuff! For the first time I’ve marked notes on the C side.
C Note 1: This is a variable inside a function with a static
qualifier.
This gives the name the same scope as a local, but rather than being stored on
the stack, a single copy is placed in global memory. This has a bunch of
implications.
-
A
static
variable missing an initializer is not uninitialized, the way a local is. It gets zeroed instead. -
It gets zeroed once, so this function doesn’t pay to initialize it each time it gets called. This can improve performance the same way an uninitialized local can, which is almost certainly why this program is doing it this way.
-
Of course, because it’s zeroed once, it’s only reliably zero the first time this function is called. Next time it will contain leftover results from a previous call.
-
As a side effect, the function is neither reentrant nor thread-safe, because having multiple invocations of this function fighting over the
static
storage at the same time will produce weird results.
C Note 2: The alignas
specifier, new in C11, overrides the natural
alignment of something with the alignment of something else. Here, we’ve
requested the alignment of __m128d
, which is Intel’s name for an SSE vector
type containing two double
s. This winds up aligning to a multiple of 16 bytes.
When encountering this sort of specifier, there are two questions we need to ask:
-
Why is the alignment being overridden? In this case, it’s because we’ll be accessing these variables as both
double
s and vectors, depending on the point in the program.4 -
What is being aligned? It would be easy to read
alignas(__m128d) double
as adouble
aligned to 16 bytes, but that’s not correct. Thealignas
specifier applies to the array being declared. So here we have a normal array ofdouble
s, but aligned specially.
In my personal opinion, it’s simpler, clearer, and less error-prone to
do this with a union
, but I didn’t write this.
Rust:
unsafe
Starting at Note 1, a mysterious struct
has appeared.
Rust has an alignment attribute that behaves like C’s, but it can only be
applied to types, not values. Instead of overriding the alignment on some
double
, in Rust, we create a wrapper called something like AlignedDouble
. In
this case, because I’m not super imaginitive, I’ve called it Align16
.
Align16
contains an array of ROUNDED_INTERACTIONS_COUNT
f64
s and aligns it
to a 16 byte boundary. I’ve written Align16
as a tuple struct with unnamed
fields, but you could write it as a normal struct
if you prefer.
Align16
is marked with the align
attribute, but it’s also marked as
#[derive(Copy, Clone)]
. This is a bit of boilerplate that says that we want
the type to be freely copyable, like an f64
. We need this to write the array
initializers at Note 2.
Speaking of Note 2, here we initialize our static
s using array literals.
The static
s are explicitly marked mut
because we’re going to modify them.
As in C, the mut
part renders advance
non-reentrant and non-thread-safe. And
Rust really doesn’t like that, because Rust provides guarantees about data
races (namely, that they don’t exist), and having a static mut
like this lets
us violate that guarantee. But we’re not writing safe Rust, so it’s our
responsibility to implement thread safety ourselves — and in this program,
we do so by not having threads.
Moving on!
Section two: calculating distances
The C code is using a more complex for
loop this time. I’ve un-indented the
code, remember that we’re inside a function.
C:
// v--------------------------------- Note 1
for
for // Note 2
for
position_Deltas=
bodies.position-bodies.position;
Rust:
Read the C closely around the two Note comments. k
is declared and initialized
at Note 1, so it’s in scope throughout the three nested loops, just like
i
. But k
is not incremented there — it’s incremented at Note 2.
That is, k
is counting the number of times we go through the j
loop.
C’s for
loop is a subtle tool, and it’s easy to express things like this in C
(though we could debate how readable the result is). There’s an idiomatic way
to do things like this in Rust, too, but it would require the introduction of
some higher-level concepts that aren’t appropriate just yet.
Instead, I’ve taken the C loop apart and just created a k
variable. The outer
curly braces limit the scope of k
, just as they would in C. With that
simplification, the loops become simple loops over ranges, and (in my opinion)
it’s clearer what, exactly, k
is counting.
This is often, but not always, the right way to translate complex C for
loops — at least during your first pass.
Finally, two brief notes on Rust syntax:
-
If this is your first time encountering syntax like
position_Deltas[m].0[k]
, the.0
is accessing the first (only) unnamed field in the tuple structAlign16
, which we defined in the last section. -
You’ve probably noticed that my Rust loops have closing curly braces even when the C didn’t. While that happens to be my personal preference in C, in this case, it isn’t personal: Rust requires it.
Section three: converting to vectors
Before I present the code: if you haven’t written MMX or SSE code in C before,
let me just note that __m128d
is Intel’s (awkward) type name for a 128-bit
vector of two doubles, and MMX/SSE operations appear as functions prefixed with
_mm_
.
Onward: we are entering the loop that computes the gravitational interactions of each pair of bodies in the solar system.
C:
for
Rust:
for i in 0..ROUNDED_INTERACTIONS_COUNT/2
This is where we start punning the arrays of doubles as arrays of SSE vectors, so that we can read them and operate on them two-at-a-time.
The beginning and end, with position_Delta
, are the same “uninitialized array”
pattern we used earlier. The program creates an uninitialized array of
__m128d
and then fills it in by copying corresponding elements from
position_Deltas
, loading pairs of double
/f64
as __m128d
. To do this in
C, we cast the array (which decays to a pointer) to an __m128d *
and index
into the results.
In Rust, we do the same thing. Rust casts primitive types using as
. Note that
there is one more cast in Rust than in C (Note 1). This is because the
expression &x
in Rust takes the address of x
as a reference, but we want a
raw pointer — so we cast first to a pointer, and then to a pointer of a
different type. We’ll cover references in more detail in the next part of this
series.
(I’m using *const
in Rust even though C is using mutable pointers, because
while C uses non-const
pointers, it doesn’t actually mutate anything here.
That is, the lack of const
is probably a mistake, though a harmless one, in
the C program.)
Section four: vector math
(I’m un-indenting the code here to fit more in this column, so just remember that we’re in the middle of a loop still.)
C:
const __m128d distance_Squared=
position_Delta*position_Delta+
position_Delta*position_Delta+
position_Delta*position_Delta;
__m128d distance_Reciprocal=
;
for
distance_Reciprocal=distance_Reciprocal*1.5-
0.5*distance_Squared*distance_Reciprocal*
;
=
0.01/distance_Squared*distance_Reciprocal;
Rust:
let distance_Squared: __m128d = _mm_add_pd;
let mut distance_Reciprocal: __m128d =
_mm_cvtps_pd;
for _ in 0..2
.add
.write;
Well that certainly got longer! The Rust program contains a lot more calls to
Intel MMX/SSE intrinsics (starting with _mm_
). In fact, pretty much the only
line that directly corresponds between the two programs is the one that
computes the initial value of distance_Reciprocal
.
There’s a reason for that: the C program is cheating.
Okay, that’s too harsh — what the C program is doing is relying on a
non-standard language extension that provides operator overloading for certain
types, including __m128d
.
At the time I’m writing this, Rust doesn’t provide operator overloading for
__m128d
in the standard library, so we use the vector intrinsics explicitly.
(Unlike with C, I could actually add operator overloading myself with a bit
more code, but I think that’d be distracting.)
Section five: finishing up
That finishes the body of the biggest loop; let’s close it out and finish the function. There are no new techniques required here, so I’ll present this without commentary.
C:
} // end of loop
for
for
for
for
bodies.position+=0.01*bodies.velocity;
Rust:
} // end of loop
for i in 0..BODIES_COUNT
And that’s the end of advance
. Whew! It’s all downhill from here!
Finally: main
main
simply calls the functions we’ve already seen.
C:
int
Rust:
These two implementations are pretty similar, but there are differences.
Note 1: Remember that solar_Bodies
is our static mut
array of body
structs, defined above. In C, we pass it to functions using just its name,
solar_Bodies
. This works because, in C, we can’t pass arrays by value, so
using the name of an array as a function argument causes decay to a pointer.
In C,
int data;
;
is equivalent to
int data;
;
This is not true in Rust. If we pass solar_Bodies
to a function, we’re
passing, by value, an array of BODIES_COUNT
body
structs. That’s not what
we want.
Like in C, the ampersand in Rust can be used to take the address of a variable,
but &solar_Bodies
produces a reference instead of a pointer. We’d need to
insert a cast to a pointer.
However, this situation is pretty common, and there’s a shorthand:
array.as_mut_ptr()
returns a mut
pointer to the first element of array
.
We use this at each call to pass solar_Bodies
to the functions that operate on
it.
Note 2: Both programs parse the first command line argument as an integer
and use that to determine the number of times to advance
. The way the two
programs do it is different.
In C, we use:
In Rust, we use the much longer expression
.nth.unwrap
args .parse.unwrap
This is another case of implicit complexity in C becoming explicit in Rust, and it all comes down to failure behavior.
What does the C program do when no command line arguments are provided?
argv[1]
is then an access past the end of the argv
array, so it reads
arbitrary memory as a char *
. It then passes the arbitrary pointer to atoi
,
which reads the memory it points to and tries to parse it as an integer. In
practice, on Linux at least, this segfaults.
What does the C program do when an argument is provided, but it isn’t an
integer? atoi
doesn’t detect errors, so non-integer arguments are treated as
0.
The Rust program contains unwrap()
calls at each of these points, which is a
Rust idiom that converts an error condition into an immediate panic. We could
choose a different response, like substituting invalid arguments with a default
value, but for this program, I haven’t bothered.
This is a case where I’ve introduced memory safety where the original program
had none — why now? Well, for cross-platform compatibility reasons5,
Rust wraps command line arguments behind an abstraction that makes it pretty
hard to read off the end of argv
. Rather than beating my head against it, I
just did the Rusty thing.
I’m guessing this has something to do with Windows?
Evaluating the translated program
Completed source code for phase one.
To build the C version:
$ time gcc -O3 -fomit-frame-pointer -march=native -funroll-loops -static \
nbody.gcc-8.c -o nbody.gcc-8 -lm
real 0m0.294s
user 0m0.264s
sys 0m0.030s
Building the Rust version:
$ time rustc -C opt-level=3 -C target-cpu=native -C codegen-units=1 \
nbody-1.rs -o nbody-1
real 0m0.369s
user 0m0.310s
sys 0m0.048s
rustc
is compiling a bit slower than gcc
here, but they’re both fast enough
that we’re probably mostly seeing startup time. For a more even comparison, I’ll
include the results building with clang
, which shares a lot of code with
rustc
.
$ time clang -O3 -fomit-frame-pointer -march=native -funroll-loops -static \
1/nbody.gcc-8.c -o nbody.clang-8.bench -lm
real 0m0.418s
user 0m0.343s
sys 0m0.058s
Let’s get the golden output from the C version to make sure our translation was correct:
$ ./nbody.gcc-8 50000000
-0.169075164
-0.169059907
And our translation:
$ ./nbody-1 50000000
-0.169075164
-0.169059907
The output matches (which mostly means we didn’t mess up the order of the
floating point operations). We can compare the performance using
hyperfine
. In this case, as in all performance measurements
presented in this series, I’m on an Intel i7-8550U with the performance
governor and Turbo Boost off.
Command | Mean [s] | Min [s] | Max [s] | Ratio |
---|---|---|---|---|
./nbody.gcc-8.bench 50000000 | 6.199 ± 0.006 | 6.185 | 6.206 | 1.17x |
./nbody.clang-8.bench 50000000 | 5.277 ± 0.007 | 5.258 | 5.282 | 1.00x |
./nbody-1.bench 50000000 | 5.123 ± 0.024 | 5.095 | 5.161 | 0.97x |
Our initial grungy Rust program is roughly the same speed as the C program when
compiled by clang
. Because gcc
is so much slower, I’ll use clang
as my
point of comparison in the rest of this series.
Next Steps
We’ve taken a C program and produced an essentially equivalent Rust program, right down to the pointer casting and uninitialized variables. The result is a program that neither a Rust nor a C programmer would love, but it makes an important point: that we can express the same tricky stuff in Rust that we can in C, just with different syntax. Rust wants to help you stay safe, but unlike a lot of “safe” languages, it will get out of your way when required.
The two programs get the same results in roughly the same amount of time. In Part 2, I’ll begin cleaning up the code, and discuss the advantages of using safe Rust’s features to produce more robust code, and the performance cost of safety. (Spoiler: we’re not going to get any slower.)