You Can't Write C in Just Any Ol' Language

Learn Rust the Dangerous Way, Part 1

2019-11-24

In this part of the series, we’ll take a grungy optimized C program and translate it, fairly literally, into a grungy optimized unsafe Rust program. It’ll get the same results, with the same performance, as the original.

Our first program

To begin, we need a C program that is…

Fairly short,
Self-contained,
Aggressively optimized using some C tricks,
But not a complete toy.

I’ve chosen a solar system simulator (nbody-gcc-8) from the Benchmarks Game. The Game is a nice source of small programs that have been heavily optimized. This particular program uses SSE and memory layout/locality optimizations to crunch numbers really fast.

This is the sort of thing that C is awesome for. On the leaderboard for this task at the time of this writing, C-like languages¹ dominate, with the next fastest language — Go — coming in fully 3.5x slower.

For this purpose, I’m counting languages without garbage collectors that give you significant control over memory locality and whatnot, so C, C++, Fortran, Ada, and of course Rust. All these languages have versions of this program that perform within 33% of the C version.

Just in case the Benchmarks Game URLs change, I’ve included a local copy:

nbody.gcc-8.c - (license)

We’ll be going over that source code with a fine-toothed comb, and it might be useful to keep a copy open in a separate tab or editor. I will occasionally reformat the C code to fit better in a narrow column.

Initial translation to ugly Rust

Let’s begin by working through the C program and writing a Rust program that does the same thing. To understand what “the same thing” is, we’ll need to think carefully about what the C program is expressing. You might be surprised about which lines of code take most of our time!

(In terms from human languages, what we’re doing is more of a transliteration or gloss than a true translation, because the result won’t be fluent Rust. We’ll fix that later.)

Prelude

The very top of the C file is book-keeping, which we can translate quickly as a warmup. Rust imports much of the standard library by default; we’ll bring in additional memory management tools, SIMD intrinsics, and the constant PI.

#include <stdint.h>
#include <stdalign.h>
#include <immintrin.h>
#include <math.h>
#include <stdio.h>

Rust:

use std::mem;
use std::arch::x86_64::*;
use std::f64::consts::PI;

Types

The C program defines two types, intnative_t and the struct body. By studying the program, we can see that intnative_t is used only as a loop index; to skip to the punchline, it winds up not being used at all in Rust, where it gets replaced by usize. struct body, on the other hand, is used all over.

// intptr_t should be the native integer type on most 
// sane systems.
typedef intptr_t intnative_t;

typedef struct{
    double position[3], velocity[3], mass;
} body;

Rust:

// Note 1
#![allow(
    non_upper_case_globals,
    non_camel_case_types,
    non_snake_case,
)]

#[repr(C)]  // Note 2
struct body {
    position: [f64; 3],
    velocity: [f64; 3],
    mass: f64,
}

That’s pretty straightforward, but we’ve already got two notes.

Note 1: What’s with that allow nonsense?

Rust, like a lot of recent languages, has some opinions about naming conventions. You’ll get lint warnings about things like lowercase struct names. In the interest of not changing everything at once, this allow attribute (similar to a #pragma) informs the compiler that we wish to allow deviations from the naming conventions.

Note 2: The #[repr(C)] marker, which applies to the struct that follows, asks Rust to lay out the struct exactly like C. Without this, Rust assumes you don’t really care about the ordering of fields in a struct, and it will optimize them for best packing and alignment — something you’ve probably done by hand before. It actually turns out that this program doesn’t assume a particular field order, but we’ll turn off field reordering for now.

Data

The next portion of the program lays out data describing the solar system. This is an area where Rust and C are pretty similar, so I’ll use it as an opportunity to talk about the principles behind the places they differ. The overall theme is one of being explicit.


#define SOLAR_MASS (4*M_PI*M_PI)
#define DAYS_PER_YEAR 365.24
#define BODIES_COUNT 5




static body solar_Bodies[]={
    {    // Sun
        .mass=SOLAR_MASS


    },
    {    // Jupiter
        {
             4.84143144246472090e+00,
            -1.16032004402742839e+00,
            -1.03622044471123109e-01
        },
        {
             1.66007664274403694e-03 * DAYS_PER_YEAR,
             7.69901118419740425e-03 * DAYS_PER_YEAR,
            -6.90460016972063023e-05 * DAYS_PER_YEAR
        },
        9.54791938424326609e-04 * SOLAR_MASS
    },
    // ... more planets here ...
};

Rust:

// Note 1
const SOLAR_MASS: f64 = 4. * PI * PI;
const DAYS_PER_YEAR: f64 = 365.24;
const BODIES_COUNT: usize = 5;

//     ,------------------------------------- Note 2
//     |                           ,--------- Note 3
//     v                           v
static mut solar_Bodies: [body; BODIES_COUNT] = [
    body {    // Sun      <------------------ Note 4
        mass: SOLAR_MASS,
        position: [0.; 3], // <-------------- Note 5
        velocity: [0.; 3],
    },
    body {    // Jupiter
        position: [ // <--------------------- Note 6
             4.84143144246472090e+00,
            -1.16032004402742839e+00,
            -1.03622044471123109e-01
        ],
        velocity: [
             1.66007664274403694e-03 * DAYS_PER_YEAR,
             7.69901118419740425e-03 * DAYS_PER_YEAR,
            -6.90460016972063023e-05 * DAYS_PER_YEAR
        ],
        mass: 9.54791938424326609e-04 * SOLAR_MASS
    },
    // ... more planets here ...
];

Note 1: These #defines can be replaced by Rust const, which declares a constant. Constants have types, which avoids some embarrassing mistakes (don’t ask me how I know this).

Note 2: In general, data declared in C can be overwritten unless specifically marked const. Rust takes the opposite approach: data cannot be overwritten unless marked mut (for “mutable”). (This is nice, because a compiler typically can’t detect when you’re mutating something by accident in C, because you forgot to const it.)

Note 3: C uses an implied array size here. The most direct translation into Rust requires that we state the array size. Since there are a number of places where the program would become wrong if the size of this array deviated from BODIES_COUNT, I feel like this is acceptable.

Note 4: Writing a struct literal in C uses naked curly braces { ... }, and the type must be inferred from the context. In Rust, you must state the type explicitly: body { ... }.

Note 5: In C, you can write a struct literal and leave out some fields — though every C shop I’ve worked at has discouraged this. You need to provide some initializer for fields in Rust².

Note 6: In C (after C99) you can optionally give the field names in a struct literal; our program does this for the Sun, but not the planets. In Rust, you must always give the field names. (This means that struct literals don’t change meaning if you reorder the fields in a struct!)

A common reason to leave out fields is when you only care about part of a struct, and don’t want to clutter your code with a long string of zeros. Don’t worry, there is a way to do this in Rust. I’m not introducing it here because I think it would be a distraction, but if you would like a distraction, it’s called struct update syntax.

Our first function: `offset_Momentum`

From here on out we’ll be hitting the functions out-of-order, because it’s convenient for me, the author. The order of functions in a file doesn’t matter in Rust, so we can translate the program in the original order, or in the order I’m using.

Let’s start with the shortest function, offset_Momentum. For a short function, it’s going to produce a lot of commentary.


static void offset_Momentum(body bodies[]){

    for(intnative_t i=0; i<BODIES_COUNT; ++i)
        for(intnative_t m=0; m<3; ++m)
            bodies[0].velocity[m]-=
              bodies[i].velocity[m]*bodies[i].mass/SOLAR_MASS;



}

Rust:

// v------------------------------------ Note 1
unsafe fn offset_Momentum(bodies: *mut body) {
// ^                               ^---- Note 2
    for i in 0..BODIES_COUNT {  // <---- Note 3
        for m in 0..3 {
            (*bodies.add(0)).velocity[m] -=  // <- Note 4
                (*bodies.add(i)).velocity[m] // <- Note 5
                * (*bodies.add(i)).mass / SOLAR_MASS;
        }
    }
}

Note 1: the Rust version of this function is marked unsafe. While that sounds like a scary word, in our case, it simply means that we’re opting in to the ability to arbitrary (and possibly dumb) things with pointers, and that anyone calling us is risking corruption if they pass in bogus arguments. That’s the same deal the C program offered.

Note 2: the C program takes an argument body bodies[]. While this looks like an array, it’s really just a pointer — it’s equivalent to body *bodies due to array decay. As it’s non-const, the Rust equivalent is *mut body.

Note 3: a C for loop that simply walks a single index through a range of numbers — the common case — becomes a loop over a range in Rust. It generates the same code, but is easier (for me) to read, because I don’t need to hunt for “clever” index manipulation in the clauses of the loop.

Note 4: This is the elephant in the room: why did the C expression bodies[0] explode into *bodies.add(0)?

Let me begin by assuring you that this is temporary. We’ll clean it up in the next part of this series.

Now: I did this because, in the C program,

We’re using raw (C-style) pointers,
We’re doing arithmetic on them (which is what indexing is in C), and
We’re not checking array bounds or NULL, so what we’re doing might be wrong, if our caller has given us bogus arguments.

Rust won’t stop us from doing this, but it won’t go out of its way to help us either. In general, given a dangerous option and an equally good safe option, Rust will try to nudge you toward the safe option by making it easier to use. In keeping with the theme of being explicit, doing dangerous stuff requires more typing, so it’s harder to do by accident.

In C, bodies[i] is exactly the same as *(bodies + i) — it performs pointer arithmetic and a dereference and nothing else. In particular, it assumes that you have some reason to know that i is a valid index for the array. This is the common case in C and gets a shorthand using square brackets.

In Rust, the common case uses references instead of pointers³ and does bounds-checking, and so that use case gets the shorthand bodies[i]. Doing unchecked pointer arithmetic is the exception rather than the rule, and so we have to express it more deliberately.

Finally, we have to write *bodies.add(i) instead of *(bodies + i) because Rust doesn’t overload arithmetic operators for pointers. Pointers instead provide add, sub, and other operations that are easier to spot in code review; here’s the full list if you’re curious.

To the machine, references are pointers, but to the compiler and programmer they have slightly different meanings. We’ll take a closer look at this in the next part of this series.

Note 5: Given all that, you might be surprised to see velocity[m] using square brackets. I chose to simplify this because I can see that it’s equivalent:

velocity is a 3-element inline array within the body struct.
Our indices are 0..3 (meaning 0, 1, 2), so every access is in bounds.

If we tried to read off the end of the velocity array using square brackets, the program would panic at run time, which would be a behavior change. Since we don’t do this…we don’t have to worry about it.

Technically, writing velocity[m] is asking the compiler to generate a bounds check. However, rustc is quite good at deducing when the bounds checks are unnecessary, and simple loops with constant indices are the easiest case. We’ll see more of this pattern later on.

Second: `output_Energy`

For the most part, output_Energy applies the same techniques used in offset_Momentum, and I’ll present them without further commentary. The notes below call attention to things we haven’t seen before.

static void output_Energy(body bodies[]){
    double energy=0;
    for(intnative_t i=0; i<BODIES_COUNT; ++i){
        // Add the kinetic energy for each body.
        energy+=0.5*bodies[i].mass*(
          bodies[i].velocity[0]*bodies[i].velocity[0]+

          bodies[i].velocity[1]*bodies[i].velocity[1]+

          bodies[i].velocity[2]*bodies[i].velocity[2]);


        // Add the potential energy between this body and
        // every other body.
        for(intnative_t j=i+1; j<BODIES_COUNT; ++j){
            double position_Delta[3];

            for(intnative_t m=0; m<3; ++m)
                position_Delta[m] =
                   bodies[i].position[m]
                      - bodies[j].position[m];





            energy-=bodies[i].mass*bodies[j].mass/sqrt(
              position_Delta[0]*position_Delta[0]+
              position_Delta[1]*position_Delta[1]+
              position_Delta[2]*position_Delta[2]);



        }
    }

    // Output the total energy of the system.
    printf("%.9f\n", energy);
}

Rust:

unsafe fn output_Energy(bodies: *mut body){
    let mut energy = 0.;
    for i in 0..BODIES_COUNT {
        // Add the kinetic energy for each body.
        energy += 0.5 * (*bodies.add(i)).mass * (
            (*bodies.add(i)).velocity[0]
                * (*bodies.add(i)).velocity[0]
            + (*bodies.add(i)).velocity[1]
                * (*bodies.add(i)).velocity[1]
            + (*bodies.add(i)).velocity[2]
                * (*bodies.add(i)).velocity[2]);

        // Add the potential energy between this body and
        // every other body.
        for j in i+1..BODIES_COUNT {
            let mut position_Delta =   // <----------- Note 1
                [mem::MaybeUninit::<f64>::uninit(); 3];
            for m in 0..3 {
                position_Delta[m].as_mut_ptr().write(
                    (*bodies.add(i)).position[m]
                        - (*bodies.add(j)).position[m]
                );
            }
            let position_Delta: [f64; 3] = // <------- Note 2
                mem::transmute(position_Delta);

            energy -= (*bodies.add(i)).mass
                * (*bodies.add(j)).mass
                / f64::sqrt(               // <------- Note 3
                      position_Delta[0]*position_Delta[0]+
                      position_Delta[1]*position_Delta[1]+
                      position_Delta[2]*position_Delta[2]
                  );
        }
    }

    // Output the total energy of the system.
    println!("{:.9}", energy);
}

Note 1: We’ve hit some implicit complexity in the C program.

It declares a local variable like so:

double position_Delta[3];

That’s an array of three doubles, sure — but the important part is what’s missing: an initializer. C leaves local variables uninitialized unless you give them an initializer. In this case, the elements of position_Delta contain arbitrary, unpredictable values. That’s fine, because the program immediately replaces them with good values in the following loop.

Why is this important? Sometimes leaving memory uninitialized is important for performance, so I’m using this as an opportunity to demonstrate the technique you’ll use. (In this particular case, I measured it and it doesn’t seem to affect performance.)

Uninitialized variables are a source of serious bugs in C, and many other languages, you can’t even have them. You can have uninitialized variables in Rust, but you have to type more. The equivalent line is:

let mut position_Delta = [mem::MaybeUninit::<f64>::uninit(); 3];
// ...which is shorthand for...
let mut position_Delta: [mem::MaybeUninit<f64>; 3] =
    [mem::MaybeUninit::uninit(); 3];

(That ::<f64> syntax is called a “turbofish” if you haven’t seen it before. This is the only place where I’ll use it.)

The standard library provides std::mem::MaybeUninit for expressing storage locations that might be uninitialized. A MaybeUninit<T> is a T that might not be initialized. MaybeUninit::uninit() is an uninitialized value. So this line declares an array of three f64s containing arbitrary uninitialized memory, just like in C — we just have to write out the words to assure the compiler that, no, really, this is what we wanted.

Accessing these variables is unsafe, but that shouldn’t scare you — it just means you need to be real sure that the variables are initialized before you read them. The first thing the Rust code does, like C, is to write the array elements with valid data. Writing the array elements is more verbose than C:

position_Delta[m].as_mut_ptr().write(value_goes_here);

What?

MaybeUninit::as_mut_ptr() produces a mutable pointer to the possibly-uninitialized memory. write is another raw pointer operation, which writes data into the pointed-to memory without reading it first.

Why would writing to memory involve reading it first? In Rust, it’s usually because the type you’re storing has a destructor. f64 doesn’t, but I’m trying to be pedantically correct here in case you copy this code into another context.

Note 2: it’s our first transmute! Here’s the line so you don’t have to scroll up:

let position_Delta: [f64; 3] = mem::transmute(position_Delta);

std::mem::transmute is a very versatile function in Rust. It converts between any two types as long as they’re the same size, without running any code or changing any data — it simply reinterprets the bits of one type as the other. This is similar to a cast in C, but it’s more predictable and more powerful — technically it’s the equivalent of doing something like

// we're in C here
float x = something();
int y = *(int *) &x;

Because transmute will happily convert (say) a floating point number into a pointer, or any three-word struct into any other, it is firmly, gloriously unsafe. And incredibly useful.

Here, we’re transmuting from our original [MaybeUninit<f64>; 3] (see Note 1) to [f64; 3] — we’re shedding the MaybeUninit, because we, the programmer, are convinced that the array is fully initialized at this point. This means we can access the array elements freely in the rest of the function. As long as we’ve done the initialization correctly, this use of transmute is safe.

Note that we’re consuming the old position_Delta array on the right, and creating a new position_Delta variable on the left with the same name. This is a common Rust idiom for changing the type (or mutability) of a variable. Coming from a C background, this kind of “name shadowing” may seem scary to you, and that’s fine — you don’t have to use it. But you’ll definitely encounter it in Rust code.

Note 3: This one’s comparatively easy. Where C has sqrt and sqrtf, Rust has f64::sqrt and f32::sqrt.

Third: `advance`

advance is the actual simulation. It steps the solar system forward by a tiny amount. Compared to the previous two functions we’ve visited, advance is pretty long, so I’ll present it in sections.

The original advance function is heavily commented; I’ve removed those comments for space reasons.

Section one: setup

This one’s dense enough that I’ll present the two languages separately.

static void advance(body bodies[]){

    #define INTERACTIONS_COUNT \
        (BODIES_COUNT*(BODIES_COUNT-1)/2)
    #define ROUNDED_INTERACTIONS_COUNT \
        (INTERACTIONS_COUNT+INTERACTIONS_COUNT%2)

    //  ,--------------------------------- Note 1
    // v       v-------------------------- Note 2
    static alignas(__m128d) double
      position_Deltas[3][ROUNDED_INTERACTIONS_COUNT],
      magnitudes[ROUNDED_INTERACTIONS_COUNT];

Skipping right on by those #defines, which we’ve done before, we arrive at some subtle stuff! For the first time I’ve marked notes on the C side.

C Note 1: This is a variable inside a function with a static qualifier. This gives the name the same scope as a local, but rather than being stored on the stack, a single copy is placed in global memory. This has a bunch of implications.

A static variable missing an initializer is not uninitialized, the way a local is. It gets zeroed instead.
It gets zeroed once, so this function doesn’t pay to initialize it each time it gets called. This can improve performance the same way an uninitialized local can, which is almost certainly why this program is doing it this way.
Of course, because it’s zeroed once, it’s only reliably zero the first time this function is called. Next time it will contain leftover results from a previous call.
As a side effect, the function is neither reentrant nor thread-safe, because having multiple invocations of this function fighting over the static storage at the same time will produce weird results.

C Note 2: The alignas specifier, new in C11, overrides the natural alignment of something with the alignment of something else. Here, we’ve requested the alignment of __m128d, which is Intel’s name for an SSE vector type containing two doubles. This winds up aligning to a multiple of 16 bytes.

When encountering this sort of specifier, there are two questions we need to ask:

Why is the alignment being overridden? In this case, it’s because we’ll be accessing these variables as both doubles and vectors, depending on the point in the program.⁴
What is being aligned? It would be easy to read alignas(__m128d) double as a double aligned to 16 bytes, but that’s not correct. The alignas specifier applies to the array being declared. So here we have a normal array of doubles, but aligned specially.

⁴

In my personal opinion, it’s simpler, clearer, and less error-prone to do this with a union, but I didn’t write this.

Rust:

unsafe fn advance(bodies: *mut body) {

    const INTERACTIONS_COUNT: usize =
        BODIES_COUNT * (BODIES_COUNT - 1) / 2;
    const ROUNDED_INTERACTIONS_COUNT: usize =
        INTERACTIONS_COUNT + INTERACTIONS_COUNT % 2;

    // Note 1
    #[repr(align(16))]
    #[derive(Copy, Clone)]
    struct Align16([f64; ROUNDED_INTERACTIONS_COUNT]);

    // Note 2
    static mut position_Deltas: [Align16; 3] =
        [Align16([0.; ROUNDED_INTERACTIONS_COUNT]); 3];
    static mut magnitudes: Align16 =
        Align16([0.; ROUNDED_INTERACTIONS_COUNT]);

Starting at Note 1, a mysterious struct has appeared.

Rust has an alignment attribute that behaves like C’s, but it can only be applied to types, not values. Instead of overriding the alignment on some double, in Rust, we create a wrapper called something like AlignedDouble. In this case, because I’m not super imaginitive, I’ve called it Align16.

Align16 contains an array of ROUNDED_INTERACTIONS_COUNT f64s and aligns it to a 16 byte boundary. I’ve written Align16 as a tuple struct with unnamed fields, but you could write it as a normal struct if you prefer.

Align16 is marked with the align attribute, but it’s also marked as #[derive(Copy, Clone)]. This is a bit of boilerplate that says that we want the type to be freely copyable, like an f64. We need this to write the array initializers at Note 2.

Speaking of Note 2, here we initialize our statics using array literals. The statics are explicitly marked mut because we’re going to modify them.

As in C, the mut part renders advance non-reentrant and non-thread-safe. And Rust really doesn’t like that, because Rust provides guarantees about data races (namely, that they don’t exist), and having a static mut like this lets us violate that guarantee. But we’re not writing safe Rust, so it’s our responsibility to implement thread safety ourselves — and in this program, we do so by not having threads.

Moving on!

Section two: calculating distances

The C code is using a more complex for loop this time. I’ve un-indented the code, remember that we’re inside a function.

//                   v--------------------------------- Note 1
for(intnative_t i=0, k=0; i<BODIES_COUNT-1; ++i)

    for(intnative_t j=i+1; j<BODIES_COUNT; ++j, ++k) // Note 2
        for(intnative_t m=0; m<3; ++m)
            position_Deltas[m][k]=
              bodies[i].position[m]-bodies[j].position[m];

Rust:

{
    let mut k = 0;
    for i in 0..BODIES_COUNT-1 {
        for j in i+1..BODIES_COUNT {
            for m in 0..3 {
                position_Deltas[m].0[k] =
                    (*bodies.add(i)).position[m]
                      - (*bodies.add(j)).position[m];
            }
            k += 1;
        }
    }
}

Read the C closely around the two Note comments. k is declared and initialized at Note 1, so it’s in scope throughout the three nested loops, just like i. But k is not incremented there — it’s incremented at Note 2.

That is, k is counting the number of times we go through the j loop.

C’s for loop is a subtle tool, and it’s easy to express things like this in C (though we could debate how readable the result is). There’s an idiomatic way to do things like this in Rust, too, but it would require the introduction of some higher-level concepts that aren’t appropriate just yet.

Instead, I’ve taken the C loop apart and just created a k variable. The outer curly braces limit the scope of k, just as they would in C. With that simplification, the loops become simple loops over ranges, and (in my opinion) it’s clearer what, exactly, k is counting.

This is often, but not always, the right way to translate complex C for loops — at least during your first pass.

Finally, two brief notes on Rust syntax:

If this is your first time encountering syntax like position_Deltas[m].0[k], the .0 is accessing the first (only) unnamed field in the tuple struct Align16, which we defined in the last section.
You’ve probably noticed that my Rust loops have closing curly braces even when the C didn’t. While that happens to be my personal preference in C, in this case, it isn’t personal: Rust requires it.

Section three: converting to vectors

Before I present the code: if you haven’t written MMX or SSE code in C before, let me just note that __m128d is Intel’s (awkward) type name for a 128-bit vector of two doubles, and MMX/SSE operations appear as functions prefixed with _mm_.

Onward: we are entering the loop that computes the gravitational interactions of each pair of bodies in the solar system.

for(intnative_t i=0; i<ROUNDED_INTERACTIONS_COUNT/2; ++i){
    __m128d position_Delta[3];

    for(intnative_t m=0; m<3; ++m)
        position_Delta[m]=
            ((__m128d *)position_Deltas[m])[i];

Rust:

for i in 0..ROUNDED_INTERACTIONS_COUNT/2 {
    let mut position_Delta =
        [mem::MaybeUninit::<__m128d>::uninit(); 3];
    for m in 0..3 {
        position_Delta[m].as_mut_ptr().write(
            *(&position_Deltas[m].0
                as *const f64         // <----- Note 1
                as *const __m128d).add(i)
        );
    }
    let position_Delta: [__m128d; 3] =
        mem::transmute(position_Delta);

This is where we start punning the arrays of doubles as arrays of SSE vectors, so that we can read them and operate on them two-at-a-time.

The beginning and end, with position_Delta, are the same “uninitialized array” pattern we used earlier. The program creates an uninitialized array of __m128d and then fills it in by copying corresponding elements from position_Deltas, loading pairs of double/f64 as __m128d. To do this in C, we cast the array (which decays to a pointer) to an __m128d * and index into the results.

In Rust, we do the same thing. Rust casts primitive types using as. Note that there is one more cast in Rust than in C (Note 1). This is because the expression &x in Rust takes the address of x as a reference, but we want a raw pointer — so we cast first to a pointer, and then to a pointer of a different type. We’ll cover references in more detail in the next part of this series.

(I’m using *const in Rust even though C is using mutable pointers, because while C uses non-const pointers, it doesn’t actually mutate anything here. That is, the lack of const is probably a mistake, though a harmless one, in the C program.)

Section four: vector math

(I’m un-indenting the code here to fit more in this column, so just remember that we’re in the middle of a loop still.)

const __m128d distance_Squared=
  position_Delta[0]*position_Delta[0]+
  position_Delta[1]*position_Delta[1]+
  position_Delta[2]*position_Delta[2];




__m128d distance_Reciprocal=
   _mm_cvtps_pd(_mm_rsqrt_ps(_mm_cvtpd_ps(distance_Squared)));
for(intnative_t j=0; j<2; ++j)
    distance_Reciprocal=distance_Reciprocal*1.5-
      0.5*distance_Squared*distance_Reciprocal*
      (distance_Reciprocal*distance_Reciprocal);














((__m128d *)magnitudes)[i] =
    0.01/distance_Squared*distance_Reciprocal;

Rust:

let distance_Squared: __m128d = _mm_add_pd(
    _mm_add_pd(
        _mm_mul_pd(position_Delta[0], position_Delta[0]),
        _mm_mul_pd(position_Delta[1], position_Delta[1]),
    ),
    _mm_mul_pd(position_Delta[2], position_Delta[2]),
);

let mut distance_Reciprocal: __m128d =
  _mm_cvtps_pd(_mm_rsqrt_ps(_mm_cvtpd_ps(distance_Squared)));
for _ in 0..2 {
    distance_Reciprocal = _mm_sub_pd(
        _mm_mul_pd(distance_Reciprocal, _mm_set1_pd(1.5)),
        _mm_mul_pd(
            _mm_mul_pd(
                _mm_mul_pd(
                    _mm_set1_pd(0.5),
                    distance_Squared,
                ),
                distance_Reciprocal,
            ),
            _mm_mul_pd(
                distance_Reciprocal,
                distance_Reciprocal,
            ),
        ));
}

(magnitudes.0.as_mut_ptr() as *mut __m128d)
    .add(i)
    .write(_mm_mul_pd(
        _mm_div_pd(
            _mm_set1_pd(0.01),
            distance_Squared,
        ),
        distance_Reciprocal,
    ));

Well that certainly got longer! The Rust program contains a lot more calls to Intel MMX/SSE intrinsics (starting with _mm_). In fact, pretty much the only line that directly corresponds between the two programs is the one that computes the initial value of distance_Reciprocal.

There’s a reason for that: the C program is cheating.

Okay, that’s too harsh — what the C program is doing is relying on a non-standard language extension that provides operator overloading for certain types, including __m128d.

At the time I’m writing this, Rust doesn’t provide operator overloading for __m128d in the standard library, so we use the vector intrinsics explicitly. (Unlike with C, I could actually add operator overloading myself with a bit more code, but I think that’d be distracting.)

Section five: finishing up

That finishes the body of the biggest loop; let’s close it out and finish the function. There are no new techniques required here, so I’ll present this without commentary.

} // end of loop


for(intnative_t i=0, k=0; i<BODIES_COUNT-1; ++i)

    for(intnative_t j=i+1; j<BODIES_COUNT; ++j, ++k){
        const double
          i_mass_magnitude=bodies[i].mass*magnitudes[k],
          j_mass_magnitude=bodies[j].mass*magnitudes[k];

        for(intnative_t m=0; m<3; ++m){
            bodies[i].velocity[m] -=
                position_Deltas[m][k]*j_mass_magnitude;
            bodies[j].velocity[m] +=
                position_Deltas[m][k]*i_mass_magnitude;
        }

    }



for(intnative_t i=0; i<BODIES_COUNT; ++i)
    for(intnative_t m=0; m<3; ++m)
        bodies[i].position[m]+=0.01*bodies[i].velocity[m];

Rust:

} // end of loop

{
    let mut k = 0;
    for i in 0..BODIES_COUNT-1 {
        for j in i+1..BODIES_COUNT {
            let i_mass_magnitude =
                (*bodies.add(i)).mass * magnitudes.0[k];
            let j_mass_magnitude =
                (*bodies.add(j)).mass * magnitudes.0[k];
            for m in 0..3 {
                (*bodies.add(i)).velocity[m] -=
                   position_Deltas[m].0[k] * j_mass_magnitude;
                (*bodies.add(j)).velocity[m] +=
                   position_Deltas[m].0[k] * i_mass_magnitude;
            }
            k += 1;
        }
    }
}

for i in 0..BODIES_COUNT {
    for m in 0..3 {
        (*bodies.add(i)).position[m] +=
            0.01 * (*bodies.add(i)).velocity[m];
    }
}

And that’s the end of advance. Whew! It’s all downhill from here!

Finally: `main`

main simply calls the functions we’ve already seen.

int main(int argc, char *argv[]){

    offset_Momentum(solar_Bodies);
    output_Energy(solar_Bodies);
    for(intnative_t n=atoi(argv[1]); n--;
        advance(solar_Bodies));



    output_Energy(solar_Bodies);

}

Rust:

fn main() {
    unsafe {
        offset_Momentum(solar_Bodies.as_mut_ptr());  // Note 1
        output_Energy(solar_Bodies.as_mut_ptr());
        let c = std::env::args().nth(1).unwrap()  // Note 2
            .parse().unwrap();
        for _ in 0..c {
            advance(solar_Bodies.as_mut_ptr())
        }
        output_Energy(solar_Bodies.as_mut_ptr());
    }
}

These two implementations are pretty similar, but there are differences.

Note 1: Remember that solar_Bodies is our static mut array of body structs, defined above. In C, we pass it to functions using just its name, solar_Bodies. This works because, in C, we can’t pass arrays by value, so using the name of an array as a function argument causes decay to a pointer. In C,

int data[5];

foo(data);

is equivalent to

int data[5];

foo(&data[0]);

This is not true in Rust. If we pass solar_Bodies to a function, we’re passing, by value, an array of BODIES_COUNT body structs. That’s not what we want.

Like in C, the ampersand in Rust can be used to take the address of a variable, but &solar_Bodies produces a reference instead of a pointer. We’d need to insert a cast to a pointer.

However, this situation is pretty common, and there’s a shorthand: array.as_mut_ptr() returns a mut pointer to the first element of array. We use this at each call to pass solar_Bodies to the functions that operate on it.

Note 2: Both programs parse the first command line argument as an integer and use that to determine the number of times to advance. The way the two programs do it is different.

In C, we use:

atoi(argv[1])

In Rust, we use the much longer expression

std::env::args().nth(1).unwrap()
    .parse().unwrap()

This is another case of implicit complexity in C becoming explicit in Rust, and it all comes down to failure behavior.

What does the C program do when no command line arguments are provided? argv[1] is then an access past the end of the argv array, so it reads arbitrary memory as a char *. It then passes the arbitrary pointer to atoi, which reads the memory it points to and tries to parse it as an integer. In practice, on Linux at least, this segfaults.

What does the C program do when an argument is provided, but it isn’t an integer? atoi doesn’t detect errors, so non-integer arguments are treated as 0.

The Rust program contains unwrap() calls at each of these points, which is a Rust idiom that converts an error condition into an immediate panic. We could choose a different response, like substituting invalid arguments with a default value, but for this program, I haven’t bothered.

This is a case where I’ve introduced memory safety where the original program had none — why now? Well, for cross-platform compatibility reasons⁵, Rust wraps command line arguments behind an abstraction that makes it pretty hard to read off the end of argv. Rather than beating my head against it, I just did the Rusty thing.

⁵

I’m guessing this has something to do with Windows?

Evaluating the translated program

Completed source code for phase one.

To build the C version:

$ time gcc -O3 -fomit-frame-pointer -march=native -funroll-loops -static \
    nbody.gcc-8.c -o nbody.gcc-8 -lm

real	0m0.294s
user	0m0.264s
sys	0m0.030s

Building the Rust version:

$ time rustc -C opt-level=3 -C target-cpu=native -C codegen-units=1 \
    nbody-1.rs -o nbody-1

real	0m0.369s
user	0m0.310s
sys	0m0.048s

rustc is compiling a bit slower than gcc here, but they’re both fast enough that we’re probably mostly seeing startup time. For a more even comparison, I’ll include the results building with clang, which shares a lot of code with rustc.

$ time clang -O3 -fomit-frame-pointer -march=native -funroll-loops -static \
    1/nbody.gcc-8.c -o nbody.clang-8.bench -lm

real	0m0.418s
user	0m0.343s
sys	0m0.058s

Let’s get the golden output from the C version to make sure our translation was correct:

$ ./nbody.gcc-8 50000000
-0.169075164
-0.169059907

And our translation:

$ ./nbody-1 50000000
-0.169075164
-0.169059907

The output matches (which mostly means we didn’t mess up the order of the floating point operations). We can compare the performance using hyperfine. In this case, as in all performance measurements presented in this series, I’m on an Intel i7-8550U with the performance governor and Turbo Boost off.

Command	Mean [s]	Min [s]	Max [s]	Ratio
`./nbody.gcc-8.bench 50000000`	6.199 ± 0.006	6.185	6.206	1.17x
`./nbody.clang-8.bench 50000000`	5.277 ± 0.007	5.258	5.282	1.00x
`./nbody-1.bench 50000000`	5.123 ± 0.024	5.095	5.161	0.97x

Our initial grungy Rust program is roughly the same speed as the C program when compiled by clang. Because gcc is so much slower, I’ll use clang as my point of comparison in the rest of this series.

Next Steps

We’ve taken a C program and produced an essentially equivalent Rust program, right down to the pointer casting and uninitialized variables. The result is a program that neither a Rust nor a C programmer would love, but it makes an important point: that we can express the same tricky stuff in Rust that we can in C, just with different syntax. Rust wants to help you stay safe, but unlike a lot of “safe” languages, it will get out of your way when required.

The two programs get the same results in roughly the same amount of time. In Part 2, I’ll begin cleaning up the code, and discuss the advantages of using safe Rust’s features to produce more robust code, and the performance cost of safety. (Spoiler: we’re not going to get any slower.)

#c #rust #tutorial

Cliffle

Our first program

Initial translation to ugly Rust

Prelude

Types

Data

Our first function: offset_Momentum

Second: output_Energy

Third: advance

Section one: setup

Section two: calculating distances

Section three: converting to vectors

Section four: vector math

Section five: finishing up

Finally: main

Evaluating the translated program

Next Steps

Our first function: `offset_Momentum`

Second: `output_Energy`

Third: `advance`

Finally: `main`