Measure What You Optimize
Learn Rust the Dangerous Way, Part 3
- Refreshing your (uninitialized) memory
- Measuring
- A safe version of output_Energy
- What have we learned?
- Where next?
In part 2 we introduced Rust references, and this was enough to convert one of our inner functions into safe Rust.
The others are still unsafe
. There are several reasons for this. In this, the
briefest of sections, we’ll tackle the easiest one: deliberate use of
uninitialized memory.
Refreshing your (uninitialized) memory
At the end of part 2, our program looked like this:
There are two places in the original C program — and, thus, in our program — where the author deliberately left a local variable uninitialized, to be filled in later. While the choice isn’t explained within the program, I can make an educated guess as to why it was done this way: performance.
Often, the easiest way to improve the performance of your software is to make it do less. For example, imagine a program that is decompressing a gigabyte of data into RAM, using a single 1GiB allocation. The program could go through and zero the gigabyte before decompressing, but this is technically wasted effort: it’s going to overwrite every last one of those zeros with valid data.
Many safe or higher-level languages don’t give the programmer a choice:
uninitialized memory is dangerous, so your memory is going to be initialized.
Rust and C let the programmer decide, and this can be a useful tool for
optimizing memory accesses. For instance, Vec
from Rust’s standard
library uses uninitialized memory under the hood when you ask it to pre-allocate
space for elements without filling them in, among other times.
In C, uninitialized variables are easy to overlook, because you’re looking for something that isn’t there (the initializer). One simplified example from nbody:
double position_Delta;
for
position_Delta= initial_value_here;
When declaring a local, double position_delta[3];
declares an uninitialized
array. The most convenient thing to type is also, usually, the wrong and
dangerous one.
As we saw in part 1, Rust requires every variable to be initialized, and if you’d like it to not do that, you have to be very explicit:
let mut position_Delta = ;
for m in 0..3
let position_Delta: = transmute;
That’s a lot of code, though, and it distracts from what we’re actually trying to do (initialize the array) with an irrelevant task (managing uninitialized memory). Compare that to initializing the array with zeros and immediately overwriting it:
let mut position_Delta = ;
for m in 0..3
That’s much shorter and clearer! But taken at face value, it’s doing around 2x the work.
Is it, though? There’s one way to find out, and that’s to try it.
Measuring
If we replace all uses of uninitialized memory with the simpler equivalent, how do the results compare to nbody-2? I usually start by checking the sizes:
$ size nbody-2 nbody-3
text data bss dec hex filename
226776 10744 888 238408 3a348 nbody-2
226776 10744 888 238408 3a348 nbody-3
Um. So they’re exactly the same size. That’s often a hint that the compiler output didn’t change. Here’s an easy way to test:
$ diff <(objdump -d --demangle nbody-2) <(objdump -d --demangle nbody-3)
2c2
< nbody-2: file format elf64-x86-64
---
> nbody-3: file format elf64-x86-64
Only the name of the program has changed. (Since the filename doesn’t affect the performance of the program, the performance has also not changed.)
The compiler has noticed that we initialized the array and then overwrote it
completely. The zeros stored to the array were dead stores, and the compiler
applied dead store elimination to get rid of them. This is not unique to
rustc
: back-porting this change to the C program shows that gcc
does the
same thing.
For small and predictably-sized structures — the sort of thing you’d allocate on the stack — it’s almost never a performance win to use uninitialized memory. Dead store elimination is the sort of thing modern compilers can do while wearing a blindfold and riding a unicycle.
A safe version of output_Energy
Not only is the new version of the code simpler, it’s also safe. The code no longer relies on features that, if misused, can break memory safety — like uninitialized memory.
In fact, the use of uninitialized memory was the only unsafe
bit remaining in
output_Energy
, which we can check by deleting the unsafe
keyword from the
top of the function and recompiling. If it compiles, we’re good.
Here’s the new safe version of output_Energy
:
You might be wondering why we care. The program worked fine with the unsafe
version of this function, and every C function is the equivalent of unsafe
.
The answer comes back to local reasoning. Since the function is missing the
unsafe
modifier, and contains no unsafe
blocks in its body, I don’t need to
read the rest of the program to tell that this function can’t…
- Dereference a null pointer,
- Read off the end of an array,
- Use uninitialized memory inappropriately,
This means less debugging for us, and fewer possible security exploits for our users.
Here’s the entire program with the safe output_Energy
in context:
Its performance matches the program from part 2, as you’d expect.
Command | Mean [s] | Min [s] | Max [s] | Ratio |
---|---|---|---|---|
./nbody.clang-8.bench 50000000 | 5.277 ± 0.007 | 5.258 | 5.282 | 1.00x |
./nbody-1.bench 50000000 | 5.123 ± 0.024 | 5.095 | 5.161 | 0.97x |
./nbody-2.bench 50000000 | 5.101 ± 0.005 | 5.093 | 5.107 | 0.97x |
./nbody-3.bench 50000000 | 5.103 ± 0.002 | 5.100 | 5.105 | 0.97x |
What have we learned?
Optimizations usually add complexity. The complexity is often worthwhile. But if the “optimization” doesn’t improve performance, then we’ve just added complexity for no good reason. In the case of optimizations that do subtle things with memory, there’s also a real risk that the added complexity introduces crashing bugs and security flaws.
Unless you’re writing assembly language, your program is not a list of literal instructions that a computer will obey. It will be analyzed, massaged, optimized, scheduled, etc. by an intermediary first — in the case of Rust and C, that’s the compiler.
This is why, if you have demanding performance requirements, it’s critical to check that your optimizations have done what you expect, by looking — directly or indirectly — at the compiler’s output.
In this case, a reasonable-looking optimization applied in the C program turns
out to have been unnecessary (in both C and Rust). Because the optimization
required unsafe
features, and safe code makes us think and type less, Rust
gave us an incentive to double- and triple-check this optimization. Without this
incentive, we might not have tried.
Several times, I’ve applied “optimizations” in Rust programs that used unsafe
operations in clever ways, only to find out that the equivalent, “naïve”
safe code is just as fast or even faster. While I may be clever, the compiler is
also clever, and the performance of my program is the result of a dialog
between the two of us.
Where next?
I’m going to keep sanding down the unsafe
parts of this program. We’ve covered
the simple cases; the uninitialized variables in this section were the last bits
that can be trivially made safe by massaging the code. Now we have to turn our
focus to the advance
function.
The optimizations used in advance
require unsafe
code. Instead of
eliminating it, so that the compiler can prove our code safe, we must switch to
a more subtle technique: corralling it into small sections, so that we,
the programmer, can verify it as safe with local reasoning.
This is the topic of part 4.