- Scope and goals
- Creating the simplest possible WebAssembly module
- Creating the smallest useful WebAssembly module
- Making some pixels
- Adding animation
- Help! My binary just got much bigger! (Diagnosing and fixing sudden bloat.)
I’ve been studying WebAssembly recently, which has included porting some of my
m4vga graphics demos. I started with the Rust and WebAssembly
Tutorial, which has you use fancy tools like
npm to produce a Rust-powered webpage.
And that’s great! But I want to know how things actually work, and those tools put a lot of code between me and the machine.
The resulting WebAssembly module will be less than 300 bytes. That’s about the same size as the previous paragraph.
Concretely, you will need these tools from the tutorial:
$ rustup target add wasm32-unknown-unknown
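Because we’re going to be manipulating the compiler output, you’ll also need two sets of tools that weren’t mentioned in the tutorial: WABT (the WebAssembly Binary Toolkit, which provides wasm-strip and wasm-objdump) and Binaryen (which provides wasm-opt).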
Finally, all the examples below will assume you’re on some flavor of Unix.
This post will cover the process of creating tiny graphics demos without doing any binary hacking, hex editing, or writing WASM by hand. Those are all fun techniques, and I’ll probably write a separate post on them later. But for this article, you don’t need to understand any of that stuff.
I’ve posted the complete code in a GitHub repository, with one commit per tutorial step. I’ll link to the commit from each step below if you want to follow along.
Scope and goals
Our goal is to write a tiny program in Rust, and display the output in a browser. While we could use text for the output, my preferred way of showing something works is by drawing pictures. So let’s do that.
To get the program loaded into the browser, we’ll compile it as a WebAssembly
module. A WebAssembly module is essentially a dynamic (shared) library.
That is, it is not a traditional executable with a main function that gets run. It is a collection of exported things, which can be functions or variables, and loading a module simply gives the page access to whatever it exports. You can have a main function, if you want, but it isn’t special: something has to call it explicitly for it to execute.
Out of the box, the WebAssembly environment doesn’t have a concept of “graphics.” The only things a module can do to interact with the outside world (including the browser) are export things to it and import things from it. We could write JavaScript functions that draw on a canvas or something, and provide them to the WebAssembly module as imports. But since we aren’t using wasm-bindgen, that would be complicated and tedious.
Instead, we are going to produce an image in the simplest way possible: we’re going to deposit pixels into a region of memory from the WebAssembly program, and make JavaScript responsible for pasting those pixels into a canvas.
Let’s get started!
Creating the simplest possible WebAssembly module
As a first step, let’s make a tiny WebAssembly module that does nothing useful. This will serve as a template for our real code.
Create a new project using Cargo:
$ cargo new --lib bare-metal-wasm
(We requested a lib style project because, as I noted above, a WebAssembly module is basically a shared library.)
Alter the crate type of the new project to be cdylib by adding these lines to Cargo.toml:
[lib]
crate-type = ["cdylib"]
And now, build it:
$ cargo build --target wasm32-unknown-unknown --release
(We’re building with
--release because we want small binaries.)
Great! We now have a WebAssembly module that contains no functions or variables of any kind. It should be tiny, right? Let’s look!
$ ls -lh target/wasm32-unknown-unknown/release/bare_metal_wasm.wasm
-rwxr-xr-x 2 cbiffle cbiffle 812K Jun 7 19:36 target/wasm32-unknown-unknown/release/bare_metal_wasm.wasm
…yes, that says 812 kiB, which is, um, bigger than we were expecting. If you’ve tried to create small executables before, you can probably guess why: the binary still contains debug symbols. Let’s strip it.
$ wasm-strip target/wasm32-unknown-unknown/release/bare_metal_wasm.wasm
$ ls -lh target/wasm32-unknown-unknown/release/bare_metal_wasm.wasm
-rwxr-xr-x 2 cbiffle cbiffle 102 Jun 7 20:32 target/wasm32-unknown-unknown/release/bare_metal_wasm.wasm
Now it’s 102 bytes. It would fit in a tweet! That’s a better place to start.
We can do better, though, by applying wasm-opt:
$ wasm-opt -o opt.wasm -Oz target/wasm32-unknown-unknown/release/bare_metal_wasm.wasm
$ ls -lh opt.wasm
-rw-r--r-- 1 cbiffle cbiffle 71 Jun 7 20:35 opt.wasm
Now we’re down to 71 bytes. We could make that smaller (see the note below the disassembly), but we’ll stop there.
Dumping the code of our 71-byte binary shows that it does absolutely nothing.
$ wasm-objdump -d opt.wasm
opt.wasm: file format wasm 0x1
Code Disassembly:
Just like we wanted!
A note on going smaller: there are some default exports related to memory management that we could remove, and we could shorten some internal names if we really wanted. This could save a couple dozen bytes, but it would take more work than just running wasm-opt. I may cover this in a future post.
Creating the smallest useful WebAssembly module
First, we need some prerequisites. We aren’t going to use Rust’s standard library, std. It’s awesome, but we can do without. Instead, we’ll rely on core, the platform-independent subset of std. We will alter src/lib.rs to opt out of std (more on that choice in a moment), so that we’ll get an error if we try to use it. We then need to provide a function to handle panics, something std normally does for us.
Opting out of std doesn’t change the size of your program. Rust programs only include the code from std that they use. So if you don’t use std, like the programs on this page, you don’t strictly speaking need to opt out of std to make small programs. I choose to opt out because it means fewer surprises of the form “hey, my program suddenly grew by 12kiB, what happened?”
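Remove the generated code from src/lib.rs and start with:
// src/lib.rs
#![no_std]

#[panic_handler]
fn handle_panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
pub extern "C" fn the_answer() -> u32 {
    42
}
(The panic handler just enters an infinite loop. Any panic will hang the page, so try not to panic.)
#[no_mangle] ensures that the function is exported as, verbatim, the_answer. (The pub extern "C" part gives it a C-compatible calling convention, which is what we want for anything we export from Rust.)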
And we’re done! We need to build and minimize the binary again. That’s going to get repetitive, so let’s put it in a shell script called build.sh:
#!/bin/bash
set -euo pipefail

TARGET=wasm32-unknown-unknown
BINARY=target/$TARGET/release/bare_metal_wasm.wasm

cargo build --target $TARGET --release
wasm-strip $BINARY
mkdir -p www
wasm-opt -o www/bare_metal_wasm.wasm -Oz $BINARY
ls -lh www/bare_metal_wasm.wasm
Make it executable and run it:
$ chmod +x build.sh
$ ./build.sh
Finished release [optimized] target(s) in 0.01s
-rw-r--r-- 1 cbiffle cbiffle 103 Jun 7 21:29 www/bare_metal_wasm.wasm
We’re up to 103 bytes, but we now have code to actually do something, which we
can see if we
objdump the binary:
$ wasm-objdump -d www/bare_metal_wasm.wasm
www/bare_metal_wasm.wasm: file format wasm 0x1
Code Disassembly:
000063 <the_answer>:
 000064: 41 2a | i32.const 42
 000066: 0b | end
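Let’s embed it in a webpage and see if it works. The page, www/index.html, needs only a short script: fetch bare_metal_wasm.wasm, instantiate it (for example with WebAssembly.instantiateStreaming), call the exported the_answer function, and pass the result to console.log.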
Now, if we load that page in a browser, we would hope to see
42 printed in the
developer console. But there’s a catch: browsers won’t load WebAssembly modules
from local files. You need to serve the page and module through a web server. If
you already have a web server somewhere, toss the page and module on it and test
them out. Otherwise, here’s a simple webserver:
A simple webserver for serving this app
Python includes a webserver that will do the job, but it doesn’t serve the correct MIME type for WebAssembly modules. So we have to configure it.
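Paste this into a file called serve.py:
#!/usr/bin/env python3
import http.server
import socketserver

PORT = 8080

# Serve .wasm files with the MIME type browsers expect.
Handler = http.server.SimpleHTTPRequestHandler
Handler.extensions_map.update({".wasm": "application/wasm"})

httpd = socketserver.TCPServer(("", PORT), Handler)
print("serving at port", PORT)
httpd.serve_forever()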
Now, if you cd www and run serve.py, the contents of the www directory will be served up at http://localhost:8080/. Browse to our index.html file on your webserver, and open your browser’s developer tools console (typically F12). You should see 42.
We now have a trivial — but working! — web application using Rust. For those keeping score, our file sizes are currently:
$ ls -l www
-rw-r--r-- 1 cbiffle cbiffle 103 Jun 7 21:29 bare_metal_wasm.wasm
-rw-r--r-- 1 cbiffle cbiffle 276 Jun 7 21:32 index.html
Now let’s do some graphics.
Making some pixels
As I mentioned above, we’re going to generate an image in the memory of the WebAssembly module, and then transfer it onto a canvas. This takes a surprisingly small amount of code, but there is a subtle part: how we lay out the image in memory.
On the JavaScript side, we’ll use the ImageData class to hold the image. That class has opinions about how image data should be formatted: each pixel consists of exactly four bytes, in the order R, G, B, A. (“A” is alpha, or opacity. We’ll always set A to 0xFF, or “fully opaque.”) These four-byte pixels are organized in the common raster order: left to right, top to bottom, like lines of text on a page.
We’ll represent the four-byte pixels using u32 in Rust. Because ImageData thinks of pixels as an array of four bytes, and we’re treating each one as a u32, we have to consider endianness. WebAssembly is little-endian, so the u32 holds the pixel components in the reversed order: 0xAABBGGRR.
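To make the byte order concrete, here is a small sketch. The helper name pack_rgba is made up for illustration and is not part of the demo; it builds a u32 whose bytes, as stored in little-endian memory, come out in the R, G, B, A order that ImageData expects.

```rust
/// Packs one pixel so that its little-endian bytes read R, G, B, A.
/// (`pack_rgba` is an illustrative helper name, not part of the demo.)
const fn pack_rgba(r: u8, g: u8, b: u8, a: u8) -> u32 {
    (r as u32) | ((g as u32) << 8) | ((b as u32) << 16) | ((a as u32) << 24)
}

/// Fully opaque magenta: red and blue at full strength, no green.
/// Written out as a literal, this is 0xFF_FF_00_FF (A, B, G, R from the left).
const MAGENTA: u32 = pack_rgba(0xFF, 0x00, 0xFF, 0xFF);
```

Calling to_le_bytes() on such a value shows the bytes in the order they sit in WebAssembly memory, which is a handy way to double-check a color constant.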
Let’s declare a decent-sized image buffer in Rust. For simplicity, we’ll put it
at a fixed location in memory using
static. This isn’t idiomatic Rust, but by
opting out of
std we have given up our ability to allocate memory — so
we do the smaller, simpler, and slightly less safe thing instead.
// in src/lib.rs
const WIDTH: usize = 600;
const HEIGHT: usize = 600;

#[no_mangle]
static mut BUFFER: [u32; WIDTH * HEIGHT] = [0; WIDTH * HEIGHT];
We use #[no_mangle] again, because we’re going to be reaching into the buffer from JavaScript. (#[no_mangle] on a static also has the side effect of exporting it from the module. I still find this counter-intuitive, but that’s how it works.)
We declare the
BUFFER to be initially filled with zeros, because doing so is
cheap. But we don’t want it to be zeros forever. In particular, if we were to
draw the buffer full of zeros, nothing would happen — because all the
alpha bytes are zero, the image is entirely transparent.
So let’s write a routine to fill it with something. Go ahead and delete
the_answer and replace it with:
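// back in src/lib.rs
#[no_mangle]
pub unsafe extern "C" fn go() {
    // JavaScript is the only caller of go, so the reference we create
    // here is the only one to BUFFER.
    render_frame_safe(&mut *core::ptr::addr_of_mut!(BUFFER))
}

// We split this out so that we can escape 'unsafe' as quickly
// as possible.
fn render_frame_safe(buffer: &mut [u32; WIDTH * HEIGHT]) {
    for pixel in buffer.iter_mut() {
        *pixel = 0xFF_FF_00_FF; // opaque magenta
    }
}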
The split between these two functions may be surprising. Remember that Rust doesn’t allow accesses to mutable static variables in safe code (see the aside below). Since JavaScript is the only caller of go, we know that no other Rust code has a reference to BUFFER. So we can safely run the unsafe code that creates one.
You might be wondering why accesses to static mut variables are unsafe. It’s because &mut references are by definition unique, but since any code in a module can see a static, any code in the module can just say &mut BUFFER at any time, making an unlimited number of supposedly unique references! Use of static mut is incredibly rare in Rust code; I do it in my embedded work to avoid needing a memory allocator, which is the same reason we do it here.
But at that point, we want to get out of unsafe and back to the guarantees we love so much. So we pass the buffer by reference to render_frame_safe. The reference we pass is a unique reference to BUFFER, so manipulating the buffer through it is safe.
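Here’s what the updated index.html has to do: add a 600×600 canvas to the page, call the exported go function, read the address of the image from the exported BUFFER global, wrap that region of the module’s exported memory in a Uint8ClampedArray and an ImageData, and paste it onto the canvas with putImageData. Then run build.sh and load the files through your webserver. If everything is working, your page should now contain a bright magenta square!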
Let’s replace it with a simple procedural texture.
// Replace the old render_frame_safe with this:
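One formula that gives this kind of texture is sketched below. The XOR of the two coordinates is an assumption on my part about what makes a good cheap pattern; any function of x and y will do. The low byte of each u32 is the red channel, and the top byte is forced to 0xFF so the pixel is opaque.

```rust
const WIDTH: usize = 600;
const HEIGHT: usize = 600;

/// Fills the buffer with a pattern computed from each pixel's coordinates.
/// The XOR formula is an assumed example; swap in any function of x and y.
fn render_frame_safe(buffer: &mut [u32; WIDTH * HEIGHT]) {
    for y in 0..HEIGHT {
        for x in 0..WIDTH {
            // Low byte is red (little-endian); OR in an opaque alpha byte.
            buffer[y * WIDTH + x] = (x ^ y) as u32 | 0xFF_00_00_00;
        }
    }
}
```

Because x ^ y can exceed 255 on a 600-pixel canvas, a little of the pattern spills into the green channel, which adds some variety for free.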
Build and reload, and the magenta square should be replaced by a red, tartan-like pattern. It should look like this:
That’s not a screenshot, incidentally, that’s the actual program running. A screenshot PNG would have taken 14kiB; this took 189 bytes.
(If you can’t see the texture above, you won’t be able to see the ones you write, either. It’s time for a browser upgrade.)
You now have a procedural texture, written in WebAssembly, and displaying in a browser! Try messing around with the generation routine to produce other patterns. The framework we’ve built here is enough to have quite a bit of geeky art fun, but let’s keep going.
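Adding animation
To animate the texture, we’ll split the work in two:
- The Rust program will keep track of its state frame-to-frame. Initially, this will mean keeping track of a frame number, but it might also derive the next frame from the previous contents of BUFFER: up to you.
- The JavaScript side will call the go function once per frame to update the contents of BUFFER, and then display those contents. We’ll use the browser’s requestAnimationFrame function to schedule each frame.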
To introduce simple animation to our existing texture, we’ll incorporate the frame number into our pixel formula in addition to x and y.
First: let’s add some global state to the WebAssembly module to keep track of
the frame number. The last global state we added was
BUFFER, which required
unsafe code to access (because it’s a
static mut and Rust is suspicious of
our ability to write thread-safe code). If we just want to store a single
number, we can use a much easier tool: atomics.
// in src/lib.rs
use core::sync::atomic::{AtomicU32, Ordering};

static FRAME: AtomicU32 = AtomicU32::new(0);
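In go, we want to update the BUFFER and then advance the frame number. AtomicU32 provides a handy fetch_add operation that can retrieve the current frame number, and advance the global counter, in one easy step. render_frame_safe now takes that frame number as a second argument and mixes it into each pixel.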
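Put together, the two functions might look like the sketch below. The pixel formula (the XOR texture offset by the frame number) is an assumption, and the #[no_mangle] attributes that go and BUFFER carry in the real crate are left out so the sketch also builds as an ordinary program.

```rust
use core::sync::atomic::{AtomicU32, Ordering};

const WIDTH: usize = 600;
const HEIGHT: usize = 600;

// In the real crate, BUFFER and go keep their #[no_mangle] attributes.
static mut BUFFER: [u32; WIDTH * HEIGHT] = [0; WIDTH * HEIGHT];
static FRAME: AtomicU32 = AtomicU32::new(0);

pub unsafe extern "C" fn go() {
    // fetch_add returns the old value and bumps the counter in one step.
    let frame = FRAME.fetch_add(1, Ordering::Relaxed);
    // JavaScript is the only caller, so this is the only reference to BUFFER.
    render_frame_safe(&mut *core::ptr::addr_of_mut!(BUFFER), frame);
}

fn render_frame_safe(buffer: &mut [u32; WIDTH * HEIGHT], frame: u32) {
    for y in 0..HEIGHT {
        for x in 0..WIDTH {
            // Assumed formula: the XOR texture, offset by the frame number.
            // wrapping_add cannot panic on overflow, so no panic code is pulled in.
            buffer[y * WIDTH + x] =
                frame.wrapping_add((x ^ y) as u32) | 0xFF_00_00_00;
        }
    }
}
```

Relaxed ordering is enough here because nothing else synchronizes on the counter; it only has to hand out increasing numbers.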
If you build and reload, you should see the same static tartan. We haven’t changed the JavaScript yet, so go still only gets called once.
We only need to change about four lines at the end of the script in index.html.
<!-- the only changes to this script are at the end... -->
<!-- rest of page omitted in example -->
We’ve created a closure called
render that will…
- Call the Rust go function.
- Splat its buffer into the canvas.
- Schedule itself to be called at the next frame.
We then have to call it once to prime the pump, as it were, and the process will run forever.
You should see this:
It’s a self-mutating tartan! (Mutartan?)
And how much have we paid to introduce animation to our program? Let’s check sizes:
$ ls -l www/*.wasm
-rw-r--r-- 1 cbiffle cbiffle 213 Jun 8 09:29 bare_metal_wasm.wasm
We’re up to 213 bytes.
This is cool, because displaying even a few seconds of the animation as a GIF would have taken tens or hundreds of kilobytes. Every frame of our 213-byte animation is unique (though very subtly so) and there are 2^32 of them. Over the course of two years, it will gradually become green, and eventually, blue. That’d be a big GIF, but it’s a tiny program.
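The “two years” figure is just arithmetic on the size of a 32-bit frame counter. Here is the calculation as a sketch, assuming the browser runs the render callback about 60 times per second (that rate is an assumption about the display, and the function name is made up):

```rust
/// Number of distinct frames before a u32 frame counter wraps around.
const FRAMES: u64 = 1 << 32;

/// How long the animation runs before repeating, in years, at a given
/// frame rate. 60 frames per second is an assumption about the display.
fn years_until_repeat(frames_per_second: f64) -> f64 {
    let seconds = FRAMES as f64 / frames_per_second;
    seconds / (60.0 * 60.0 * 24.0 * 365.25)
}
```

At 60 frames per second this comes out a little over two and a quarter years. The slow color drift happens because the counter’s higher bits eventually reach the bytes that hold green and then blue.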
At this point, I encourage you to play around with the pixel generation function and create some of your own patterns! The rest of this article is devoted to troubleshooting issues you may encounter.
Help! My binary just got much bigger! (Diagnosing and fixing sudden bloat.)
The tl;dr here is: you probably introduced panicking code, probably a bounds check. This adds anywhere from 300 bytes to 2kiB in my experience, depending on how hard the code works to produce a nice error message.
You can find out what code you just introduced by inspecting your binary. You need to look at the binary before we strip it, so you can’t use the output of build.sh. I strongly suggest you install rustfilt if you haven’t already:
$ cargo install rustfilt
Now, generate an unstripped binary and dump it:
$ cargo clean
$ cargo build --target wasm32-unknown-unknown --release
$ wasm-objdump -d \
    target/wasm32-unknown-unknown/release/bare_metal_wasm.wasm \
    | rustfilt | less
You don’t need to read actual WebAssembly instructions for this to be useful: just look at the function names. If you see a call to a function named core::panicking::panic_bounds_check, you have introduced a runtime bounds check that may panic.
More generally, you have introduced a potential panic if you see a call to anything in core::panicking, such as core::panicking::panic (the function index that wasm-objdump prints before the name will vary).
Avoiding the problem
Try to avoid introducing panicking code. Here are some tips:
Prefer references to explicitly sized arrays, rather than slices.
&mut [u32; WIDTH * HEIGHT] // can avoid bounds checks
&mut [u32] // often can't
Use iterators rather than indexing.
Handle corner cases yourself instead of relying on panic. For every operation that might panic in core, there’s an alternative that can’t panic. For example, if you really need to use slices, you can access them with get, which returns an Option instead of panicking. Don’t try to replace the check on that Option with unwrap(): that just replaces one panic with another.
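Here is a sketch of the iterator and get tips in practice. The function names fill and pixel_at are made up for illustration. The first never indexes, so there is no bounds check to fail; the second handles the out-of-range case itself by spinning forever, which is what our panic handler would have done anyway.

```rust
const WIDTH: usize = 600;
const HEIGHT: usize = 600;

/// Iterating never indexes, so there is no bounds check that could panic.
fn fill(buffer: &mut [u32; WIDTH * HEIGHT], color: u32) {
    for pixel in buffer.iter_mut() {
        *pixel = color;
    }
}

/// Slice access through `get`, which returns an Option instead of panicking.
/// The out-of-range case is "handled" by hanging, like our panic handler.
fn pixel_at(buffer: &[u32], index: usize) -> u32 {
    if let Some(pixel) = buffer.get(index) {
        *pixel
    } else {
        loop {}
    }
}
```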
Hacking around it with wasm-snip
In the last function above, we “handled” a panic case ourselves by entering an infinite loop. Wouldn’t it be great if we could change some setting and do this to every panic — current and future?
Rust kind of provides this with the abort on panic setting, but it still brings in a bunch of panic-related code (about 2kiB of it).
There’s another solution: wasm-snip. It is an imperfect solution, but it can help.
$ cargo install wasm-snip
Here is a binary that I produced with panicking code included:
$ ls -lh panics.wasm
-rw-r--r-- 1 cbiffle cbiffle 2.0K Jun 8 11:22 panics.wasm
And here it is after snipping:
$ ls -lh snipped.wasm
-rw-r--r-- 1 cbiffle cbiffle 663 Jun 8 11:28 snipped.wasm
wasm-snip analyzes a WebAssembly binary and replaces the bodies of certain functions (which you choose) with an unreachable instruction. This means the program will halt if you try to use a snipped function, reporting an exception to the JavaScript that called it.
You need to snip a binary before you strip it, because wasm-snip makes use of the debug symbols to decide what to snip. To incorporate snipping into your build process, alter build.sh to read as follows:
#!/bin/bash
set -euo pipefail

TARGET=wasm32-unknown-unknown
BINARY=target/$TARGET/release/bare_metal_wasm.wasm

cargo build --target $TARGET --release

# NEW PART:
wasm-snip --snip-rust-fmt-code \
    --snip-rust-panicking-code \
    -o $BINARY \
    $BINARY

wasm-strip $BINARY
mkdir -p www/
wasm-opt -o www/bare_metal_wasm.wasm -Oz $BINARY
ls -lh www/bare_metal_wasm.wasm
This is an imperfect solution, because (as I showed above) the binary has increased from 213 to 663 bytes. If we’ve removed the panic code, what’s responsible for the additional 450 bytes?
The answer is frustrating: string literals.
Panics take a message, and the built-in panics for things like “array index out
of bounds” include messages explaining the condition.
wasm-snip is currently not able to detect that those messages are unused once the panic code is removed, and so they get included in our binary. You can see them by using wasm-objdump -x:
$ wasm-objdump -x www/bare_metal_wasm.wasm
bare_metal_wasm.wasm: file format wasm 0x1
Section Details:
# Bunch of stuff omitted for the purposes of this post...
Data:
 - segment size=312 - init i32=2488580
  - 025f904: 7372 632f 6c69 622e 7273 0000 04f9 2500  src/lib.rs....%.
  - 025f914: 0a00 0000 2a00 0000 0d00 0000 0200 0000  ....*...........
  - 025f924: 0000 0000 0100 0000 0300 0000 696e 6465  ............inde
  - 025f934: 7820 6f75 7420 6f66 2062 6f75 6e64 733a  x out of bounds:
  - 025f944: 2074 6865 206c 656e 2069 7320 2062 7574   the len is  but
  - 025f954: 2074 6865 2069 6e64 6578 2069 7320 0000   the index is ..
  - 025f964: 30f9 2500 2000 0000 50f9 2500 1200 0000  0.%. ...P.%.....
  - 025f974: 3030 3031 3032 3033 3034 3035 3036 3037  0001020304050607
  - 025f984: 3038 3039 3130 3131 3132 3133 3134 3135  0809101112131415
  - 025f994: 3136 3137 3138 3139 3230 3231 3232 3233  1617181920212223
  - 025f9a4: 3234 3235 3236 3237 3238 3239 3330 3331  2425262728293031
  - 025f9b4: 3332 3333 3334 3335 3336 3337 3338 3339  3233343536373839
  - 025f9c4: 3430 3431 3432 3433 3434 3435 3436 3437  4041424344454647
  - 025f9d4: 3438 3439 3530 3531 3532 3533 3534 3535  4849505152535455
  - 025f9e4: 3536 3537 3538 3539 3630 3631 3632 3633  5657585960616263
  - 025f9f4: 3634 3635 3636 3637 3638 3639 3730 3731  6465666768697071
  - 025fa04: 3732 3733 3734 3735 3736 3737 3738 3739  7273747576777879
  - 025fa14: 3830 3831 3832 3833 3834 3835 3836 3837  8081828384858687
  - 025fa24: 3838 3839 3930 3931 3932 3933 3934 3935  8889909192939495
  - 025fa34: 3936 3937 3938 3939                      96979899
We’ve got the source file name, a message explaining a bounds check failure, and a curious table of numbers that probably has something to do with formatting the index as decimal. We don’t need any of these, but our tools don’t know that.
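That guess about the table is plausible: a list of every two-digit pair lets a formatter produce two decimal digits per division instead of one. Here is a sketch of the idea. It is an illustration written for this explanation, not the code inside core, and the names DIGIT_PAIRS and format_decimal are made up.

```rust
/// "00", "01", ... "99" laid end to end: 200 bytes, like the table in the dump.
const DIGIT_PAIRS: &[u8; 200] = b"0001020304050607080910111213141516171819\
2021222324252627282930313233343536373839\
4041424344454647484950515253545556575859\
6061626364656667686970717273747576777879\
8081828384858687888990919293949596979899";

/// Writes `n` in decimal into the tail of `out` and returns the digits used.
/// Two digits are produced per division by looking up a pair in the table.
fn format_decimal(mut n: u32, out: &mut [u8; 10]) -> &[u8] {
    let mut pos = out.len();
    while n >= 100 {
        let pair = (n % 100) as usize * 2;
        n /= 100;
        pos -= 2;
        out[pos] = DIGIT_PAIRS[pair];
        out[pos + 1] = DIGIT_PAIRS[pair + 1];
    }
    if n >= 10 {
        // Two digits left: one more table lookup.
        let pair = n as usize * 2;
        pos -= 2;
        out[pos] = DIGIT_PAIRS[pair];
        out[pos + 1] = DIGIT_PAIRS[pair + 1];
    } else {
        // A single leading digit.
        pos -= 1;
        out[pos] = b'0' + n as u8;
    }
    &out[pos..]
}
```

Ten bytes of scratch space is enough because the largest u32 has ten decimal digits.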
You can fix this by hacking the binary, but that’s out of scope for this post.
A lot of graphics demos wind up needing trigonometry: sin and cos, mostly. If you’re playing with procedural image generation, you’re probably going to hit this when you try to implement Perlin noise or plasma.
Suppose we write a replacement render_frame_safe that generates a tiny moiré pattern by calling sin on an f32 for every pixel. Unfortunately, this code fails to compile:
error[E0599]: no function or associated item named `sin` found for type `f32` in the current scope
What? Of course f32 has a sin function. It’s right there in the documentation!
Unfortunately, trig routines are part of std, not core. I personally think this decision is silly (more on that in a moment), but in our case, it turns out to be useful.
I think it’s silly because, on the embedded processors where I do most of my hacking, the trig routines from std often reduce to single instructions, because they use compiler features that aren’t exposed to mere users like me on stable Rust. Anything I write will be less efficient.
The easiest fix is to just use std:
- Remove the #![no_std] attribute.
- Remove our custom panic handler.
However, if you try this, you’ll notice something alarming: calling sin adds 5kiB to your binary!
Why? Well, WebAssembly currently doesn’t have a sin operation (though one has been proposed, and may be added in the future). This means the Rust standard library has to include its own implementation of sin, and a high-quality sin takes a fair amount of code.
So, by excluding sin, core has actually just saved us from a surprise binary inflation. But now what do we do?
Sure, you could write your own low-quality version of
sin, which is a
time-honored tradition among demo programmers. But there’s a much easier option.
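Before moving on to that easier option, here is roughly what the do-it-yourself route can look like. This is a sketch of one classic shortcut, a parabola fitted through the zero crossings and peaks of the sine curve; the name fast_sin is made up, and the result is only good to within about 0.06, which is often fine for pixels.

```rust
const PI: f32 = core::f32::consts::PI;

/// Parabolic approximation of sin(x), valid for x in [-PI, PI].
/// It is exact at 0, at +/-PI/2, and at +/-PI, and off by at most
/// roughly 0.06 in between.
fn fast_sin(x: f32) -> f32 {
    // Written out by hand so the sketch does not depend on f32::abs.
    let abs_x = if x < 0.0 { -x } else { x };
    (4.0 / PI) * x - (4.0 / (PI * PI)) * x * abs_x
}
```

Callers have to keep the argument within one period themselves; the approximation falls apart outside [-PI, PI].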
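I said that WebAssembly doesn’t provide a sin operation, but you know who does? JavaScript.
Declare a function called js_sin in an extern block in your Rust code; this requests an import with that name, and the JavaScript that instantiates the module can supply Math.sin for it. (The name is arbitrary.)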
Rust considers all imported functions to be unsafe by default. This is because the compiler can’t check what foreign code might do, such as scribbling over memory. But sin doesn’t do anything like that. So, we provide a safe wrapper.
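The import and its wrapper might look like the sketch below. The cfg attributes and the native stand-in are my additions so the sketch can also be built and tried outside the browser; in the demo itself only the wasm32 half matters. I’m assuming the import takes and returns f32.

```rust
// On the wasm32 target, ask the host for a function named `js_sin`.
// With no #[link] attribute the import is looked up in a module called
// "env", so the JavaScript side would pass something like
// `{ env: { js_sin: Math.sin } }` when instantiating the module.
#[cfg(target_arch = "wasm32")]
extern "C" {
    fn js_sin(x: f32) -> f32;
}

/// Safe wrapper: the imported function only computes a number.
#[cfg(target_arch = "wasm32")]
fn sin(x: f32) -> f32 {
    unsafe { js_sin(x) }
}

/// Stand-in for non-wasm builds, so the sketch can be tried natively.
/// (Not part of the demo.)
#[cfg(not(target_arch = "wasm32"))]
fn sin(x: f32) -> f32 {
    x.sin()
}
```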
Now replace all your calls to x.sin() with sin(x), and you can use trigonometry, for a cost of about a dozen bytes!
This trick works best for simple numeric functions; strings, objects, etc. are better left to wasm-bindgen.
A word of caution: calling sin for every pixel like I showed above takes a lot of processing power. This is why I haven’t embedded the WebAssembly program here as an example: it would drain your battery while you’re reading. Just like on computers of yore, call sin ahead of time and generate a lookup table.
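A lookup table along those lines could be as simple as the sketch below. The table size of 256 and the function names are assumptions; the sine function is passed in as a parameter so the same code works with whichever sin you have available.

```rust
const TABLE_SIZE: usize = 256;
const TAU: f32 = 2.0 * core::f32::consts::PI;

/// Fills the table with one full period of sine, calling `sin` once per entry.
/// `sin` is whatever sine function is available, for example a wrapper
/// around a JavaScript import.
fn fill_sine_table(table: &mut [f32; TABLE_SIZE], sin: fn(f32) -> f32) {
    for (i, entry) in table.iter_mut().enumerate() {
        *entry = sin(i as f32 * TAU / TABLE_SIZE as f32);
    }
}

/// Looks up sine for a phase measured in table steps (256 steps per turn).
/// Masking keeps the index inside the table, so the access cannot go out
/// of bounds.
fn table_sin(table: &[f32; TABLE_SIZE], phase: usize) -> f32 {
    table[phase & (TABLE_SIZE - 1)]
}
```

Filling the table costs 256 calls once at startup; after that, each pixel pays for a mask and a load instead of a call out to JavaScript.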