In case you don't want to wire together your own computer based on the STM32F407, you can also view several of my m4vga demos right here in your browser! Scroll down for more info.
This is not a video — the actual m4vga code is running on your device, pushing pixels to your browser. (How? The demos are written in Rust, and are written in such a way that they can compile for either Cortex-M4 or WebAssembly.)
Here's information about what you're seeing, plus optional spoilers if you don't want to figure out the hacks yourself.
The old-school "tunnel-zoomer" effect, showing what appears to be a texture-mapped cylinder, drawn on a computer that doesn't have enough CPU power for anything of the sort. (It's a trick.)
In addition to the traditional tunnel-zoomer effect (in short: do all the trig up front and use a lookup table), this implementation uses some tricks.
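To make "do all the trig up front" concrete, here is a minimal sketch of the traditional lookup-table approach. It is not the m4vga code: the names `Entry`, `build_table`, `render`, and `DEPTH_SCALE`, and the XOR pattern standing in for a texture, are assumptions made for illustration.

```rust
/// Hypothetical scale factor turning inverse distance into a depth coordinate.
pub const DEPTH_SCALE: f32 = 2048.0;

/// One precomputed entry per pixel: texture coordinates derived from the
/// pixel's angle and distance relative to the screen center.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Entry {
    pub u: u8, // angle around the tunnel axis, scaled to 0..=255
    pub v: u8, // depth into the tunnel (inverse distance), wrapped to 0..=255
}

/// Build the table once. All the trig (atan2, sqrt) happens here.
pub fn build_table(width: usize, height: usize) -> Vec<Entry> {
    let (cx, cy) = (width as f32 / 2.0, height as f32 / 2.0);
    let mut table = Vec::with_capacity(width * height);
    for y in 0..height {
        for x in 0..width {
            // Sample at pixel centers so the distance is never zero.
            let dx = x as f32 + 0.5 - cx;
            let dy = y as f32 + 0.5 - cy;
            let angle = dy.atan2(dx); // -PI..=PI
            let dist = (dx * dx + dy * dy).sqrt();
            let u = ((angle / core::f32::consts::PI + 1.0) * 128.0) as u32;
            let v = (DEPTH_SCALE / dist) as u32;
            table.push(Entry { u: (u & 0xFF) as u8, v: (v & 0xFF) as u8 });
        }
    }
    table
}

/// Per frame: no trig, just a table lookup. Adding `frame` to the depth
/// coordinate animates the zoom; the XOR stands in for a texture fetch.
pub fn render(table: &[Entry], frame: u8, out: &mut [u8]) {
    for (pixel, e) in out.iter_mut().zip(table) {
        let v = e.v.wrapping_add(frame);
        *pixel = e.u ^ v;
    }
}
```

Calling `build_table` once and then `render` every frame with an increasing `frame` value gives the zooming motion, and the per-pixel cost is one lookup, one add, and one XOR.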
First, tunnel halves the horizontal scan rate and enables line-doubling, so it draws "fat pixels" for an effective resolution of 400x300. This reduces the number of pixels we need to compute. But there are still enough pixels that rendering takes most of a frame, so we need to double-buffer.
But! 400x300 at 8-bit color would take almost all of our RAM for a single frame, much less two!
Second, because our video is produced by software, we have a trick that was unavailable to (most?) old-school video adapters. Note that the bottom half of the screen is identical to the top half, but rotated 180 degrees. We only actually render the top half; on the bottom half, we have the video machinery scan the data out backwards. This has the effect of reversing it both horizontally and vertically.
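The backwards scan-out can be sketched roughly like this. This is an illustration under assumptions, not the m4vga rasterizer: `scan_line` is a hypothetical helper, and it assumes an even `height` and a buffer holding only the top `height / 2` rows. At 400x300 that half-frame is 60,000 bytes, so two of them (for double-buffering) fit in the 120,000 bytes a single full frame would have needed.

```rust
/// Fill `out` with scanline `line` of a `width` x `height` frame, given a
/// buffer that stores only the top `height / 2` rows.
pub fn scan_line(top_half: &[u8], width: usize, height: usize, line: usize, out: &mut [u8]) {
    let half = height / 2;
    if line < half {
        // Top half: read the row forwards, as usual.
        let start = line * width;
        out[..width].copy_from_slice(&top_half[start..start + width]);
    } else {
        // Bottom half: pick the vertically mirrored row and read it
        // right to left. Together that is a 180 degree rotation.
        let src_row = height - 1 - line;
        let start = src_row * width;
        let src = top_half[start..start + width].iter().rev();
        for (dst, px) in out[..width].iter_mut().zip(src) {
            *dst = *px;
        }
    }
}
```

Seen from the buffer's side, the bottom half of the screen is just the same bytes read from the last one back to the first.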
Conway's Game of Life automaton at 800x600 using some very aggressively optimized Rust code. (In case you haven't implemented Game of Life, this version is a lot faster than most.)
Conway's Game of Life requires that we keep the entire "playing field" state in memory, and moreover requires that we keep two copies of it, since we compute a new state from the old state.
By packing the state so that we use one bit per cell, we can fit two copies of the field in 120,000 bytes — leaving us 11,072 bytes for things like the stack and interrupt handlers.
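The arithmetic behind those numbers, as a small sketch. The `Field` type and its bit order (bit 0 of each word is the leftmost cell) are assumptions for illustration, and so is the 128 KiB figure, which is inferred from 120,000 + 11,072 = 131,072 rather than stated above.

```rust
pub const WIDTH: usize = 800;
pub const HEIGHT: usize = 600;
pub const WORDS_PER_ROW: usize = WIDTH / 32; // 25 words per row
pub const FIELD_WORDS: usize = WORDS_PER_ROW * HEIGHT; // 15,000 words
pub const FIELD_BYTES: usize = FIELD_WORDS * 4; // 60,000 bytes per copy

/// One bit per cell, packed 32 cells to a word.
/// (A Vec keeps the sketch simple; on the device this would be static RAM.)
pub struct Field {
    words: Vec<u32>,
}

impl Field {
    pub fn new() -> Self {
        Field { words: vec![0; FIELD_WORDS] }
    }

    pub fn get(&self, x: usize, y: usize) -> bool {
        (self.words[y * WORDS_PER_ROW + x / 32] >> (x % 32)) & 1 != 0
    }

    pub fn set(&mut self, x: usize, y: usize, alive: bool) {
        let word = &mut self.words[y * WORDS_PER_ROW + x / 32];
        let bit = 1u32 << (x % 32);
        if alive {
            *word |= bit;
        } else {
            *word &= !bit;
        }
    }
}
```

Two copies come to 2 x 60,000 = 120,000 bytes, and 131,072 - 120,000 = 11,072.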
Updating at 60 frames per second means we have just over 5.5 CPU cycles per cell update — and that's assuming that generating video uses no CPU, which is of course not true. I used a 32-way bit-parallel implementation of the transition function that you can read here.
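For readers who haven't seen the bit-parallel trick, here is a minimal sketch of the general technique. It is not the optimized m4vga transition function: the function names, the dead border, and the bit order (bit 0 is the leftmost cell of a word) are assumptions. Each `u32` carries 32 cells, and the eight neighbors are counted with bitwise adders so all 32 lanes update at once.

```rust
/// Per-lane half adder: returns (sum, carry).
fn half_add(a: u32, b: u32) -> (u32, u32) {
    (a ^ b, a & b)
}

/// Per-lane full adder: returns (sum, carry).
fn full_add(a: u32, b: u32, c: u32) -> (u32, u32) {
    let (s1, c1) = half_add(a, b);
    let (s2, c2) = half_add(s1, c);
    (s2, c1 | c2)
}

/// Shift a row so each lane holds its left / right neighbor, pulling the
/// edge bit in from the adjacent word. `row` is [previous, this, next].
fn left(row: [u32; 3]) -> u32 {
    (row[1] << 1) | (row[0] >> 31)
}
fn right(row: [u32; 3]) -> u32 {
    (row[1] >> 1) | (row[2] << 31)
}

/// Compute the next state of 32 cells at once.
pub fn step_word(above: [u32; 3], cur: [u32; 3], below: [u32; 3]) -> u32 {
    // Count neighbors row by row: each result is (weight 1, weight 2).
    let (sa, ca) = full_add(left(above), above[1], right(above));
    let (sb, cb) = full_add(left(below), below[1], right(below));
    let (sc, cc) = half_add(left(cur), right(cur));
    // Combine into a 4-bit count per lane: bit0 + 2*bit1 + 4*bit2 + 8*bit3.
    let (bit0, carry1) = full_add(sa, sb, sc);
    let (t0, t1) = full_add(ca, cb, cc);
    let (bit1, t2) = half_add(t0, carry1);
    let (bit2, bit3) = half_add(t1, t2);
    // Alive next if count == 3, or count == 2 and already alive.
    bit1 & !bit2 & !bit3 & (bit0 | cur[1])
}

/// Step a whole field; cells outside the field count as dead.
pub fn step(old: &[u32], new: &mut [u32], words_per_row: usize) {
    let rows = (old.len() / words_per_row) as isize;
    let cols = words_per_row as isize;
    let word = |r: isize, c: isize| -> u32 {
        if r < 0 || c < 0 || r >= rows || c >= cols {
            0
        } else {
            old[r as usize * words_per_row + c as usize]
        }
    };
    for r in 0..rows {
        for c in 0..cols {
            let row3 = |rr: isize| [word(rr, c - 1), word(rr, c), word(rr, c + 1)];
            new[r as usize * words_per_row + c as usize] =
                step_word(row3(r - 1), row3(r), row3(r + 1));
        }
    }
}
```

A real implementation would avoid re-reading and re-adding the same rows for every word; this version favors clarity over the cycle budget described above.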
We feed the bit-packed playfield to the rasterizer directly. m4vga has a ridiculously efficient 1bpp rasterizer included (and an emulation of that for WebAssembly).
Another old-school effect, showing a textured plane with smooth rotation and scaling.
Like tunnel above, this uses horizontal and vertical pixel doubling, so the effective resolution is dropped to 400x300.
The rotozoomer itself is a traditional implementation of rotated/scaled texture scan conversion. (I haven't found a good tutorial online, or I'd link to one.) It exploits the fact that the Cortex-M4 has a fairly fast single-precision floating point unit, so we can do all the math in floating point and avoid having to implement fixed-point.
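In lieu of a tutorial, here is a minimal sketch of floating-point texture scan conversion. It illustrates the general technique rather than the demo's code: the function name, the parameters, and the square power-of-two texture are all assumptions.

```rust
/// Fill `out` (width * height bytes) by sampling a square texture of side
/// `tex_size` (assumed to be a power of two), rotated by `angle` radians
/// and scaled by `scale` texels per screen pixel.
pub fn rotozoom(
    texture: &[u8],
    tex_size: usize,
    angle: f32,
    scale: f32,
    width: usize,
    height: usize,
    out: &mut [u8],
) {
    let (s, c) = angle.sin_cos();
    // How far we move in texture space per screen pixel, going right...
    let (dudx, dvdx) = (c * scale, s * scale);
    // ...and going down one scanline.
    let (dudy, dvdy) = (-s * scale, c * scale);
    // Start so that the screen center lands on the texture origin.
    let (cx, cy) = (width as f32 / 2.0, height as f32 / 2.0);
    let mut row_u = -cx * dudx - cy * dudy;
    let mut row_v = -cx * dvdx - cy * dvdy;
    let mask = tex_size as i32 - 1;
    for y in 0..height {
        let (mut u, mut v) = (row_u, row_v);
        for x in 0..width {
            // Masking wraps the texture, including for negative coordinates.
            let tu = (u.floor() as i32 & mask) as usize;
            let tv = (v.floor() as i32 & mask) as usize;
            out[y * width + x] = texture[tv * tex_size + tu];
            u += dudx;
            v += dvdx;
        }
        row_u += dudy;
        row_v += dvdy;
    }
}
```

The inner loop is two float adds, two float-to-int conversions, and a load per pixel, which is the kind of work a hardware FPU makes cheap.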
The interesting part is how the pixels get on the screen.
The texture scan conversion is not quite fast enough to be computed on the fly. So, we need a framebuffer, so that we can start generating pixels during the vertical blanking interval. The screen isn't radially symmetric like in tunnel, so we actually need an entire 400x300 framebuffer.
A 400x300 8-bit-color framebuffer would take 120,000 bytes of RAM, which is most of our RAM. We cannot afford a second framebuffer to double-buffer.
So, instead, the renderer starts cranking out pixels during the vertical blanking interval, to "get ahead of" the "beam" (rasterizer). The rasterizer starts scanning out pixels at the end of vblank, and will gradually catch up with the renderer.
Normally, you have to be very careful doing things this way, because if the rasterizer catches up with the renderer, you'll get visual distortion (tearing). In the Rust implementation of this demo, that condition is checked and will cause a crash. (Death before distortion!) Despite the two processes racing each other, we maintain Rust's safety against data races.
Yeah, that's an engineering hack, not a performance hack, but it's still a nice hack.
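The "death before distortion" check could look something like the sketch below. How m4vga actually implements it isn't described above, so everything here is an assumption: the `Progress` type, its methods, and the choice of an atomic line counter. It also only shows the check itself, not how the framebuffer is shared safely.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Tracks how many lines of the current frame the renderer has finished.
pub struct Progress {
    rendered: AtomicUsize,
}

impl Progress {
    pub const fn new() -> Self {
        Progress { rendered: AtomicUsize::new(0) }
    }

    /// Renderer: call at the start of vblank, before drawing line 0.
    pub fn start_frame(&self) {
        self.rendered.store(0, Ordering::Release);
    }

    /// Renderer: call after each completed line.
    pub fn line_rendered(&self) {
        self.rendered.fetch_add(1, Ordering::Release);
    }

    /// True if `line` is fully rendered and safe to scan out.
    pub fn is_safe_to_scan(&self, line: usize) -> bool {
        line < self.rendered.load(Ordering::Acquire)
    }

    /// Rasterizer: call before scanning out `line`. Panics rather than
    /// displaying a line the renderer has not finished.
    pub fn check_scanout(&self, line: usize) {
        assert!(
            self.is_safe_to_scan(line),
            "rasterizer caught up with the renderer at line {}",
            line
        );
    }
}
```

The renderer only ever advances the counter and the rasterizer only ever reads it, so the hand-off between the two is a single atomic value.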