Bit-Banging Microcontroller Graphics

You can now view these demos in your browser!

m4vga is a technique/library for hacking the STM32F407 to generate high-quality analog color video signals with just a handful of resistors.

I wrote the C++ version between 2012 and 2015, and rewrote it in Rust in 2019 to put my money where my mouth is.

I did this because it was an immense technical challenge. Read on for details, including links to a series of blog posts I wrote examining the code in detail.

What? Why?

I always thought the graphics demo effects I saw in the 80s and 90s were impressive, but I was born a bit too late for that era. By the time I started doing graphics programming, the field had become less about exploiting hardware quirks and counting CPU cycles, and more about checking GPU driver versions.

So, to study the wizardry of that golden age, I decided to write some demos that run on a very small computer without an operating system: a microcontroller.

Why is this interesting/hard?

Because I have to do everything myself with not enough resources.

The microcontroller I chose has no video support. How do we write graphics demos if the computer doesn’t support video? We hack it into doing something its designers never intended. It’s like playing music on a floppy drive, except that the signals are millions of times faster.

So, I decided to bit-bang the video through a parallel port.

Video is incredibly touchy, because the human visual system is really, really picky. We can sense tiny distortions, blurriness, and noise in an image. In fact, the video output is visibly corrupted if critical parts of my code get delayed by ten nanoseconds, or just over one CPU cycle. As a test run of techniques that I later applied at the day job I had at the time, I built a hard-real-time scheduling environment that maintains timing, so that the demo code itself rarely has to worry about it.

I chose to generate fairly high-resolution, high-bandwidth video for the CPU speed I was using: 800x600 at 60 frames per second. This means that I have four CPU cycles per pixel.
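
To make the cycle budget concrete, here is a rough sketch of the arithmetic in Rust. The timing constants are assumptions: they are the standard VESA figures for 800x600 at 60 Hz (a 40 MHz pixel clock, with 1056 pixels per line and 628 lines per frame once blanking is counted). The 160 MHz CPU clock is inferred from the four-cycles-per-pixel figure, not quoted from the project. All names here are hypothetical.

```rust
// Standard VESA timing for 800x600 at 60 Hz. These figures are assumptions
// brought in for illustration; the text gives only the resolution and the
// frame rate.
const H_TOTAL: u64 = 1056; // pixels per line, including horizontal blanking
const V_TOTAL: u64 = 628; // lines per frame, including vertical blanking
const PIXEL_CLOCK_HZ: u64 = 40_000_000;

// Assumed CPU clock, inferred from "four CPU cycles per pixel".
const CPU_CLOCK_HZ: u64 = 160_000_000;

/// Whole CPU cycles available for each pixel.
fn cycles_per_pixel(cpu_hz: u64, pixel_hz: u64) -> u64 {
    cpu_hz / pixel_hz
}

/// Frame rate in millihertz implied by the pixel clock and total frame size.
fn refresh_millihertz(pixel_hz: u64, h_total: u64, v_total: u64) -> u64 {
    pixel_hz * 1000 / (h_total * v_total)
}

/// CPU cycles in one full scanline, blanking included.
fn cycles_per_line(cpu_hz: u64, pixel_hz: u64, h_total: u64) -> u64 {
    cycles_per_pixel(cpu_hz, pixel_hz) * h_total
}
```

Under these assumptions, `cycles_per_pixel(CPU_CLOCK_HZ, PIXEL_CLOCK_HZ)` comes out to 4, and a whole scanline, blanking included, gets 4224 cycles. Everything that happens during a line has to fit inside that budget.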

Speaking of resource constraints: at 8-bit color, a single 800x600 frame would take 469 KiB of RAM to store. I have only 128 KiB, and I have to use some of that for things like the stack. So single-buffering images is right out, to say nothing of double-buffering. To maintain 60fps, I have no choice but to generate video on the fly.
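
The memory arithmetic can be checked with a few lines of Rust. This is only a sketch with hypothetical names. The scanline figure is an illustration of why working a line at a time is affordable; it is not a claim about how the library lays out its buffers.

```rust
const WIDTH: usize = 800;
const HEIGHT: usize = 600;
const BYTES_PER_PIXEL: usize = 1; // 8-bit color
const RAM_BYTES: usize = 128 * 1024; // the 128 KiB figure quoted in the text

/// Bytes needed to store `frames` complete frames.
const fn framebuffer_bytes(frames: usize) -> usize {
    frames * WIDTH * HEIGHT * BYTES_PER_PIXEL
}

/// Bytes needed to store a single visible scanline.
const fn scanline_bytes() -> usize {
    WIDTH * BYTES_PER_PIXEL
}

/// True if `bytes` would fit in RAM, optimistically ignoring the stack
/// and every other use of memory.
const fn fits_in_ram(bytes: usize) -> bool {
    bytes <= RAM_BYTES
}
```

One frame is 480,000 bytes, a little under 469 KiB and more than three and a half times the available RAM, so `fits_in_ram(framebuffer_bytes(1))` is false. A single scanline is only 800 bytes.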

Which is more fun, anyway.

Under the hood

In 2015, I wrote a series of blog posts explaining the techniques I used in detail. If you’re curious how this all works, here you go:

  1. Introducing Glitch
  2. Pushing Pixels
  3. A Glitch in the Matrix
  4. Racing the Beam