High-Quality Microcontroller Graphics

3D wireframe model with text scroller: demo/rook

The STM32F407 is a microcontroller that has no video hardware, and not enough RAM to store a framebuffer at any reasonable resolution or depth.

So of course I’m using it to produce 800x600 60fps color graphics. It’s like a puzzle game, only much geekier.

m4vgalib is the core code behind my efforts, and is open-source in case anyone else thinks this sounds like a good time.

Source code, etc.

The sources, and some more documentation, live on GitHub. It’s split into two pieces:

Probably the best reference to m4vgalib’s source code is my series of articles on the internals of my Glitch demo, which is built on m4vgalib.

m4vgalib has no external dependencies beyond GCC and ETL.

I build m4vgalib and its demos using Cobble, but it should be possible to retrofit a different build system if you prefer.

High-Level Overview

m4vgalib is designed for applications that want to produce color graphics without the comfort and ease afforded by modern computers. ;-)

It is not a graphics engine per se — more of a software simulation of graphics hardware. It’s specifically designed to allow flexible user-defined video modes, similar to those on the Amiga and Atari 800:

  • Applications can fully configure the horizontal/vertical timing. (Battle-tested VESA 800x600 and 640x480 mode settings are included.)

  • Each scanline of the screen can be associated with a different rasterizer. The rasterizer decides things like color depth, format, and pixel clock, as well as whether effects like line-doubling are used. This means that almost everything about the video mode can easily be changed at each scanline.

  • A horizontal blanking interrupt is provided for applications that want to do even more stuff at each scanline, such as producing sound [1].
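To make "fully configure the horizontal/vertical timing" a bit more concrete, here is a minimal sketch of what a timing description might look like. The `Timing` struct and its field names are my own illustration and are not taken from m4vgalib's headers; the numbers are the standard VESA 800x600 60Hz values.

```cpp
#include <cstdint>

// Hypothetical timing description. Field names are illustrative only and do
// not come from m4vgalib.
struct Timing {
  std::uint32_t pixel_clock_hz;
  std::uint16_t h_visible, h_front_porch, h_sync, h_back_porch;  // in pixels
  std::uint16_t v_visible, v_front_porch, v_sync, v_back_porch;  // in lines
};

// Standard VESA 800x600 @ 60Hz timing: 40 MHz pixel clock.
constexpr Timing vesa_800x600_60 = {
    40000000,
    800, 40, 128, 88,
    600, 1, 4, 23,
};

// Total pixel periods per scanline, including blanking.
constexpr std::uint32_t pixels_per_line(const Timing &t) {
  return t.h_visible + t.h_front_porch + t.h_sync + t.h_back_porch;
}

// Total scanlines per frame, including blanking.
constexpr std::uint32_t lines_per_frame(const Timing &t) {
  return t.v_visible + t.v_front_porch + t.v_sync + t.v_back_porch;
}

// Refresh rate implied by the pixel clock and the total frame size.
constexpr double frame_rate_hz(const Timing &t) {
  return double(t.pixel_clock_hz) /
         (double(pixels_per_line(t)) * double(lines_per_frame(t)));
}

static_assert(pixels_per_line(vesa_800x600_60) == 1056, "1056 pixel periods per line");
static_assert(lines_per_frame(vesa_800x600_60) == 628, "628 lines per frame");
```

With these numbers a frame is 1056 x 628 pixel periods at 40 MHz, so the "60fps" mode actually refreshes at roughly 60.3 Hz, or about 16.58ms per frame.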

A variety of rasterizers are included with m4vgalib:

  • Simple raster graphics using either direct color (8bpp) or a palette with 1, 4, or 8bpp.

  • Attributed text, with per-character foreground and background colors, user fonts, and smooth scrolling. (One boring font included.)

  • A false-color scalar field renderer that uses linear interpolation to draw a low-resolution equation in two dimensions — which just happens to be an easy way to implement the old-school plasma effect.

  • A cheap rasterizer for filling a scanline with a solid color.
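As a rough illustration of the interpolation idea behind the scalar field renderer, the sketch below stretches one row of low-resolution samples across a scanline and maps each value through a palette. This is not m4vgalib's implementation: the function name, the float samples in the range 0 to 1, and the 256-entry palette are all assumptions, and it only covers the horizontal half of the job (blending between rows works the same way).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: linearly interpolates between adjacent samples,
// emitting pixels_per_sample pixels per pair, and looks each value up in a
// 256-entry palette. Returns the number of pixels written to out.
std::size_t shade_scanline(const float *samples, std::size_t sample_count,
                           std::size_t pixels_per_sample,
                           const std::uint8_t *palette, std::uint8_t *out) {
  std::size_t written = 0;
  for (std::size_t s = 0; s + 1 < sample_count; ++s) {
    const float a = samples[s];
    const float b = samples[s + 1];
    for (std::size_t i = 0; i < pixels_per_sample; ++i) {
      // Blend from a (at i == 0) towards b.
      float v = a + (b - a) * (float(i) / float(pixels_per_sample));
      // Clamp so the palette index stays in range.
      if (v < 0.0f) v = 0.0f;
      if (v > 1.0f) v = 1.0f;
      out[written++] = palette[static_cast<std::size_t>(v * 255.0f)];
    }
  }
  return written;
}
```

The appeal for a RAM-starved part is that only the low-resolution samples need to be stored; the full-resolution pixels exist only for the scanline currently being drawn.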

Since rasterizers are just software, you can of course write your own. Several of the demos included in the m4vgalib demos package do just that, if you’d like to start from a worked example.

Because of the RAM limitations on the STM32F407, most high resolution effects wind up using some sort of clever rasterizer. The 3D filled polygons in my Glitch demo are displayed at 800x600 using a custom rasterizer that does polygon scan conversion on the fly.
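For a sense of scale, here is the arithmetic as a small compile-time sketch. The 192 KiB figure is the STM32F407's total SRAM according to ST's datasheet (it is split across several banks, so it isn't even contiguous); the helper name is just for illustration.

```cpp
#include <cstddef>

// Bytes needed for a packed framebuffer of the given size and depth.
constexpr std::size_t framebuffer_bytes(std::size_t width, std::size_t height,
                                        std::size_t bits_per_pixel) {
  return (width * height * bits_per_pixel + 7) / 8;
}

// Total on-chip SRAM of the STM32F407 (all banks added together).
constexpr std::size_t kTotalSram = 192 * 1024;

// 800x600 at 8bpp needs 480,000 bytes: more than twice the RAM on the chip.
static_assert(framebuffer_bytes(800, 600, 8) == 480000, "8bpp size");
static_assert(framebuffer_bytes(800, 600, 8) > kTotalSram, "8bpp does not fit");

// Even 4bpp is too big.
static_assert(framebuffer_bytes(800, 600, 4) > kTotalSram, "4bpp does not fit");

// 1bpp is 60,000 bytes, so two buffers fit with room to spare.
static_assert(2 * framebuffer_bytes(800, 600, 1) < kTotalSram, "two 1bpp buffers fit");
```

This is why the high-resolution modes either drop to one bit per pixel or generate pixels on the fly instead of storing them.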

Rasterizers run in interrupt context, concurrently with the application, so the application doesn’t have to be built in any particular way: you can write a main function with a rendering loop instead of having to integrate into a “framework” of some sort. It’s even possible to run m4vgalib alongside an RTOS [2].

Highlights from the Demo Package

m4vgalib includes a bunch of working demos that can be flashed onto an STM32F4Discovery board. I’ve tried to comment them reasonably well, so you should read the sources for specific examples of how to apply m4vgalib.

Here are some of my favorites.




Runs Conway’s Game of Life, a famous automaton, over the entire surface of a 1-bit framebuffer at 60 frames per second.

Besides (in my opinion) looking pretty, this demo has some useful features.

First, it’s the simplest example among these demos of 1bpp 800x600 graphics with double-buffering/page-flipping. If you want to see how to do high-res two-color graphics with a computationally intense rendering algorithm, start here.

Second, it stresses the processor’s bus matrix more than the other demos. In terms of raw CPU load [3], it’s on par with rook, but because conway reads back its last frame of video when generating the next, it winds up producing around 2.5x more bus traffic. This makes conway my preferred way to find bugs in bus timing and arbitration inside m4vgalib.

conway only puts two colors on-screen. If you want to jazz it up, you can always change the foreground/background colors at scanline boundaries, or hack the rasterizer to use a procedural palette like I did in conway’s public debut back in 2012.
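The double-buffered 1bpp scheme can be sketched as follows. This is a deliberately naive per-cell version for illustration, not the demo's code (which uses far cleverer bit-parallel tricks); the `Bitmap` type and `life_step` function are hypothetical names, and cells off the edge of the screen are simply treated as dead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A 1-bit-per-cell bitmap; each row is padded to a whole number of bytes.
struct Bitmap {
  int width, height;
  std::vector<std::uint8_t> bits;

  Bitmap(int w, int h)
      : width(w), height(h), bits(std::size_t((w + 7) / 8) * std::size_t(h), 0) {}

  bool get(int x, int y) const {
    if (x < 0 || y < 0 || x >= width || y >= height) return false;  // off-screen is dead
    return (bits[index(x, y)] >> (x % 8)) & 1;
  }

  void set(int x, int y, bool alive) {
    const std::uint8_t mask = std::uint8_t(1u << (x % 8));
    if (alive) {
      bits[index(x, y)] |= mask;
    } else {
      bits[index(x, y)] &= std::uint8_t(~mask);
    }
  }

 private:
  std::size_t index(int x, int y) const {
    return std::size_t(y) * std::size_t((width + 7) / 8) + std::size_t(x / 8);
  }
};

// Reads the frame currently on screen (front) and writes the next
// generation into the off-screen buffer (back).
void life_step(const Bitmap &front, Bitmap &back) {
  for (int y = 0; y < front.height; ++y) {
    for (int x = 0; x < front.width; ++x) {
      int neighbors = 0;
      for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
          if (dx != 0 || dy != 0) neighbors += front.get(x + dx, y + dy);
        }
      }
      const bool alive = front.get(x, y);
      back.set(x, y, neighbors == 3 || (alive && neighbors == 2));
    }
  }
}
```

After each `life_step`, the application would swap the roles of the two buffers, ideally during vertical blanking so the flip is never visible mid-frame. Note that every step reads the whole front buffer while it is also being scanned out, which is where the extra bus traffic comes from.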




Displays text with separate foreground and background colors (256 potential colors on screen, though I’ve only wired up six bits in my system).

Okay, this one may look mundane, but it’s honestly one of my favorites. In m4vgalib, the rasterizer determines the framebuffer format — that’s one of the key design features. In this case, the framebuffer format is ASCII text plus color information, not pixels. The rasterizer combines this with a font table to convert to pixels on the fly.

This on-the-fly conversion matters, because storing all those pixels would take far more RAM than we have. Character graphics, like tile graphics, are a form of compression; in the end we can fit 14 screens of text in RAM.

Rendering the font on the fly does mean the rasterizer is relatively expensive, burning about 63% of CPU when applied to the entire frame. On the other hand, the application doesn’t have to do anything except poke characters into memory when desired. It’s a traditional time-space tradeoff.
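Here is a minimal sketch of the font-expansion idea: the "framebuffer" holds character cells, and pixels are produced one scanline at a time. The cell layout, the names, and the 8-pixel-wide glyphs (one byte per glyph row) are simplifying assumptions for illustration; they are not m4vgalib's actual text format.

```cpp
#include <cstdint>

constexpr int kGlyphWidth = 8;    // assumption: one byte per glyph row
constexpr int kGlyphHeight = 16;

// Hypothetical character cell: a character code plus two color bytes.
struct Cell {
  std::uint8_t ch;
  std::uint8_t fg;
  std::uint8_t bg;
};

// Expands one scanline of text into 8bpp pixels.
//   cells: row-major array of character cells, `columns` cells per text row
//   font:  256 glyphs * kGlyphHeight bytes; the MSB of each byte is the
//          leftmost pixel
//   out:   receives columns * kGlyphWidth pixels
void rasterize_text_line(const Cell *cells, int columns,
                         const std::uint8_t *font, int scanline,
                         std::uint8_t *out) {
  const Cell *row = cells + (scanline / kGlyphHeight) * columns;
  const int glyph_row = scanline % kGlyphHeight;
  for (int c = 0; c < columns; ++c) {
    const std::uint8_t bits = font[row[c].ch * kGlyphHeight + glyph_row];
    for (int px = 0; px < kGlyphWidth; ++px) {
      const bool on = (bits >> (7 - px)) & 1;
      *out++ = on ? row[c].fg : row[c].bg;
    }
  }
}
```

The inner loop runs for every visible pixel of every scanline, which gives a feel for why a full-screen text rasterizer is expensive in CPU terms while costing only a few bytes of RAM per character.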




Renders a wireframe chess piece (from my chess set) along with smooth-scrolling text.

This leans pretty heavily on the floating point unit. At the time I got this working, I had to cheat a little to get it stable at 30fps: there are bands of the screen drawn by the SolidColor rasterizer above and below the wireframe, because it has no CPU overhead and I needed all the CPU I could get. Subsequent optimizations in both m4vgalib and my wireframe code have made this unnecessary, even at 60fps, but the hack persists for now.

The chess piece is an STL file; it’s converted to an efficient representation for wireframe drawing by a script during the build.




Combines two rasterizers at different scanline positions and dynamically varies which scanlines are assigned to which.

This is the sort of Stupid Graphics Trick, in the tradition of ANTIC display lists or Amiga Copper effects, that m4vgalib was designed to facilitate. The frame loop maintains the “band list” that controls which rasterizers m4vgalib will use for each line of output, and messes with it during the vertical blanking interval to animate the bounds between the rasterizers.

The two rasterizers are the stock 10x16 text mode and a procedural one that draws an 800x600 full-color texture with a scroll effect.
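A band list along these lines can be sketched as a linked list of (rasterizer, line count) pairs. The types below are hypothetical and only meant to show the shape of the idea; check the m4vgalib sources for the real definitions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical rasterizer interface: produce the pixels for one scanline.
struct Rasterizer {
  virtual ~Rasterizer() {}
  virtual void rasterize(unsigned line, std::uint8_t *out, std::size_t width) = 0;
};

// Trivial rasterizer that fills the scanline with one color.
struct SolidFill : Rasterizer {
  std::uint8_t color;
  explicit SolidFill(std::uint8_t c) : color(c) {}
  void rasterize(unsigned, std::uint8_t *out, std::size_t width) override {
    for (std::size_t i = 0; i < width; ++i) out[i] = color;
  }
};

// One band: a run of consecutive scanlines drawn by the same rasterizer.
struct Band {
  Rasterizer *rasterizer;
  unsigned line_count;
  const Band *next;
};

// Walks the band list to find which rasterizer owns a given scanline.
// Returns nullptr if the line falls past the end of the list.
Rasterizer *rasterizer_for_line(const Band *head, unsigned line) {
  for (const Band *b = head; b != nullptr; b = b->next) {
    if (line < b->line_count) return b->rasterizer;
    line -= b->line_count;
  }
  return nullptr;
}

// Moves the boundary between two adjacent bands. In a real application this
// would be called during vertical blanking so the change is never seen
// halfway down a frame.
void set_split(Band &top, Band &bottom, unsigned split, unsigned total_lines) {
  if (split > total_lines) split = total_lines;
  top.line_count = split;
  bottom.line_count = total_lines - split;
}
```

Animating the boundary is then just a matter of calling something like `set_split` with a new position once per frame.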

  1. m4vgalib does not include sound-production code.

  2. The details of running m4vgalib alongside an RTOS are terribly specific to the particular RTOS you’re into, and I probably don’t have time to help. For FreeRTOS users, here’s a hint: you need some way to distinguish between the use of PendSV as a rasterizer and its use as a context switch.

  3. conway takes about 6.9ms to render as of this writing, plus rasterizer overhead of 5.07ms, so the CPU is idle for about 4.61ms of every 16.58ms frame. This means the automaton itself costs around 2.3 cycles per pixel, which is surprisingly low. This is thanks to optimizations devised by Mark Niemiec and others; I only wrote the code.
