I really like the STM32 series of microcontrollers. They’re generally
quite reliable, the peripherals are well tested, and more often than not I can
just grab one off the shelf and not think about it too much.
However, like every microcontroller, they do contain implementation bugs, so
it’s always important to read the “Errata Sheet” (or in ST’s language, “Device
Limitations”) when you’re using a part.
I appear to have hit an implementation bug in certain STM32 lines that is not
listed in the errata sheet. I can’t find any specific description of this bug on
the internet, so I’ve attempted to nail one down. Hopefully this will come up in
the search results for someone who hits this in the future and save them some
time.
I’m trying to do something kind of unusual with lilos: in addition to almost
all the APIs being safe in the Rust sense, I’m also attempting to create an
entire system API that is cancel-safe. I’ve written a lot about Rust’s async
feature and its notion of cancellation recently, such as my suggestion for
reframing how we think about async/await.
My thoughts on this actually stem from my early work on lilos, where I started
beating the drum of cancel-safety back in 2020. My notion
of what it means to be cancel-safe has gotten more nuanced since then, and I’ve
recently made the latest batch of changes to try to help applications built on
lilos be more robust by default.
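To make that concrete, here’s a minimal sketch of the property in Rust. This is for illustration only and is not lilos’s actual API — the `Mailbox` type and its methods are hypothetical. The point is that dropping a future at its await point (which is how async Rust cancels things) must not lose state:

```rust
use core::cell::Cell;
use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll};

/// A hypothetical single-slot mailbox, for illustration only.
pub struct Mailbox<T> {
    slot: Cell<Option<T>>,
}

impl<T> Mailbox<T> {
    pub fn new() -> Self {
        Mailbox { slot: Cell::new(None) }
    }

    /// Deposit a value for a receiver to pick up.
    pub fn post(&self, value: T) {
        self.slot.set(Some(value));
    }

    /// Receive a value. This is the cancel-safe part: dropping the returned
    /// future before it resolves leaves any deposited value in the slot, so
    /// a later `recv` still observes it.
    pub fn recv(&self) -> Recv<'_, T> {
        Recv { mailbox: self }
    }
}

pub struct Recv<'a, T> {
    mailbox: &'a Mailbox<T>,
}

impl<T> Future for Recv<'_, T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<T> {
        // The value leaves the slot only on the success path. If this future
        // is dropped while still Pending -- that is, if the operation is
        // cancelled -- the slot is untouched and nothing is lost.
        match self.mailbox.slot.take() {
            Some(v) => Poll::Ready(v),
            // A real implementation would register a waker here before
            // returning Pending; omitted to keep the sketch small.
            None => Poll::Pending,
        }
    }
}
```

A `recv` future that lost the value when dropped mid-race (say, after losing a `select` against a timeout) would be the opposite: cancel-unsafe, and the kind of default I’m trying to design out of lilos.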
So, wanna nerd out about async API design and robustness? I know you do.
In this post I’ll work through an example of why I’m so excited about this
technique, by building a real driver for a notoriously tricky bus one piece at a
time, using lilos.
If this isn’t your first time visiting my blog, you may recall that I’ve spent
the past several years building an elaborate microcontroller graphics
demo using C++.
Over the past few months, I’ve been rewriting it — in Rust.
This is an interesting test case for Rust, because we’re very much on C/C++’s
home court here: the demo runs on the bare metal, without an operating system,
and is very sensitive to both CPU timing and memory usage.
The results so far? The Rust implementation is simpler, shorter (in lines of
code), faster, and smaller (in bytes of Flash) than my heavily optimized C++
version — and because it’s almost entirely safe code, several types of
bugs that I fought regularly, such as race conditions and dangling pointers, are
now caught by the compiler.
It’s fantastic. Read on for my notes on the process.
This is a position paper that I originally circulated inside the firmware
community at X. I’ve gotten requests for a public link, so I’ve cleaned it up
and posted it here. This is, obviously, my personal opinion. Please read the
whole thing before sending me angry emails.
tl;dr: C/C++ have enough design flaws, and the alternative tools are in good
enough shape, that I do not recommend using C/C++ for new development except in
extenuating circumstances. In situations where you actually need the power of
C/C++, use Rust instead. In other situations, you shouldn’t have been using
C/C++ anyway — use nearly anything else.
This post is the fourth in a series looking at the
design and implementation of my Glitch demo and the
m4vgalib code that powers it.
In part three we took a deep dive into the STM32F407’s internal architecture,
and looked at how to sustain the high-bandwidth flow that we set up in part
two.
Great, so we have pixels streaming from RAM at a predictable rate — but we
don’t have enough RAM to hold an entire frame’s worth of 8-bit pixels! What to
do?
Why, we generate the pixels as they’re needed, of course! But that’s easier
said than done: generate them how, and from what?
In this article, I’ll take a look at m4vgalib’s answer to these questions:
the rasterizer.
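As a taste of where that post goes, here’s a hedged sketch in Rust of the scanline-callback idea. The names here (`Rasterizer`, `rasterize`, `Gradient`) are my invention for this sketch; the real m4vgalib interface is C++ and differs in its details. Instead of storing a frame, the video driver asks the application to produce each line just before it’s needed:

```rust
/// A hypothetical sketch of the scanline-callback idea; the real m4vgalib
/// interface is C++ and differs in its details.
pub trait Rasterizer {
    /// Produce the pixels for display line `line_number` into `scan_out`,
    /// which the driver hands over just before that line is scanned out.
    fn rasterize(&mut self, line_number: usize, scan_out: &mut [u8]);
}

/// Example: a vertical gradient needs no framebuffer at all, because each
/// line's contents are a pure function of its line number.
pub struct Gradient;

impl Rasterizer for Gradient {
    fn rasterize(&mut self, line_number: usize, scan_out: &mut [u8]) {
        let shade = (line_number & 0xFF) as u8;
        scan_out.fill(shade);
    }
}
```

Because each line is synthesized on demand, an effect like this needs only a one-scanline buffer rather than a whole frame’s worth of RAM.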
This post is the third in a series looking at the
design and implementation of my Glitch demo and the
m4vgalib code that powers it.
In part two, I showed a fast way to push pixels out of an STM32F407 by getting
the DMA controller to run at top speed. I described the mode as follows:
It just runs full-tilt, restricted only by the speed of the “memory” [or
memory-mapped peripheral] at either side…
But there’s a weakness in this approach, which can introduce jitter and hurt your video quality. I hinted at it in a footnote:
…and traffic on the AHB matrix, which is very important — I’ll come back
to this.
Quite a bit of m4vgalib’s design is dedicated to coordinating matrix traffic,
while imposing few restrictions on the application. In this article, with a
minimum of movie puns, I’ll explain what that means and how I achieved it.
This post is the second in a series looking at the
design and implementation of my Glitch demo and the
m4vgalib code that powers it.
Updated 2015-06-10: clarifications from reader feedback.
For the first technical part in the series, I’d like to start from the very
end: getting the finished pixels out of the microcontroller and off to a display.
Why start from the end? Because it’s where I started in my initial experiments,
and because my decisions here had significant effects on the shape of the rest
of the system.
I love the ARM Cortex-M series of microcontrollers. The sheer computational
power they pack into a teensy, low-power package is almost embarrassing.
But, many Cortex-M parts are small — 4x4 millimeters small — and don’t have
the pins left over for JTAG. For these parts, ARM introduced a new debug
interface called SWD (Serial Wire Debug).
Unfortunately, SWD isn’t well-supported by open-source tools. Support is in
progress in most of them — including my personal favorite, OpenOCD — but
I’ve had bad luck so far.
Anton Staaf was having the same issue, and decided to do something about it.
He tricked the cheap, commonly-available FTDI FT232H chip into speaking the
line-level SWD protocol. We’ve teamed up and, a week or so later, have
something to show for it.
I did this because it was an immense technical challenge. Read on for details,
including links to a series of blog posts I wrote examining the code in detail.
Now that Hubris has gotten some attention, people sometimes ask me if my
personal projects are powered by Hubris.
The answer is: no, in general, they are not. My personal projects use my other
operating system, lilos, which predates Hubris and takes a fundamentally
different approach. It has dramatically lower resource requirements and allows
more styles of concurrency.