I recently posted about my debugger for async Rust, which can
generate what I call “await-traces” for async code that’s suspended and not
currently running. I mentioned at the time that it appeared possible to get the
source code file name and line number corresponding to the await points, but
left that for future work.
I’m a big fan of Rust’s async feature, which lets you write explicit state
machines like straight-line code. One of the operating systems I maintain,
lilos, is almost entirely based on async, and I think it’s a killer
feature for embedded development.
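
To make "explicit state machines like straight-line code" concrete, here is a minimal sketch using only the standard library. It shows the same tiny computation twice: once as an `async fn`, and once as a hand-written `Future` with an explicit state enum. Every name in it (`YieldOnce`, `add_async`, `AddManual`, `NoopWake`, `block_on`) is made up for illustration and is not part of lilos or lildb. The hand-written enum only approximates what the compiler generates; the real generated type has a different layout and naming.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for "wait for an event": returns `Pending` on the
/// first poll and `Ready` on every poll after that.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Straight-line version: the compiler builds the state machine for us.
async fn add_async(a: u32, b: u32) -> u32 {
    YieldOnce(false).await; // the only suspension point
    a + b
}

/// Hand-written version of roughly the same state machine.
enum AddManual {
    Start { a: u32, b: u32 },
    Waiting { a: u32, b: u32, inner: YieldOnce },
    Done,
}

impl Future for AddManual {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // `AddManual` is `Unpin`, so a plain `&mut` is available.
        let this = self.get_mut();
        loop {
            match &mut *this {
                AddManual::Start { a, b } => {
                    let (a, b) = (*a, *b);
                    *this = AddManual::Waiting { a, b, inner: YieldOnce(false) };
                }
                AddManual::Waiting { a, b, inner } => {
                    match Pin::new(inner).poll(cx) {
                        // Suspended: the state stays `Waiting` until the next poll.
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(()) => {
                            let sum = *a + *b;
                            *this = AddManual::Done;
                            return Poll::Ready(sum);
                        }
                    }
                }
                AddManual::Done => panic!("polled after completion"),
            }
        }
    }
}

/// Waker that does nothing; good enough for a busy-polling toy executor.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls until ready, returning the output and the poll count.
fn block_on<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return (value, polls);
        }
    }
}
```

Both `block_on(add_async(2, 3))` and `block_on(AddManual::Start { a: 2, b: 3 })` take two polls to produce 5. Between those polls each future sits suspended, the manual one visibly in its `Waiting` state, and that suspended state is what a debugger has to inspect to say where the code is waiting.
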
async is also popular for writing web servers and other network services. My
colleagues at Oxide use it quite a bit. Watching them work has underscored one
of the current issues with async, however: the debugging story is not great.
In particular, answering the question “why isn’t my program currently doing
anything” is very hard.
I’ve been quietly tinkering on some tools to improve the situation since 2021,
and I’ve recently released a prototype debugger for lilos: lildb. For
uninstrumented lilos programs, lildb can print await traces, which are like
stack traces, but for suspended futures. I wrote this to help me debug my own
programs, but I’m publishing it to try to move the discussion on async
debugging forward. To that end, this post will walk through what it does, how it
derives the information it uses, and areas where we could improve things.
Now that Hubris has gotten some attention, people sometimes ask me if my
personal projects are powered by Hubris.
The answer is: no, in general, they are not. My personal projects use my other
operating system, lilos, which predates Hubris and takes a fundamentally
different approach. It has dramatically lower resource requirements and allows
more styles of concurrency.