Cobble Core Concepts

Cobble was initially designed to build programs written in C-like languages, primarily C++. This is because:

  1. I use C++ a lot for embedded work, where (for speed or resource reasons) I can’t afford to use something like Haskell or Java.

  2. Other languages (like Haskell and Java) already have pretty nice build systems, or don’t require them in the traditional sense (e.g. Python).

  3. C-like languages are surprisingly hard to build correctly, because the tools (particularly the linker) are obsessively single-pass.

There are many build systems that work with C, including one of the earliest such programs devised, make. I’ve worked with most. They all have their problems for large applications, particularly applications that may be built for a variety of SoC variants or operating systems, or applications built out of large collections of reusable components (e.g. git submodules).

The problem is this: in a multi-architecture environment, builds need to be parameterized. In the simplest cases, this might mean building a library for both ARM and x86, or Mac and Windows. In complex embedded systems, though, the configuration space quickly becomes combinatoric. It’s not unusual to have to build for:

  • ARMv4 (32-bit ISA), ARMv7A (Thumb-2),
  • SoC implementations from ST, NXP, and Atmel,
  • With and without floating point,
  • With varying amounts of internal SRAM or externally attached SDRAM,
  • Running on bare metal, FreeRTOS, and ChibiOS,
  • Booting from Flash, NAND, eMMC, or SD,
  • With clock speeds from 1MHz to 200MHz,
  • Oh, and of course the code must also build on your PC for unit testing.

I want to write a library once, write its build configuration once, and just describe the ways it needs to vary to support different platforms (if it must vary at all). This includes changes to which source files get built, which dependencies it pulls in, which compiler gets used, which flags are set, etc.

The key simplification is one Anton Staaf found years ago in his personal build system: build configurations are really templates that need to be invoked with various parameters, and they need to be able to change shape for different parameter values.

This is the core idea behind Cobble. I’ve spent some time refining the precise semantics, so that everything is predictable and extensible.

Core Philosophy

Cobble is opinionated.

  • Software should be modular, and so should its build files. I don’t want to write a hundred implementations of a thread-safe circular buffer — I want to write a reusable component that I can apply to many problems. When I reuse it, I want to reuse its build configuration too. Don’t repeat yourself, after all.

  • Defaults are bad. Oh, the time I’ve lost to fighting with make, only to discover that some default pattern rule had been winning out over my hand-crafted build instructions. What we actually want is to describe things simply without repetition; defaults are the wrong way of achieving this.

  • Build systems should be simple. Cobble applies a single core concept in different ways to produce a flexible system. Its core is about 800 lines of Python, plus about 200 of C-specific support.

  • Declarative build languages are a false goal. What we actually want are declarative build systems that can expose a hermetic DAG to analysis, without environmental dependencies. Banning for loops and if statements is neither necessary nor sufficient to achieve this.

  • If it ain’t broke, don’t fix it. Need a simple scripting language for expressing conditions in build files? Don’t write your own when Python, Lua, JavaScript, etc. exist and are well-understood.

  • Semantics are important. Cobble’s internal structures and operations can be formally described and analyzed. In particular, the flow of information between build targets is tightly controlled and carefully engineered. This makes it much easier to convince oneself that the build is both correct and deterministic.

Projects

Cobble works on projects. A project is simply a directory containing a BUILD.conf file. It’s typically the root of the source tree.

The BUILD.conf file is responsible for:

  • Installing plugins. You’ll need to install at least one.
  • Declaring environments (see below).
  • Seeding DAG traversal.

To expand on that last one: Cobble needs to know where your sources live. Cobble will not go recursing through your filesystem. Instead, you must name at least one node in your build DAG (i.e. at least one directory in your source tree). Cobble will start there and spider out to load all dependencies.

If you’re used to build systems automatically recursing through your source tree, this might feel clunky — but it’s a feature. It means at least one file visible to Cobble must change for the overall shape of your project to change. This means Cobble will know to regenerate its state, and ensures that you can always run an incremental build safely.
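The seeding idea can be sketched in a few lines of Python. This is not Cobble's loader: the package names and the `PACKAGE_DEPS` table are made up for illustration, standing in for the dependency information a real run would read out of BUILD files.

```python
from collections import deque

# Hypothetical project layout: each package lists the packages it depends on.
# In a real project this would come from loading BUILD files, not a literal dict.
PACKAGE_DEPS = {
    "app/firmware": ["lib/rtos", "lib/drivers"],
    "lib/drivers": ["lib/hal"],
    "lib/rtos": [],
    "lib/hal": [],
    "experiments/unused": ["lib/hal"],
}

def discover_packages(seeds, deps=PACKAGE_DEPS):
    """Return every package reachable from the seeds, in discovery order."""
    seen = []
    queue = deque(seeds)
    while queue:
        package = queue.popleft()
        if package in seen:
            continue
        seen.append(package)
        # Only packages named by something already loaded are ever visited.
        queue.extend(deps[package])
    return seen

reachable = discover_packages(["app/firmware"])
print(reachable)
```

Note that `experiments/unused` is never visited: nothing reachable from the seed names it, so it can't change the shape of the build.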

Packages

Inside the project are packages. A package is Cobble’s unit of modularity. It’s simply a directory containing a BUILD file.

BUILD files define targets (below). These targets can be referenced from other packages, using a special syntax:

//path/to/package:target_name

You’ll see this syntax a lot — it’s how Cobble describes dependencies between reusable components.
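To make the shape of that identifier concrete, here is a small parser for it. The function name `parse_target_ident` is mine, and it handles only the exact form shown above; Cobble's own parser may well accept other forms.

```python
def parse_target_ident(ident):
    """Split '//path/to/package:target_name' into (package_path, target_name)."""
    if not ident.startswith("//"):
        raise ValueError("expected identifier to start with '//': %r" % ident)
    # partition splits on the first ':' and reports whether it found one.
    package_path, sep, target_name = ident[2:].partition(":")
    if not sep or not package_path or not target_name:
        raise ValueError("expected '//package:target': %r" % ident)
    return package_path, target_name

print(parse_target_ident("//path/to/package:target_name"))
```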

Targets

The fundamental unit of a Cobble build is the target. A target is something like a library, executable binary, or set of generated files. Targets in Cobble are higher level than (say) make targets; in Cobble, one doesn’t write individual targets for each object file in a program, for example. One just describes the program itself.

Cobble figures out the lower-level steps that need to be taken using a production model. Cobble ships with a production model for C-like languages because they’re common and surprisingly difficult to get right.

Cobble is extensible: projects can define their own target types.

One can think of a target as a factory for build products — object files, static archives, executables. The operation of the factory is controlled by the environment.

Environments

An environment, in Cobble, is just a key-value store. It’s similar to a dictionary or hashtable.

An environment might contain keys telling which C compiler to use, which flags to pass, or the name of the target operating system.

Notice I said “an environment.” There is no single environment. Each target chooses the environment used to build its dependencies, and can also customize the environment used to e.g. compile its source code.

(SCons users: the Cobble environment is very similar to SCons’s Environment, but it’s different in a few key ways. Keep reading without too many preconceptions.)

Environments are immutable. It’s possible to derive a new environment with some changes, but not to change an existing one. This means that side effects from targets can only “leak out” in well-defined ways.
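A minimal sketch of what "immutable, but derivable" means in practice. The `Env` class and its `derive` method here are illustrative stand-ins I wrote for this page, not Cobble's actual class or method names.

```python
from types import MappingProxyType

class Env:
    """A tiny immutable key-value store (illustrative; not Cobble's class)."""

    def __init__(self, contents=()):
        # Copy the contents, then expose them only through a read-only view.
        self._dict = MappingProxyType(dict(contents))

    def __getitem__(self, key):
        return self._dict[key]

    def __contains__(self, key):
        return key in self._dict

    def derive(self, changes):
        """Return a new Env with `changes` layered on top; self is untouched."""
        merged = dict(self._dict)
        merged.update(changes)
        return Env(merged)

base = Env({"cc": "gcc", "c_flags": ("-Wall",)})
arm = base.derive({"cc": "arm-none-eabi-gcc"})

print(base["cc"], arm["cc"])
```

Deriving `arm` leaves `base` exactly as it was, and there is no way to assign into either one after the fact.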

Since there is no default environment, and since every target is evaluated in some environment or other, one might wonder how the whole process gets bootstrapped. Is it turtles all the way down? The answer lies in the concept of leaf targets.

Leaf Targets

Some targets are considered leaf targets. These are targets that can be built (with their dependencies) in isolation. For example, C programs are treated as leaf targets.

A leaf target is in full control of the environment seen by its dependencies. The project’s BUILD.conf file declares a set of leaf environments, each of which has a name; each leaf target must select an environment. For example, a particular firmware image might specify that it is to be built in the environment designed for a particular board.

If a single leaf (say, a program) needs to be built in several environments, it’s cheap to create multiple targets in Cobble. Because Cobble’s BUILD files are Python, one can use a for loop to iterate over a list of environment names and crank out targets for each, without having to copy-paste.
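Here is roughly what that loop looks like. Treat every name as an assumption: `c_binary` and its parameters are a stand-in defined right in the sketch so that it runs on its own, and the environment names are invented. Check your installed plugins for the real target functions.

```python
# Stand-in for a target-defining function. The name `c_binary` and its
# parameters are assumptions for illustration, defined here so the sketch runs.
TARGETS = {}

def c_binary(name, env, sources, deps=()):
    TARGETS[name] = {"env": env, "sources": list(sources), "deps": list(deps)}

# Hypothetical leaf environment names, as BUILD.conf might declare them.
BOARD_ENVS = ["board_a", "board_b", "host_test"]

# One leaf target per environment, without copy-paste.
for env_name in BOARD_ENVS:
    c_binary(
        name="blinky_" + env_name,
        env=env_name,
        sources=["main.cc"],
        deps=["//lib/drivers:gpio"],
    )

print(sorted(TARGETS))
```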

Environment Deltas

We can’t reasonably define every environment in BUILD.conf, because the environment contains all information needed to compile software (besides the source code). If a particular library requires -lpthread to appear in the link command line, for example, this is an environment change. We could put all the link flags in BUILD.conf, but this would kill modularity! Any program that needs that library should pick it up from its dependency graph, without needing changes to BUILD.conf.

To achieve this, targets can specify environment deltas. These are simply lists of changes to be applied to some environment. For example, a delta might specify…

  • That -O2 should get appended to the c_flags,
  • That the freertos_config_dir variable should be set to ./include,
  • Etc.
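One way to picture a delta is as a list of small functions, each taking an environment and returning a changed copy. The helper names below (`append`, `override`, `apply_delta`) are invented for this sketch and environments are plain dicts; the real machinery lives in Cobble itself.

```python
def append(key, values):
    """A change that appends values to a key, creating the key if absent."""
    def apply(env):
        new = dict(env)
        new[key] = tuple(env.get(key, ())) + tuple(values)
        return new
    return apply

def override(key, value):
    """A change that sets a key to a value, replacing whatever was there."""
    def apply(env):
        new = dict(env)
        new[key] = value
        return new
    return apply

def apply_delta(env, delta):
    """Apply each change in order; the input environment is never modified."""
    for change in delta:
        env = change(env)
    return env

base = {"cc": "gcc", "c_flags": ("-Wall",)}
delta = [
    append("c_flags", ["-O2"]),
    override("freertos_config_dir", "./include"),
]
result = apply_delta(base, delta)
print(result)
```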

Out of the box, targets can specify three environment deltas.

  1. The extra delta is applied downward — it changes the environment seen by all dependencies. This delta can be useful for setting CPU architecture or adding -DNDEBUG to disable assertions.

  2. The local delta is applied to operations within the target, such as compiling its own source files.

  3. The using delta is propagated upward — if target A depends on target B, A works in an environment modified by B’s using delta. This is useful to add -I flags to access a library’s headers, or to add -lpthread to the linker command line.

Of course, plugins can add their own Target subclasses with more complex behavior if desired.

Let’s get a bit more concrete. In a normal C program, the deltas are used as follows:

  • The top-level program uses the extra delta to configure the compilation environment for its entire dependency graph. It might set up the default include paths and request optimizations.

  • The program and each library can customize compilation of its source files using its local delta. For example, a particular library might be performance-critical, and might add -O3 to the c_flags. Alternatively, one might enable non-public API by setting a -DINSIDE_LIBRARY define.

  • Each library uses the using delta to expose features to dependents. Typically this involves extending link_srcs to specify what objects to link into a program in what order.
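The flow of the three deltas through a two-target graph can be sketched as follows. This is a toy model under several assumptions: deltas are append-only dicts, using deltas flow up only one level, and a target's own extra delta also affects its own compile. The target names and the `evaluate` function are hypothetical.

```python
def append_all(env, delta):
    """Apply an append-only delta (key -> list of values); returns a new dict."""
    new = dict(env)
    for key, values in delta.items():
        new[key] = new.get(key, ()) + tuple(values)
    return new

# Hypothetical targets, each carrying its three deltas as plain dicts.
TARGETS = {
    "app": {
        "deps": ["libfast"],
        "extra": {"c_flags": ["-DNDEBUG"]},
        "local": {},
        "using": {},
    },
    "libfast": {
        "deps": [],
        "extra": {},
        "local": {"c_flags": ["-O3"]},
        "using": {"c_flags": ["-Ilibfast/include"], "link_srcs": ["libfast.a"]},
    },
}

def evaluate(name, env_in, log):
    """Evaluate a target in env_in, record its compile env, return its using delta."""
    target = TARGETS[name]
    env_down = append_all(env_in, target["extra"])      # extra: seen by dependencies
    env_mine = env_down
    for dep in target["deps"]:
        dep_using = evaluate(dep, env_down, log)
        env_mine = append_all(env_mine, dep_using)      # using: flows up from deps
    log[name] = append_all(env_mine, target["local"])   # local: this target only
    return target["using"]

log = {}
evaluate("app", {"cc": "gcc"}, log)
print(log["libfast"]["c_flags"])
print(log["app"]["c_flags"], log["app"]["link_srcs"])
```

The library sees `-DNDEBUG` from above and adds its own `-O3`; the program never sees `-O3`, but does pick up the include path and the archive to link.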

It’s possible to create arbitrary deltas that do things like deleting keys, rewriting strings, etc. But this is unusual. Typically, BUILD files create very simple deltas out of a single operation: appending content to keys, creating the keys if they don’t exist. This operation is surprisingly general, and is equivalent to make’s += operator.

For more powerful operations manipulating deltas, look in the cobble.env module.

String Interpolation

The default delta mechanism allows for string interpolation. In essence, each delta can be customized by the environment to which it is being applied. Cobble uses Python’s standard string interpolation syntax for this: %(key)s.

For example, if a target is being built in an environment where an arch key is defined, it can generate paths based on arch like:

impl/%(arch)s/clock.cc

Now for the key: nearly every feature in Cobble is implemented by generating environment deltas. This means you can use string interpolation nearly everywhere:

  • In the sources list for a program or library.
  • In the deps list that names dependencies.
  • In c_flags or even cc (which specifies the compiler to use).

(Currently, the one place you can’t use interpolation is in choosing a target name. Target names must be the same in all environments.)
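The interpolation itself is plain Python: the % operator applied to a string with a mapping on the right. The environment contents and file names below are made up, and an ordinary dict stands in for a Cobble environment.

```python
# A plain dict standing in for an environment; keys and values are made up.
env = {"arch": "armv7a", "os": "freertos"}

sources = [
    "main.cc",                  # no placeholders: passes through unchanged
    "impl/%(arch)s/clock.cc",
    "port/%(os)s/sched.cc",
]

# Each string is expanded against the environment it is being applied to.
expanded = [s % env for s in sources]
print(expanded)
```

Referencing a key the environment doesn't define raises a KeyError rather than silently producing an empty path segment.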

Build Directories

Cobble is designed to perform out-of-tree builds. This is a fancy way of saying that you build software in a separate directory. This has several advantages:

  • You can blow away the build directory with impunity, for a guaranteed clean build.

  • Your source tree remains clean without a bunch of .gitignore patterns.

  • You can have separate directories where you build the same software in different configurations.

  • If your disk is slow, you can put your build directory in a RAM filesystem. If your computer reboots, you only lose the build output — not your sources.

You can create a build directory for an existing project using Cobble’s init subcommand. Cobble produces some files in the directory that reference the project and describe the build. From then on, you can use the build subcommand to make software.

Product Hashing and Work Stealing

As discussed above, a target is always evaluated within an environment.

As a side effect, it’s perfectly reasonable to evaluate a target within several environments — say, to build both ARM and Thumb variants of a library. You don’t have to define anything for this to happen; it’s controlled by the environments used to build your leaf targets.

It’s important that the build products — object files, static archives, and the like — not “leak” between environments. (For example, trying to link an ARM library into a firmware image for a Thumb-only processor will get you nothing but pain.) To keep outputs distinct, Cobble uses environment hashes. Basically, the key-value pairs in the environment are normalized and fed through SHA1 to produce a big scary hexadecimal number.

Build products end up in your build directory, in a subdirectory called env/big-scary-hex-number — so products from two environments will never collide.
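As a rough illustration of the idea (not Cobble's actual normalization, which I'm not reproducing here), one could serialize the environment with sorted keys and hash the result. The function names `env_digest` and `product_path` are invented for this sketch.

```python
import hashlib
import json

def env_digest(env):
    """Hash an environment's key-value pairs in a stable, order-independent way."""
    # Sorting keys makes the serialization independent of insertion order.
    normalized = json.dumps(env, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def product_path(env, relpath):
    """Place a build product under a directory named for its environment."""
    return "env/%s/%s" % (env_digest(env), relpath)

arm = {"cc": "arm-none-eabi-gcc", "c_flags": ["-marm", "-O2"]}
thumb = {"cc": "arm-none-eabi-gcc", "c_flags": ["-mthumb", "-O2"]}

print(product_path(arm, "lib/clock.o"))
print(product_path(thumb, "lib/clock.o"))
```

The same source file lands at two different paths, one per environment, so the ARM and Thumb objects can't overwrite each other.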

This might seem like overkill. Couldn’t we just create a subdirectory per leaf target? The answer is both “no” and “you don’t want to do that.”

  • “No,” because Cobble is completely fine with a single leaf target (program) linking in a library built in two separate environments! Want to include both an ARM and Thumb version of a function, or produce a single image that supports multiple SoCs? Go for it.

  • “You don’t want to,” because Cobble actually creates hash collisions to keep from doing unnecessary work. If two programs wind up needing a product built in the same environment, it only gets compiled once.

That second bit is a feature called work stealing. Leaf targets will steal each other’s work whenever possible. In one of my codebases this reduces compile times by 80% compared to a naive “build separately for each leaf” approach.

To maximize work stealing, Cobble uses a strategy called environment minimization. Before producing a product, Cobble boils down its environment to only the keys used by that product. For example, when compiling a C object, only the cc and c_flags keys have any impact; even if you’ve set cxx, it doesn’t affect C builds.

It’s important to minimize the environment as much as possible — but no more! Removing a key that’s relevant to a product could cause targets to link in the wrong objects, resulting in an incorrect build. Fortunately, attempting to do this (e.g. by writing your own Target subclass with poor behavior) will get you a KeyError exception. Cobble has your back.
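Minimization and work stealing fit together like this. The sketch assumes environments are plain dicts and reuses the sorted-JSON-plus-SHA-1 hashing idea; `minimize`, `env_digest`, and the two leaf environments are invented names, not Cobble's.

```python
import hashlib
import json

def env_digest(env):
    """Stable hash of an environment (sorted-key JSON fed through SHA-1)."""
    normalized = json.dumps(env, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def minimize(env, used_keys):
    """Keep only the keys a product actually uses."""
    return {key: env[key] for key in used_keys}

# Keys that affect compiling a C object, per the text above.
C_OBJECT_KEYS = ("cc", "c_flags")

# Two hypothetical leaf environments that differ only in the C++ compiler.
leaf_a = {"cc": "arm-none-eabi-gcc", "c_flags": ["-O2"], "cxx": "arm-none-eabi-g++"}
leaf_b = {"cc": "arm-none-eabi-gcc", "c_flags": ["-O2"], "cxx": "clang++"}

min_a = minimize(leaf_a, C_OBJECT_KEYS)
min_b = minimize(leaf_b, C_OBJECT_KEYS)

# The full environments hash differently, but the minimized ones collide,
# so a C object needed by both leaves has a single output location.
print(env_digest(leaf_a) == env_digest(leaf_b))
print(env_digest(min_a) == env_digest(min_b))
```

If a product minimizes too aggressively, say down to just `cc`, and then goes looking for `c_flags`, the lookup on the minimized dict fails with a KeyError instead of quietly building the wrong thing.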
