Mini-Mapper 15: A better software setup

After doing some initial software experiments (reported in Episode 13), I did some thinking, and decided on some improvements to the approach I was taking. I’ve now rewritten all the software I had to take account of this new approach, and have written a proof-of-concept demonstration that has enough going on to convince me that I’m on the right track. Before I get into the details of all this, here’s a short video showing the demonstration code:

What’s a better way?

The main principles I’m following, described in Episode 13, remain the same: use CMSIS headers only as the external API (i.e. no HAL library or anything like that), use modern C++ (though no exceptions, RTTI or dynamic memory), and write a “DIY HAL”, which abstracts from hardware-specific drivers to application-level entities without many layers in between, i.e. write drivers that use CMSIS memory-mapped peripherals directly, but present abstractions that make sense at an application level (e.g. PWM driver + GPIO output driver + GPIO interrupt input driver + timer driver + ADC driver + glue code = motor driver with “forward/backward X%”, “stop”, “getSpeed”, “getTorque” API).

The biggest changes I decided to make were to make everything testable. It’s not uncommon for testing of embedded software to mean single-stepping things in the debugger to confirm that they work the way they’re expected to. That’s not repeatable or scalable. Some sort of automated testing is needed.

You can do your automated testing on the platform you’re developing for, but it’s not all that convenient: you need to flash your test code to the platform every time you make a change, and embedded platforms are usually constrained in terms of memory so you may need to split your test suite up into multiple programs to be able to test everything. That’s kind of painful, and makes the edit-compile-test cycle too long for doing any sort of test-driven development.

A better approach is to test on Linux: this makes TDD easy, it means no flashing to the embedded platform, and it gives you an easier debug environment too.

That’s easy to say, but how to make it work? First, you need to be able to build all your code on both the embedded platform and on Linux, with the appropriate compilers and appropriate compiler flags. Second, you need the builds on Linux and the embedded platform to be different: the Linux side needs to test the library code, while the embedded side needs to use the library code. Third, the Linux side obviously doesn’t have the hardware from the embedded side, so there needs to be some sort of “mocking of the hardware”.

Let’s talk about the tools needed to make this work.

Tools

Makefiles

The makefiles used for the build system are split into a few different parts (... is the top-level code directory):

.../Makefile: top-level Makefile–this builds the test executable on Linux (and has a goal to run it). Most of the build setup comes from the files in the mk sub-directory.
.../mk/linux.mk: Linux-side build definitions–these are mostly concerned with ensuring that tests get built and the hardware mocking code is included.
.../mk/cross.mk: cross-compilation build setup to build code on STM32–this is the usual sort of setup of the arm-none-eabi GCC toolchain.
.../mk/common.mk: this file includes most of the actual build rules, and is shared between the Linux and embedded side.
.../apps/Makefile.common: common build setup shared between all STM32-side applications–among other things, this ensures that hardware setup code is included.
.../apps/.../Makefile: build setup for individual applications–these Makefiles are minimal, and mostly rely on the common application setup in .../apps/Makefile.common.

The result of this setup is that you can run make test in the top-level code directory to build all the library and test code on Linux and run all tests, and you can run make in any of the application directories to build the STM32 code, and then make flash to flash it using OpenOCD.

Test framework: doctest

For testing, I’m using the doctest framework. This is a C++ 17 header file only framework, which means there’s basically zero hassle with installation (just copy the doctest.h header file into a directory that’s included in the search path when building tests).

The workflow with doctest is to include test cases into the source file defining the code being tested. This is in contrast to the usual approach where you have separate test files. I wasn’t so sure about this approach to start with, but it works well.

The other thing that works well with doctest is that it uses C++ expression templates for writing assertions. That means that you don’t need to remember much of an API at all for most cases: there are TEST_CASE and SUBCASE macros for setting up test cases, and REQUIRE and CHECK macros for writing assertions, and that’s more or less all you need most of the time.

To be honest, there’s not all that much to say about doctest: it’s very good and it just works! Here’s a very simple example of what some tests look like:

TEST_CASE("Command shell") {
  ...

  SUBCASE("variable value parsing works (boolean)") {
    bool chk;
    CHECK((Shell::parse("true", chk) && chk));
    CHECK((Shell::parse("false", chk) && !chk));
    CHECK(!Shell::parse("wrong", chk));
  }

  ...
}

Mocking framework: trompeloeil

To go with the doctest test framework, I’m using a mocking framework called trompeloeil. This is another “modern C++” header file only framework (so again, no installation hassle at all).

This again, like doctest, is very good, and it mostly just works. It is a little more difficult to get working, and you get some fearsome compiler errors when you do something wrong, but after a little bit of experimentation, I’m very happy with it.

Here’s an example of what defining a mock class looks like:

class MockShellModule : public trompeloeil::mock_interface<Shell::Module> {
public:
  MockShellModule() :
    trompeloeil::mock_interface<Shell::Module>("(SHELL-MODULE)") { }

  IMPLEMENT_MOCK2(set_variable);
  IMPLEMENT_MOCK1(show_variable);
  IMPLEMENT_MOCK3(run_command);
};

Here, Shell::Module is an abstract class that defines a command shell module (basically a thing that implements some commands and variables that can be managed through a text-based terminal connected to a USART). The IMPLEMENT_MOCKx macros are from trompeloeil, and define mock versions of the given methods. A test can then require that a particular method of a mock object is called using the REQUIRE_CALL macro:

REQUIRE_CALL(module, run_command(_, 0, _))
  .WITH(strcmp(_1, "test") == 0)
  .RETURN(Shell::COMMAND_OK);

The trompeloeil framework has facilities for argument matching, controlling the return values from mock calls, and other things. It covers more or less any mocking functionality you might like, and it works without drama.

Using trompeloeil means that mocking out any code I write myself is easy: if I want to test a “higher-level” component that depends on a “lower-level” component, I just write a mock for the lower-level component and use that in the test of the higher-level component. But what about testing the lowest level code, i.e. the drivers that directly manipulate the STM32’s hardware resources? This is trickier, and needs to be done so that you can run low-level tests on Linux!

GPIO pin abstraction (and hardware mocking)

To understand how low-level code can be tested and mocked, let’s look at the lowest-level part of the library code I’ve written. This is an abstraction of a GPIO pin, using a Pin class, which is basically just a wrapper around a pointer to a GPIO port (as a CMSIS GPIO_TypeDef pointer) and the pin number.

The Pin class (header file and source file) defines methods to set up pins as outputs, inputs and alternate functions, and to set, reset and toggle outputs. This makes it possible to write code like the following, taken from a test case:

PA6.output(GPIO_SPEED_VERY_HIGH, GPIO_TYPE_OPEN_DRAIN, GPIO_PUPD_PULL_UP);
PA6.set();
CHECK((READ_BIT(GPIOA->ODR, 1 << 6) >> 6) == 0x01);
PA6.reset();
CHECK((READ_BIT(GPIOA->ODR, 1 << 6) >> 6) == 0x00);
PA6.toggle();
CHECK((READ_BIT(GPIOA->ODR, 1 << 6) >> 6) == 0x01);
PA6.toggle();
CHECK((READ_BIT(GPIOA->ODR, 1 << 6) >> 6) == 0x00);

It’s clear how methods like Pin::output and Pin::set should work on the STM32, where the hardware is there, and the CMSIS header files define exactly where in the STM32’s memory map things like GPIOA should be. But what about on the Linux side? How can you test these low-level hardware-dependent drivers without the real hardware?

The obvious answer is that you mock the hardware. In the same way that you replace code that you write with a mock version over which you have complete control for testing (in my case, using trompeloeil), you can replace or modify the CMSIS definitions to put all of the CMSIS objects (GPIOA, TIM1, RTC, RCC, ADC, and so on, basically all of the things defined in the stm32f767xx.h header file) in reasonable places in the normal Linux process memory space of your test program.

This requires just a little massaging of the CMSIS header files to split out the things that refer to explicit addresses on the STM32 and replace them on Linux with something that’s laid out similarly but just lives in “normal” memory. All the STM32 peripherals are memory-mapped, so this approach works really well. You can even lay things out so that the same address calculations you might do on the STM32 work in your Linux test code. For example, the addresses of each of the consecutive GPIOx structures defining the basic GPIO pin attributes for each GPIO port are separated by 0x0400 on the STM32, and setting things up to be the same on Linux makes it possible easily to determine the relevant addresses from an integer port index. In my case, this is done just be defining one big byte array to represent the whole of the AHB1 peripherals, and pointing the AHB1PERIPH_BASE variable at the beginning of that array. The standard STM32 CMSIS headers define all the GPIOx addresses in terms of AHB1PERIPH_BASE, so this works a treat!

The end result of this is that you have memory structures that accurately reflect what your application code on the STM32 will see. Obviously, the autonomous behaviour of peripherals that reference those memory structures isn’t there, and you have to update these structures explicitly in your test code, but the memory structures are correct. As an example, in tests for the USART class, the USART read data register and byte receive interrupt bits must be set explicitly to signal to the code that a byte has been received, and then the interrupt service routine for the USART C++ object must be called explicitly:

USART3->RDR = 'x';
SET_BIT(USART3->ISR, USART_ISR_RXNE);
usart.rx_irq();

This turns out not to be any kind of obstacle, because it allows you to write tests that make it very clear what sort of hardware events are expected in different situations, and allows fine control over the sequence of those events so that you can test different scenarios easily (particularly error scenarios).

Application code example: command shell

Now let’s move from the lowest-level driver code to the current highest-level code that I’ve written. This is a command shell that allows you to write “shell modules” to process commands. The shell uses a line-oriented terminal for communication with the user, and the terminal uses a USART for communication (on the Nucleo development board that I’m using. USART3 is connected to the ST-Link debugger, and appears as a virtual serial port on the PC connected to the debugger).

As an example of how this all works, the main program for the shell-pwm application can be seen here. This has some standard initialisation, i.e. switching on caches, setting up the CPU and SysTick clocks, switching the terminal to interactive mode (which gives some rudimentary line editing capability) and initialising a PWM driver, which is followed by a few lines that create a command shell and add a couple of shell command modules to provide simple LED blinking and a PWM output. The LED blinky module provides some commands to control the LEDs on the Nucleo board, with commands looking like this:

> led 1 on
> led 2 blink
> set led-2-delay 50
> led 3 blink
> set led-3-delay 100

The PWM module provides commands to control a PWM driver that drives a single GPIO output, with commands like this:

> pwm on
> set pwm-duty 25%
> set pwm-polarity neg

Once the command shell is set up, all of the software components that may need to process events are added to the event manager. This event manager implements a cooperative multi-tasking approach, where software components operate almost exclusively on an event-driven basis, and where interrupt service routines for hardware resources mostly just post events for later processing (for example, these ISRs for handling character receive and transmit DMA complete interrupts for a USART). The events posted by ISRs are handled in “bottom half” handlers that are driven from the event loop, ensuring that ISRs complete as promptly as possible.

In the command shell, the terminal to which the shell is connected generates TERMINAL_LINE_RECEIVED events whenever a complete line is received terminated by a carriage return. The whole line is made available to the shell for subsequeny processing and dispatching to the relevant shell module. The lower-level details of communicating with the USART driver are encapsulated in the Terminal class.

When a character is received at the USART connected to the terminal, the flow of control is:

USART ISR is called.
The USART ISR calls the USART::rx_irq method. If there is no error, this posts a USART_RX_CHAR event to the event manager and returns immediately.
Later, when the event manager event loop is run, the USART_RX_CHAR event is dispatched to the dispatch method of the Terminal class (all classes that want to consume events are derived from the Events::Consumer abstract class, which includes a pure virtual dispatch method). This uses the Terminal::process_rx_char method to process the incoming character, and if the character is a carriage return, a TERMINAL_LINE_RECEIVED event is posted.
At some later timer, the event manager dispatches the TERMINAL_LINE_RECEIVED event, and the CommandShell::dispatch method processes the command line (by sending it to the appropriate shell module), then signals that processing is complete by posting a TERMINAL_LINE_PROCESSED event (which allows the terminal to reclaim the command buffer and to prompt for the next command line from the user).

The main goals of this approach are to:

Keep ISRs short and fast.
Provide a simple and adaptable mechanism for different software components to interact without the complexity of a full RTOS.

So far this event-driven approach seems to work quite well. It’s almost certain that I’ll have to switch to using an RTOS later on, but for now, this is pretty good.

Conclusions

So far, my experience with both the “modern C++” approach and the tools I’m using is pretty positive.

C++ is a good match for the kind of abstractions that make sense for embedded programming, and “modern C++” has some facilities that make test and mocking tools easy to work with.
“Mocking the hardware” works well for doing unit testing on Linux. I don’t know how widespread this approach is, but I’d definitely recommend it!
I don’t know how this would be with other build systems, but using simple Makefiles, it’s reasonably easy to share most of the build system code between the Linux and STM32 builds.
This idea of quickly going from low-level drivers that interact directly with the hardware to “application-level” code so far seems to be working quite well.