The case for firmware platforms

Firmware platforms are all the rage in embedded development. If done well, they shorten time to market for new devices and vastly reduce maintenance effort. These two benefits alone have convinced a lot of companies to develop their own platforms. In this new series I’ll explore some aspects of firmware platforms, how they interact, and how a platform should be designed. The latter point is obviously hindsight: I surely don’t know better than the people who originally designed the platforms I’ve worked with over the past years (well, I do hope that I know better now than when I designed the platforms I was responsible for), but I do have the advantage of looking at a finished product and being able to spot the design flaws. Also note that I’ll talk from a purely architectural point of view and will, in many cases, not incorporate much, if any, domain-specific detail. I’m aware that the application domain will always be present in the design; in fact, a design that does not acknowledge the application domain is, IMHO, pretty poor.

Finally, I will concentrate on bare-metal systems, because they pose a tougher challenge with regard to platform design than something like embedded Linux, which comes with lots of facilities that already make the lives of developers a lot easier.

Why would we want a platform?

As stated in the introduction, firmware platforms are still a hot topic, but what are the compelling reasons for companies to build one? There are a number of them, so let’s have a look at the most important ones:

Time to market: Once a platform is established, time to market for a new product built on it can be significantly reduced. In a recent example we went from board bring-up to production with a full-featured application in less than six months. This was only possible because the platform we used was well designed with portability in mind. That example involved a completely new design. When we ported a design that was similar to a device already on the platform, the port was usually a matter of a couple of weeks for a single person.
QA: Having a platform saves a lot of time during QA, since for the most part we can focus on the differences between the devices that use the platform, while testing most of the functionality on only one target. If a feature works on target A, we can be reasonably sure that it works on target B as well. This comes with a caveat: if the platform is poorly designed and has limited testability, it might actually backfire in QA, as each change a developer makes will usually ripple through most, if not all, targets. Poor testability then means having to test all of these targets for each feature.
Maintenance: While the overall cost of a feature tends to be a little higher during implementation when using a platform (the added abstractions impose a cost!), we only have to implement a feature once to have it available on all targets. The same goes for bug fixes (the downside being that a bug will usually haunt the whole platform and thus affect all targets).

Genesis of platforms

What makes a platform different from other (or shall we say “regular”) kinds of firmware projects? To figure this out, let’s take a look at different types of firmware and at how business knowledge is usually preserved when moving to a new generation of devices:

Straight program

This used to be very common: we have a single program that contains all the driver software needed to access the target hardware. These programs are usually very hardware-specific, with hardware details leaking into the application logic, which makes changes to the hardware quite hard. Transitioning to a newer generation of hardware usually involves a significant porting effort that may very well rival that of a complete rewrite. Testability is often a huge issue in these projects, as the tight coupling to the hardware prevents fast test automation (e.g. unit tests) and forces companies to rely on manual tests or highly expensive test equipment.

“Protoplatforms”

We often find this kind of software in companies whose product lines have more or less the same features but different hardware implementations. Looking at the source code, we’ll usually find some kind of hardware abstraction, so hardware-specific parts don’t leak into the application logic. However, the application logic itself is often very tightly coupled and does not permit easy additions for different products (which is usually not needed, since the feature set is more or less locked down). In terms of testability we often find problems similar to those of straight programs, as protoplatforms have usually not been designed with testability in mind or have indeed evolved from straight programs. Transitioning these platforms to a newer generation of hardware usually poses a lot less risk and effort than with straight programs and is usually done by the dreaded “press F5 until it finally compiles” approach.

Generation 1 Platforms

The border between what I dubbed protoplatforms and the first real platforms is somewhat murky, but one of the most important distinguishing aspects surely is some form of simulation that enables a QA department to test most of the device’s business logic without actually needing a device. In these cases we can have a lot of test automation with reasonable effort: since the simulation will usually have some kind of UI, test automation can be done the way UI tests are usually done. That is not exactly brilliant, but still great with respect to “bang for the buck”, at least as far as embedded software goes. Gen 1 platforms can evolve from protoplatforms or are pieced together from bits of code that originate from protoplatforms. As such, porting requires more or less the same effort as porting a protoplatform.

Generation 2 Platforms

This is where things get interesting. The first Gen 2 platform I encountered was actually one for which I was one of the lead designers. We learned a lot from a Gen 1 platform that had been developed by a neighbouring department, with the added hindsight benefit of knowing the weaknesses of their platform (note that their platform was still a tremendous success compared to what came before it).

One huge difference between a Gen 2 platform and lower-tier software is that a Gen 2 platform is designed with automated tests in mind, especially for as large a unit test coverage as possible. Ideally we’ll have a coverage of business logic code of > 80 % (and strive for 100 %). This design choice means that we will usually not be able to evolve a Gen 1 platform (or earlier!) into a Gen 2 platform, as designing for testability forces the developer to write very different code and to use other design patterns. In the end a Gen 2 platform will in most cases be a greenfield project, where parts of the business logic are selectively ported from earlier code (where it makes sense). A common approach actually is to use previous versions of the code as a cheat sheet for the reimplementation.

Gen 2 platforms will usually have lots of unit tests that can run on a developer machine, and the architecture will allow the developer to implement most features within test cases, without ever deploying to the target. This can save some costs, as fewer target boards are required during development; however, these savings will realistically be offset by the initial investment that has to be made to get the platform off the ground.
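
To make this a bit more concrete, here is a minimal sketch in C of what developing a feature inside a test case can look like. Everything in it is made up for illustration: the thermo_io_t interface, the thermostat logic and the fake_hw_t test double are hypothetical names and not taken from any real platform. The only point is that the business logic reaches the hardware through an interface, so a fake can stand in for the real driver on the developer machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver interface: the only view the business logic
 * has of the hardware. */
typedef struct {
    int32_t (*read_millicelsius)(void *ctx);
    void    (*set_heater)(void *ctx, bool on);
    void    *ctx;
} thermo_io_t;

/* Business logic under test: a two-point controller with hysteresis. */
typedef struct {
    int32_t on_below_mc;   /* switch heater on below this temperature  */
    int32_t off_above_mc;  /* switch heater off above this temperature */
    bool    heater_on;
} thermostat_t;

void thermostat_step(thermostat_t *t, const thermo_io_t *io)
{
    int32_t temp = io->read_millicelsius(io->ctx);

    if (temp < t->on_below_mc) {
        t->heater_on = true;
    } else if (temp > t->off_above_mc) {
        t->heater_on = false;
    } /* inside the band: keep the previous state */

    io->set_heater(io->ctx, t->heater_on);
}

/* Fake "hardware" that only exists in host builds. */
typedef struct {
    int32_t  temperature_mc;
    bool     heater_state;
    unsigned set_calls;
} fake_hw_t;

static int32_t fake_read(void *ctx)
{
    return ((fake_hw_t *)ctx)->temperature_mc;
}

static void fake_set(void *ctx, bool on)
{
    fake_hw_t *hw = ctx;
    hw->heater_state = on;
    hw->set_calls++;
}

thermo_io_t fake_thermo_io(fake_hw_t *hw)
{
    thermo_io_t io = {
        .read_millicelsius = fake_read,
        .set_heater = fake_set,
        .ctx = hw,
    };
    return io;
}

/* A unit test that runs on the developer machine, no target needed. */
void test_thermostat_hysteresis(void)
{
    fake_hw_t hw = { .temperature_mc = 18000 };
    thermo_io_t io = fake_thermo_io(&hw);
    thermostat_t t = { .on_below_mc = 19000, .off_above_mc = 21000 };

    thermostat_step(&t, &io);      /* too cold: heater goes on     */
    assert(hw.heater_state);

    hw.temperature_mc = 20000;     /* inside the band: stays on    */
    thermostat_step(&t, &io);
    assert(hw.heater_state);

    hw.temperature_mc = 21500;     /* too warm: heater goes off    */
    thermostat_step(&t, &io);
    assert(!hw.heater_state);
}
```

On the real target the same thermostat_step() would be handed a thermo_io_t that wraps the actual sensor and heater drivers; the logic itself would not change.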

Some Gen 2 platforms also have simulation targets that allow the same kind of tests as the ones that are possible in Gen 1 platforms.

Since Gen 2 platforms are designed with portability in mind, adding a new device to the platform usually only involves implementing the target-specific drivers for that device. If the CPU manufacturer provides a mature BSP, this will be a matter of writing adapters that interface the platform with the BSP’s code, without ever touching the actual application code.
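
As a rough sketch of such an adapter, assume the platform defines a small serial interface and the vendor ships a send function with a different signature. Both are invented here: vendor_uart_send() is a stand-in that just copies into a buffer so the example is self-contained, and serial_port_t is a hypothetical platform interface, not a real SDK or platform API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- Stand-in for a vendor BSP (hypothetical, not a real SDK) --- */
#define VENDOR_OK 0
uint8_t vendor_tx_buf[64];
size_t  vendor_tx_len;

int vendor_uart_send(uint8_t port, const uint8_t *data, uint16_t len)
{
    (void)port;
    if (len > sizeof vendor_tx_buf - vendor_tx_len) {
        return -1;                       /* no room left */
    }
    memcpy(&vendor_tx_buf[vendor_tx_len], data, len);
    vendor_tx_len += len;
    return VENDOR_OK;
}

/* --- Platform-side interface (hypothetical) --- */
typedef struct serial_port {
    /* Returns the number of bytes written or a negative value on error. */
    int (*write)(const struct serial_port *self, const void *data, size_t len);
    const void *impl;                    /* adapter-private data */
} serial_port_t;

/* --- Adapter: maps the platform interface onto the BSP call --- */
typedef struct {
    uint8_t port;
} vendor_uart_cfg_t;

static int vendor_serial_write(const serial_port_t *self,
                               const void *data, size_t len)
{
    const vendor_uart_cfg_t *cfg = self->impl;

    if (len > UINT16_MAX) {
        return -1;                       /* BSP length is only 16 bit */
    }
    if (vendor_uart_send(cfg->port, data, (uint16_t)len) != VENDOR_OK) {
        return -1;
    }
    return (int)len;
}

serial_port_t vendor_serial_port(const vendor_uart_cfg_t *cfg)
{
    assert(cfg != NULL);
    serial_port_t port = { .write = vendor_serial_write, .impl = cfg };
    return port;
}

/* --- Application code: knows the platform interface only --- */
int app_send_greeting(const serial_port_t *port)
{
    static const char msg[] = "hello";
    return port->write(port, msg, sizeof msg - 1);
}
```

Porting to another chip would then mean writing a second adapter next to vendor_serial_write(), while app_send_greeting() stays untouched.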

Generation 3 Platforms

One could argue that a Gen 2 platform is all that is needed, since it already lets us create an extremely productive environment for developing our software. While that is true, having worked mainly with Gen 1 and Gen 2 platforms in recent years, I have some gripes with these types of platforms that, in my opinion, make the case for a new generation of platform that improves upon the designs of Gen 2. These are:

Even stronger modularity: Ideally we’d be able to create a device just by picking function blocks from the platform and compiling them. While there are some approaches to this, very often they involve integration headaches that should not be necessary.
Design with security in mind: We have to deal with the IoT, and just slapping TLS on each Ethernet connection will not cut it in the long run. For most lower-tier platforms security is an afterthought, which causes problems if we want to truly secure a device.
Improved diagnostics: A lot of bare-metal systems lack a robust way of diagnosing errors; if something goes wrong, we often need a debug probe. As long as we’re talking about stand-alone devices that never connect to the outside world, this situation will not change much, since I’d assume that most of these devices have some kind of log file that allows some degree of diagnostics. In a connected world things are different, and we should strive for easy, robust diagnostics.
DevOps/continuous deployment capability: Yes, this is a buzzword; however, it is actually a good thing to be able to get new features to our customers as fast as possible. Since lower-tier platforms (even Gen 2) will often still need a significant regression test before each release, something like a monthly (or even more frequent!) release is just not feasible. With the (I)IoT not going away anytime soon, being able to deploy to a lot of devices on short notice is a solid advantage every business should seek. Only this capability will allow us to fix security issues quickly and to react to customer requirements faster than our competitors.
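
To give the diagnostics point some shape: one simple building block is a fixed-size event log in RAM that overwrites its oldest entries and can be read out without a debug probe. The following is only a sketch under my own assumptions; the diag_* names, the event layout and the tiny capacity are hypothetical, and a real platform would also have to think about persistence, concurrency and how the dump leaves the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIAG_LOG_CAPACITY 4u   /* deliberately tiny for illustration */

typedef struct {
    uint32_t timestamp_ms;
    uint16_t module_id;
    uint16_t event_code;
} diag_event_t;

typedef struct {
    diag_event_t events[DIAG_LOG_CAPACITY];
    size_t next;    /* slot that will be written next         */
    size_t count;   /* number of valid entries (<= capacity)  */
} diag_log_t;

void diag_log_init(diag_log_t *log)
{
    assert(log != NULL);
    log->next = 0;
    log->count = 0;
}

/* Store one event; once the log is full the oldest entry is overwritten. */
void diag_log_record(diag_log_t *log, uint32_t timestamp_ms,
                     uint16_t module_id, uint16_t event_code)
{
    assert(log != NULL);
    log->events[log->next] =
        (diag_event_t){ timestamp_ms, module_id, event_code };
    log->next = (log->next + 1u) % DIAG_LOG_CAPACITY;
    if (log->count < DIAG_LOG_CAPACITY) {
        log->count++;
    }
}

/* Copy up to max entries into out, oldest first.
 * Returns the number of entries copied. */
size_t diag_log_dump(const diag_log_t *log, diag_event_t *out, size_t max)
{
    size_t n = log->count < max ? log->count : max;
    size_t start =
        (log->next + DIAG_LOG_CAPACITY - log->count) % DIAG_LOG_CAPACITY;

    for (size_t i = 0; i < n; i++) {
        out[i] = log->events[(start + i) % DIAG_LOG_CAPACITY];
    }
    return n;
}
```

A connected device could hand the result of diag_log_dump() to whatever transport it has, so that a support engineer sees the last few events leading up to a fault.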

Now, these bullet points amount to more than just a fancy new architecture. To really achieve a Gen 3 platform, a lot of supporting processes need to be in place, and we need very tight integration between product management, developers and QA. This integration is probably the toughest challenge, as it requires organisational change, whereas lower-tier platforms can be driven by engineering alone. To create a sustainable Gen 3 platform we need a truly agile organisation from PM to QA and the buy-in of upper management to support that transition. We also need strong technical leaders who are capable of creating a coherent vision of the whole development process, from feature wish to delivery. The technical chops needed are not all that different from those for a well-designed Gen 2 platform, although I’d argue that designing a complete feature pipeline that optimizes for delivery time while maintaining high product quality is definitely a challenge that few developers have faced until now.

So, as you probably noticed, creating a Gen 3 platform is as much an organizational problem as it is a technical challenge.

What’s next?

So we’ve established a way of reasoning about platform generations. These five generations are a simplification, and we should bear in mind that in the real world we have a continuum rather than five distinct classes. Still, the classes should help us when talking about design aspects in the next parts of the series, especially with respect to a Gen 3 platform. Stay tuned for these articles, in which we’ll have a look at:

Preconditions
Scalability, Degrees of Freedom
Requirements
Infrastructure
Surrounding systems
Security
Operating Systems
Architecture
Connectivity & User Interfaces
Diagnostics
Safety
Testing
Portability
Device Updates
Design for production

At some point I’ll also have some articles on the organizational side of things, but as of now the topics are not finalized, so I’m not ready to talk about that yet.

Image by Nicolas Thomas
