And yes, I actually mean bare metal as in “no operating system”. When I first got the idea to tinker with WASM, it was due to my interest in Rust. Being an embedded software developer, I naturally wanted WASM to run on an embedded platform. There already is a runtime that does this, but it is written in C, which I loathe, and it also has no test coverage to speak of.
So, I set out to write a WASM runtime for Cortex-M4 machines with as little as 96 KiB of RAM.
Constraints:
- I assume that I will execute the majority of the code from flash to conserve RAM
- The runtime will not use any dynamically allocated memory; all memory shall be provided by the hosting app
- It will be written in Rust in a no_std environment
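These constraints suggest a host-facing API along the following lines. This is a minimal sketch under those assumptions – all names (`Runtime`, `HostMemory`, and so on) are hypothetical, not the actual interface – and it only uses `core`, so it would compile in a no_std environment:

```rust
// Sketch of a host-facing API under the constraints above. All names are
// hypothetical; the point is that the host owns every byte and merely
// lends it to the runtime, so the runtime itself never allocates.

/// All memory the runtime may touch, provided by the hosting app.
pub struct HostMemory<'a> {
    pub state: &'a mut [u8],       // runtime-internal bookkeeping
    pub wasm_memory: &'a mut [u8], // linear memory for the WASM program
    pub stack: &'a mut [u8],       // interpreter value/call stack
}

pub struct Runtime<'a> {
    mem: HostMemory<'a>,
}

impl<'a> Runtime<'a> {
    /// No allocation happens here or anywhere else in the runtime.
    pub fn new(mem: HostMemory<'a>) -> Self {
        Runtime { mem }
    }

    pub fn wasm_memory_size(&self) -> usize {
        self.mem.wasm_memory.len()
    }
}

fn main() {
    // On the M4 these would be static buffers; plain arrays do for a demo.
    let mut state = [0u8; 256];
    let mut linear = [0u8; 1024];
    let mut stack = [0u8; 512];
    let rt = Runtime::new(HostMemory {
        state: &mut state,
        wasm_memory: &mut linear,
        stack: &mut stack,
    });
    assert_eq!(rt.wasm_memory_size(), 1024);
}
```

The lifetimes make the ownership story explicit: the runtime cannot outlive the buffers the host hands it, which is exactly what you want on a microcontroller.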
First steps: The darn format
One might think that a format as well supported as WASM would be very well specified – however, the documentation for the WASM binary format, while complete, is lackluster. The information needed to parse a file is scattered across a host of webpages, which makes finding anything a pain in the butt. And even then, the documentation somewhat lacks clarity, all the while being – IMHO – overly formal. It took me a good while to actually figure out the anatomy of a WASM file. But, right off the bat, there was another problem to solve:
Allocations, allocations, allocations…
…are verboten. This is because I am working in a no_std environment, where – by default – no allocator is available. The hosting app is supposed to supply all the memory I need, which means:
- A bunch of bytes to store my state
- The RAM available for the WASM program
- Some place to store the interpreter stack
The first is pretty simple: I’ll just accept a pointer to a static array. The latter two, however, pose a problem, as I will usually not know exactly how much memory I need – that depends on the WASM file in question. For now, I resort to extracting as little data from the file as possible by only retrieving the locations of all sections in the file – I hope I’ll be able to actually parse the sections when needed. Essentially, I’ll try to execute from flash as much as possible and only keep some lookup table in RAM. This will no doubt hurt performance in the long run; however, I’m just in this to see if I can actually pull it off, so there’s no need for stellar performance.
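For reference, the anatomy that took so long to puzzle out boils down to: an 8-byte preamble (the magic bytes `\0asm` plus a version), followed by a flat list of sections, each being a one-byte id, a LEB128-encoded payload size, and the payload itself. The “lookup table in RAM” idea then looks roughly like this sketch (names are illustrative, not my actual code; only `core` is used, so no allocator is needed):

```rust
// Scan the file once, record where each section's payload lives, and
// keep only those locations in RAM - the payloads can stay in flash.

/// Decode an unsigned LEB128 u32; returns (value, bytes consumed).
fn leb128_u32(bytes: &[u8]) -> (u32, usize) {
    let (mut value, mut shift, mut n) = (0u32, 0, 0);
    for &b in bytes {
        value |= ((b & 0x7F) as u32) << shift;
        n += 1;
        if b & 0x80 == 0 { break; }
        shift += 7;
    }
    (value, n)
}

/// Payload locations indexed by section id (the spec defines ids 0..=12).
/// Custom sections (id 0) can repeat; a real implementation would treat
/// them specially instead of keeping only the last one.
#[derive(Default)]
struct SectionTable {
    locs: [Option<(u32, u32)>; 13], // (payload offset, payload length)
}

/// One pass over the raw bytes (which can stay in flash).
fn scan(wasm: &[u8]) -> Option<SectionTable> {
    // 8-byte preamble: magic "\0asm" and version 1.
    if wasm.len() < 8
        || wasm[0..4] != [0x00, 0x61, 0x73, 0x6D]
        || wasm[4..8] != [0x01, 0x00, 0x00, 0x00]
    {
        return None;
    }
    let mut table = SectionTable::default();
    let mut pos = 8;
    while pos < wasm.len() {
        let id = wasm[pos] as usize;
        let (size, n) = leb128_u32(&wasm[pos + 1..]);
        let payload = pos + 1 + n;
        if id < table.locs.len() {
            table.locs[id] = Some((payload as u32, size));
        }
        pos = payload + size as usize;
    }
    Some(table)
}

fn main() {
    // Empty module plus one custom section (id 0) named "x".
    let wasm = [
        0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00, // preamble
        0x00, 0x02, 0x01, b'x',                         // custom section
    ];
    let table = scan(&wasm).unwrap();
    assert_eq!(table.locs[0], Some((10, 2)));
    assert_eq!(table.locs[10], None); // no code section in this module
}
```

Thirteen `Option`s are all the RAM this costs per module; everything else is resolved lazily from flash.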
Getting a bare-bones version to run
After much cursing I finally got to a point where I have a very crude version of a WASM binary parser that is able to resolve the correct code section for a given export. Hooray. For now I’ve stuck to using fixed-size buffers for all elements (which is obviously a bad idea); however, I might actually get around that, as:
- I do not really need to keep the tables in RAM: a lot of the interesting tables have fixed entry sizes, which allows direct access if I know the index, meaning I can read them from flash and only need the start address of the respective table.
- For performance reasons I will probably keep the indices of the code segment in memory, so I can easily look up functions (see last section).
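The second point could look something like the sketch below – a fixed-capacity table of code-segment offsets kept in RAM, so a call by function index becomes a single array lookup. Everything here is hypothetical and unoptimized:

```rust
// Keep only the per-function offsets into the code section in RAM;
// the function bodies themselves stay in flash. Fixed capacity via a
// const generic, so no allocation is needed.

struct FuncOffsets<const N: usize> {
    offsets: [u32; N], // offset of each body within the code section
    len: usize,
}

impl<const N: usize> FuncOffsets<N> {
    fn new() -> Self {
        FuncOffsets { offsets: [0; N], len: 0 }
    }

    /// Record the next function body's offset; fails when full.
    fn push(&mut self, offset: u32) -> Result<(), ()> {
        if self.len == N {
            return Err(());
        }
        self.offsets[self.len] = offset;
        self.len += 1;
        Ok(())
    }

    /// O(1) lookup by WASM function index (local functions only).
    fn get(&self, func_idx: usize) -> Option<u32> {
        (func_idx < self.len).then(|| self.offsets[func_idx])
    }
}

fn main() {
    let mut table: FuncOffsets<16> = FuncOffsets::new();
    table.push(0).unwrap();  // function 0 starts at offset 0
    table.push(24).unwrap(); // function 1 starts at offset 24
    assert_eq!(table.get(1), Some(24));
    assert_eq!(table.get(5), None);
}
```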
The biggie, still looming: I might not get around the need for an allocator. I might end up writing my own allocator that only manages the memory assigned to the runtime by the host program, as I don’t want to set a global allocator. This would also effectively compartmentalize the memory of a given module.
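As a sketch of what such a non-global allocator might look like, here is a bump allocator over the host-provided slice (all names made up; this is the general technique, not my actual implementation):

```rust
// A minimal bump allocator managing only the region the host assigned
// to the runtime - no global allocator involved. A bump allocator never
// frees individual blocks; dropping the whole module releases its
// compartment at once, which fits the per-module memory model.

struct Bump<'a> {
    mem: &'a mut [u8],
    next: usize,
}

impl<'a> Bump<'a> {
    fn new(mem: &'a mut [u8]) -> Self {
        Bump { mem, next: 0 }
    }

    /// Bytes handed out so far, including alignment padding.
    fn used(&self) -> usize {
        self.next
    }

    /// Hand out `size` bytes at `align` alignment (a power of two),
    /// or None when the region is exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<&mut [u8]> {
        let start = (self.next + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.mem.len() {
            return None;
        }
        self.next = end;
        Some(&mut self.mem[start..end])
    }
}

fn main() {
    let mut backing = [0u8; 64];
    let mut bump = Bump::new(&mut backing);
    assert_eq!(bump.alloc(10, 4).unwrap().len(), 10);
    assert_eq!(bump.used(), 10);
    assert_eq!(bump.alloc(8, 8).unwrap().len(), 8); // padded up to offset 16
    assert_eq!(bump.used(), 24);
    assert!(bump.alloc(1024, 1).is_none()); // compartment exhausted
}
```

The obvious trade-off: without free, long-running modules that churn memory would need something smarter, but for a first cut the simplicity is hard to beat.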
Also: since I’ve only been toying around (and will be for some time to come), I guess I will have a major overhaul at some point in order to actually get the thing to run on the M4. Until now I’ve been working in an environment which is close to the M4 but still runs on my desktop.
What’s next?
As mentioned, I’ve got something that somewhat resembles a binary parser and have also implemented a handful of opcodes. For now this shall suffice, and I’ll work towards deploying the thing to the M4.