FreeRTOS vs Zephyr for Industrial Sensors — A 200-Day Reliability Comparison

Field telemetry across two identical sensor fleets running different RTOSes: watchdog resets, memory fragmentation events, and the developer-time cost of each.


Two Identical Fleets, Two Different Kernels

An honest comparison between FreeRTOS and Zephyr is hard to find in marketing material. Both projects publish benchmark numbers; neither publishes how its kernel actually behaves after 4,000 hours in a panel-mounted enclosure on a factory floor. Two years ago we got an unusual opportunity: a customer building an industrial vibration sensor decided to dual-source the firmware. Same Cortex-M4 hardware, same sensor stack, same MQTT backend, with one fleet running FreeRTOS 10.4 and the other running Zephyr 3.2. We've watched both fleets in production for 200 days.

The hardware is identical: STM32L4 @ 80 MHz, 256 KB flash, 128 KB SRAM, ICM-42688 accelerometer, BG770A cellular modem, 7,800 mAh battery, IP67 enclosure. Both firmware stacks report the same telemetry: heap headroom, watchdog reset counter, hard-fault counter, modem reconnect count, and battery state. Both report every 15 minutes to a central time-series database. Fleet size: 412 FreeRTOS units, 387 Zephyr units, all deployed to the same three customer sites.

  • Mean uptime — FreeRTOS: 47.3 days; Zephyr: 51.8 days.
  • Hard faults per 1,000 unit-days — FreeRTOS: 2.4; Zephyr: 0.9.
  • Heap-exhaustion resets — FreeRTOS: 18 events; Zephyr: 0 events.
  • Mean developer days per feature ticket — FreeRTOS: 1.8; Zephyr: 3.1.
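The fleets differ in size, so raw event counts only compare fairly once they are normalised the same way as the hard-fault figure. A minimal sketch of that normalisation follows; the function name is ours, and it assumes every unit was in service for the full 200 days, which the telemetry summary above does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Events per 1,000 unit-days.
 * Assumption: every unit was in service for the whole observation window. */
double events_per_1000_unit_days(uint32_t events, uint32_t units, uint32_t days)
{
    assert(units > 0 && days > 0);
    double unit_days = (double)units * (double)days;
    return 1000.0 * (double)events / unit_days;
}
```

Under that assumption, the 18 FreeRTOS heap-exhaustion resets across 412 units and 200 days come out at roughly 0.22 per 1,000 unit-days.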

"Zephyr won every reliability metric, but the team felt the pain of every PR. FreeRTOS was the opposite — easy to merge, harder to keep alive. That's the real trade-off, not the kernel benchmark." — Pioneer Horizon embedded engineer

Memory Model and Fragmentation in the Field

The single most decisive difference in the field was memory behaviour. Both fleets ship with a heap; both fleets do dynamic allocation during MQTT publish (for the payload buffer); both fleets eventually fragment. They fragment very differently.

FreeRTOS — heap_4 and the long-tail fragmentation problem

We used heap_4 (first-fit with coalescence of adjacent free blocks), the reasonable default for a system that does mixed-size allocations. After 30 days of uptime, the largest contiguous free block on a typical FreeRTOS unit had dropped from 42 KB to 8 KB, even though total free memory was still 60 KB. On day 47 (the fleet mean), a unit would fail to allocate a 9 KB TLS handshake buffer and take a watchdog reset. The reset reclaims memory; the cycle repeats.
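The gap between "60 KB free" and "cannot allocate 9 KB" is easier to see in code. The sketch below is a host-side model, not FreeRTOS code: the type and function names are hypothetical, and it ignores the allocator's per-block header overhead. On target, recent FreeRTOS releases report comparable figures through vPortGetHeapStats().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Summary of a free list (hypothetical type for this sketch). */
typedef struct {
    size_t total_free;    /* sum of all free blocks, in bytes */
    size_t largest_free;  /* largest single contiguous block, in bytes */
} heap_summary_t;

/* Walk an array of free-block sizes and record the total and the largest. */
heap_summary_t summarise_free_list(const size_t *block_sizes, size_t count)
{
    heap_summary_t s = {0, 0};
    assert(block_sizes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        s.total_free += block_sizes[i];
        if (block_sizes[i] > s.largest_free) {
            s.largest_free = block_sizes[i];
        }
    }
    return s;
}

/* An allocation needs one contiguous block; total free space does not help. */
bool can_allocate(const heap_summary_t *s, size_t request)
{
    return request <= s->largest_free;
}
```

A free list of six 8 KB blocks and two 6 KB blocks totals 60 KB, yet can_allocate() rejects a 9 KB request. That is the state our units were in when the TLS handshake failed, and it is why the largest free block is a more useful health metric than total heap headroom.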

We tried heap_5 (multi-region) and a periodic forced reset, but the fundamental problem remained: the kernel gives us no way to compact the heap. Static allocation everywhere would have solved it, but we'd already shipped firmware that relied on the dynamic path.

Zephyr — memory slabs and the discipline they force

Zephyr's memory model nudges you toward fixed-size pools (k_mem_slab) and away from a single shared heap. The MQTT publish path uses a slab of 8 fixed 4 KB buffers; the TLS session uses a separate slab. Fragmentation cannot happen between pools because they live in different regions, and within a pool every allocation is the same size by definition. After 200 days, no Zephyr unit has hit an allocation failure.

The cost is up-front: every allocation site has to declare its pool, size it, and pick the right slab. This is the friction that drove the 3.1-day mean per ticket — but it's also why the fleet is stable.
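To show why a fixed-size pool cannot fragment internally, here is a host-runnable model of the idea with 8 buffers of 4 KB each. It is not Zephyr's implementation and none of these names are Zephyr APIs; on target the equivalent calls would be K_MEM_SLAB_DEFINE, k_mem_slab_alloc and k_mem_slab_free. The sketch is also not thread-safe or ISR-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  4096u  /* 4 KB buffers, as in the publish path */
#define POOL_BLOCK_COUNT 8u

/* A block is either payload bytes or, while free, a link to the next free block. */
typedef union pool_block {
    union pool_block *next;
    uint8_t           bytes[POOL_BLOCK_SIZE];
} pool_block_t;

typedef struct {
    pool_block_t  blocks[POOL_BLOCK_COUNT];
    pool_block_t *free_list;
    size_t        used;
} block_pool_t;

/* Thread every block onto the free list. */
void pool_init(block_pool_t *p)
{
    p->free_list = NULL;
    p->used = 0;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* Pop one block, or return NULL when the pool is exhausted. */
void *pool_alloc(block_pool_t *p)
{
    pool_block_t *b = p->free_list;
    if (b == NULL) {
        return NULL;
    }
    p->free_list = b->next;
    p->used++;
    return b->bytes;
}

/* Push a block back; it must have come from pool_alloc() on the same pool. */
void pool_free(block_pool_t *p, void *ptr)
{
    assert(ptr != NULL && p->used > 0);
    pool_block_t *b = (pool_block_t *)ptr;
    b->next = p->free_list;
    p->free_list = b;
    p->used--;
}
```

Every block is interchangeable, so any freed block satisfies the next request. The pool can run out, but it cannot end up with free memory that is unusable because of its shape. The sizing decision (how many blocks, how large) is exactly the up-front cost described above.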

What we'd do differently

For new programmes today, we use Zephyr where reliability matters more than time-to-market, and FreeRTOS with aggressively static allocation where the team is small. The middle path — FreeRTOS with dynamic everywhere — is what we now actively discourage.

Drivers, HAL, and the Developer-Time Cost

FreeRTOS doesn't ship drivers. You bring STM32CubeHAL (or LL drivers), or you write your own. Zephyr ships a full device-driver model with a devicetree-based binding layer. This is the single biggest difference in day-to-day developer experience, and it cuts both ways.

The FreeRTOS path

  • Bring CubeHAL — generated code, broad device coverage, well-documented, but each driver has its own conventions. The SPI driver feels nothing like the I2C driver.
  • Or bring LL — leaner, closer to the register layer, faster, but you write more code per peripheral.
  • RTOS integration (mutexes around shared peripherals, ISR-to-task notifications) is your responsibility on every driver.

The good news: when something goes wrong, you can read the entire stack in an afternoon. There is no opaque layer between you and the registers.

The Zephyr path

  • Drivers are declared in devicetree. You pick a sensor (ICM-42688), enable the driver in prj.conf, and use a uniform sensor API to read it.
  • The same API works for accelerometers, magnetometers, environmental sensors — across families and vendors.
  • Mutex acquisition, ISR plumbing, and power management are handled inside the driver framework.
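The uniform-API idea can be modelled on a host with a table of function pointers per driver. The sketch below is illustrative only: the types, channel names and fake drivers are hypothetical and much simpler than Zephyr's real sensor subsystem, where the corresponding calls are sensor_sample_fetch() and sensor_channel_get().

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CHAN_ACCEL_X, CHAN_TEMP } sensor_channel_t;

struct sensor_dev;

/* Operations every driver must provide (hypothetical interface). */
typedef struct {
    int (*fetch)(struct sensor_dev *dev);
    int (*get)(const struct sensor_dev *dev, sensor_channel_t ch, double *out);
} sensor_ops_t;

typedef struct sensor_dev {
    const char         *name;
    const sensor_ops_t *ops;
    double              last_sample;
} sensor_dev_t;

/* Generic entry point: application code calls this for any sensor. */
int sensor_read(sensor_dev_t *dev, sensor_channel_t ch, double *out)
{
    assert(dev != NULL && dev->ops != NULL && out != NULL);
    int rc = dev->ops->fetch(dev);
    if (rc != 0) {
        return rc;
    }
    return dev->ops->get(dev, ch, out);
}

/* Fake accelerometer: returns a fixed sample instead of touching a bus. */
static int fake_accel_fetch(sensor_dev_t *dev) { dev->last_sample = 0.98; return 0; }
static int fake_accel_get(const sensor_dev_t *dev, sensor_channel_t ch, double *out)
{
    if (ch != CHAN_ACCEL_X) return -1;  /* unsupported channel */
    *out = dev->last_sample;
    return 0;
}
const sensor_ops_t fake_accel_ops = { fake_accel_fetch, fake_accel_get };

/* Fake thermometer: same interface, different channel. */
static int fake_temp_fetch(sensor_dev_t *dev) { dev->last_sample = 21.5; return 0; }
static int fake_temp_get(const sensor_dev_t *dev, sensor_channel_t ch, double *out)
{
    if (ch != CHAN_TEMP) return -1;
    *out = dev->last_sample;
    return 0;
}
const sensor_ops_t fake_temp_ops = { fake_temp_fetch, fake_temp_get };
```

The application only ever calls sensor_read(); swapping a sensor means swapping the ops table. The same indirection is also where the debugging cost comes from, because a fault inside fetch() is one pointer hop removed from the code that triggered it.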

The bad news: when something goes wrong, you're chasing the fault through three layers of generic code. Debugging an I2C bus hang in Zephyr took us 11 hours; we estimate the equivalent on FreeRTOS at about 2. Our bus-hang recovery article shows the exact recovery path we eventually shipped. It is harder to slot into Zephyr's generic driver layer than into a hand-written FreeRTOS driver.

The ticket-time number

The 1.8-day vs 3.1-day mean per ticket sounds bad for Zephyr, but it's misleading. Zephyr's first month is brutal; after that, the next sensor takes a day instead of a week, because the devicetree binding and the sensor API are already understood. The FreeRTOS team spent less time per ticket but fought more one-off battles with custom drivers over the life of the project.

Tickless Idle and Battery Life

Both kernels support tickless idle, but the implementations and the practical battery numbers diverged. Our customer ships these sensors on 7,800 mAh primary batteries with a 5-year design life — every microamp matters.

FreeRTOS tickless idle

The default tick is 1 kHz. Tickless idle lets the kernel stop the tick and sleep until the next task is due to wake or an interrupt arrives, but the integration is hand-rolled per port. We had to implement portSUPPRESS_TICKS_AND_SLEEP against the LPTIM peripheral, including the entry/exit math for programming the wake-up and correcting the tick count afterwards. We got it working; the mean current in idle dropped from 1.4 mA to 18 µA.
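The entry/exit math is mostly unit conversion between kernel ticks and low-power timer counts. The sketch below is a simplified, host-runnable version of that arithmetic, not our port file. It assumes the LPTIM is clocked at 32,768 Hz with no prescaler and has a 16-bit counter; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_RATE_HZ     1000u    /* kernel tick rate from the article */
#define LPTIM_CLOCK_HZ   32768u   /* assumption: LSE clock, prescaler of 1 */
#define LPTIM_MAX_COUNTS 0xFFFFu  /* assumption: 16-bit counter */

/* Longest sleep, in whole ticks, that fits in one counter period. */
uint32_t max_suppressed_ticks(void)
{
    return (uint32_t)(((uint64_t)LPTIM_MAX_COUNTS * TICK_RATE_HZ) / LPTIM_CLOCK_HZ);
}

/* Entry: convert the expected idle time to timer counts, clamped to the counter range. */
uint32_t ticks_to_counts(uint32_t expected_idle_ticks)
{
    uint32_t max_ticks = max_suppressed_ticks();
    if (expected_idle_ticks > max_ticks) {
        expected_idle_ticks = max_ticks;
    }
    return (uint32_t)(((uint64_t)expected_idle_ticks * LPTIM_CLOCK_HZ) / TICK_RATE_HZ);
}

/* Exit: whole ticks that elapsed while asleep. Rounds down so the kernel's
 * tick count never runs ahead of real time. */
uint32_t counts_to_ticks(uint32_t elapsed_counts)
{
    assert(elapsed_counts <= LPTIM_MAX_COUNTS);
    return (uint32_t)(((uint64_t)elapsed_counts * TICK_RATE_HZ) / LPTIM_CLOCK_HZ);
}
```

On wake, a port would pass the result of counts_to_ticks() to vTaskStepTick(). A production port also has to carry the fractional remainder between sleeps and handle an interrupt that wakes the CPU early; both are left out here.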

Zephyr tickless idle

Tickless is the default. CONFIG_PM_DEVICE and CONFIG_PM wire the power management framework to suspend devices alongside the CPU. The same hardware idled at 14 µA — a 22% improvement, achieved by enabling two Kconfig options rather than writing a port file.

Extrapolating from 200 days of battery telemetry, that works out to roughly:

  • FreeRTOS fleet — projected battery life 4.6 years.
  • Zephyr fleet — projected battery life 5.4 years.

The Zephyr fleet meets the 5-year design target; the FreeRTOS fleet doesn't, unless we go back and add more aggressive duty-cycling at the application layer. That's another developer-week we hadn't budgeted.
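One way to sanity-check projections like these is to convert a battery life back into the mean current it implies. The sketch below uses a deliberately simple linear model: it ignores self-discharge, capacity derating and temperature, so treat the outputs as rough figures. The function name is ours.

```c
#include <assert.h>

#define HOURS_PER_YEAR 8760.0  /* 365 days; leap days ignored */

/* Mean current, in µA, that would drain the given capacity in the given time.
 * Assumption: linear discharge with no self-discharge or derating. */
double implied_mean_current_ua(double capacity_mah, double life_years)
{
    assert(capacity_mah > 0.0 && life_years > 0.0);
    return 1000.0 * capacity_mah / (life_years * HOURS_PER_YEAR);
}
```

For a 7,800 mAh cell, 4.6 years corresponds to about 194 µA on average, 5.4 years to about 165 µA, and the 5-year target to about 178 µA. Those averages are roughly ten times the idle figures, so under this simple model most of the budget would be spent outside idle, which is consistent with duty-cycling being the lever left to the FreeRTOS fleet.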

The footnote — Zephyr's footprint cost

Zephyr's flash footprint on this device is 178 KB; FreeRTOS plus the equivalent application code is 92 KB. On a 256 KB part we have headroom; on a 128 KB part we would not. If your BOM forces you to a smaller MCU, FreeRTOS still has the upper hand.

How We Choose Between Them Today

After 200 days of field data and two more programmes that came after, here's the decision framework we apply on a new embedded engagement. None of these are absolutes — they're starting points for the conversation.

Default to Zephyr when…

  1. Field reliability is more valuable than time-to-market — long-life industrial, medical telemetry, infrastructure monitoring.
  2. The product is one of many on a shared firmware platform: Zephyr's devicetree investment pays back across products, not within one.
  3. Battery life is a binding constraint — the tickless and PM frameworks are simply ahead.
  4. You have the flash budget (target part ≥ 256 KB flash, > 64 KB RAM).
  5. The team has at least one engineer with Linux/devicetree experience already.

Default to FreeRTOS when…

  1. Time-to-first-prototype matters more than time-to-fleet-stability.
  2. The MCU has less than 256 KB of flash, or you're on a part Zephyr doesn't support.
  3. The team is small (1–2 firmware engineers) and the product is one-off, not platform.
  4. You already have a working FreeRTOS codebase to evolve — switching kernels mid-product is rarely worth it.

Avoid the middle path

The pattern that bit us — FreeRTOS with unconstrained dynamic allocation — is the worst of both worlds. If you stick with FreeRTOS, commit to static allocation everywhere and treat any pvPortMalloc in non-init code as a code-review red flag. If you commit to dynamic, move to Zephyr's memory-slab model.
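Code review catches most stray allocations, but the rule can also be enforced at run time. The sketch below shows one possible guard, assuming all legitimate allocations happen before the scheduler starts. Every name in it is hypothetical, and malloc stands in for pvPortMalloc so the example runs on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static bool   init_phase_sealed = false;
static size_t rejected_allocs   = 0;

/* Allocate only during init. malloc is a host stand-in for pvPortMalloc. */
void *init_only_alloc(size_t size)
{
    assert(size > 0);
    if (init_phase_sealed) {
        rejected_allocs++;  /* on target: log, assert, or trip a breakpoint */
        return NULL;
    }
    return malloc(size);
}

/* Call once, just before the scheduler starts. */
void seal_init_phase(void)
{
    init_phase_sealed = true;
}

/* Number of allocation attempts refused after sealing; suitable for telemetry. */
size_t rejected_alloc_count(void)
{
    return rejected_allocs;
}
```

Reporting rejected_alloc_count() alongside the other fleet counters would turn a policy violation into a visible number instead of a slow fragmentation problem that surfaces weeks later.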

For the validation harness we use on both kernels — host-runnable unit tests that catch most of these issues before fleet deployment — see our embedded-C unit testing article.
