TECHNICAL BLOG
Deep Dives for Engineers
Detailed technical articles covering the real problems we solve in embedded systems, AI, and robotics engineering.
How to configure, trim, and optimise the Linux kernel for embedded targets — from Kconfig surgery to real-time scheduling and memory footprint reduction.
Never begin kernel configuration from allyesconfig or a desktop distribution's config. Start from your board's vendor defconfig, or from tinyconfig and build upward. The kernel ships defconfigs for hundreds of boards under arch/arm64/configs/ and arch/arm/configs/. Use make ARCH=arm64 defconfig for a generic ARM64 baseline (it enables support for many SoC families, so treat it as a starting point to trim rather than a minimal config), then run make menuconfig to add or remove options until the kernel contains only what your application actually needs.
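The build-upward workflow can be sketched as a small shell function. This is a minimal sketch, assuming it is run from the top of a kernel source tree with a cross toolchain installed; the aarch64-linux-gnu- prefix and the three enabled options are illustrative placeholders, not a recommended set.

```shell
# Sketch: start from tinyconfig and enable only what the product needs.
# Assumptions: run from the top of a kernel source tree; the toolchain
# prefix and the option list below are placeholders for illustration.
configure_minimal_kernel() {
    arch="${1:-arm64}"
    cross="${2:-aarch64-linux-gnu-}"

    # Smallest possible starting point for the chosen architecture.
    make ARCH="$arch" CROSS_COMPILE="$cross" tinyconfig || return 1

    # scripts/config ships with the kernel source and edits .config in place.
    ./scripts/config --enable PRINTK \
                     --enable TTY \
                     --enable BLK_DEV_INITRD || return 1

    # Resolve dependencies, taking default values for any new symbols.
    make ARCH="$arch" CROSS_COMPILE="$cross" olddefconfig || return 1

    # Write a compact defconfig that can be kept under version control.
    make ARCH="$arch" CROSS_COMPILE="$cross" savedefconfig
}
```

Calling configure_minimal_kernel arm64 aarch64-linux-gnu- leaves a .config ready to build and a defconfig file in the tree. Keeping that defconfig in version control makes every later addition reviewable as a diff.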
The fastest way to find bloat is to boot a default kernel, run lsmod, and disable every module that is not loaded (make localmodconfig automates this step using the lsmod output of the running system). Then audit /proc/filesystems, /proc/interrupts, and /proc/devices to understand which subsystems your workload actually touches. Common removals for headless embedded targets:
Staging drivers (CONFIG_STAGING)
For applications with hard timing requirements, such as motor control, sensor fusion, or protocol handling with strict deadlines, the default preemption model of the mainline kernel allows latency spikes that are unacceptable. PREEMPT_RT (maintained for years as an out-of-tree patch set, and available in mainline since Linux 6.12 on x86, arm64, and RISC-V) converts most kernel spinlocks into sleeping locks and runs interrupt handlers in preemptible kernel threads, reducing worst-case latency from hundreds of microseconds to tens.
# Enable in Kconfig
CONFIG_PREEMPT_RT=y
CONFIG_HZ_1000=y
CONFIG_NO_HZ_FULL=y
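Kconfig silently drops options whose dependencies are not met, so it is worth confirming that the three options above survived make olddefconfig. A minimal sketch of such a check follows; the function name check_rt_options is hypothetical, and it only inspects a config file, not the running kernel.

```shell
# Report which of the real-time options listed above are set in a config file.
# Usage: check_rt_options /path/to/.config   (defaults to ./.config)
# Returns 0 if all are set to y, 1 if any is missing.
check_rt_options() {
    config_file="${1:-.config}"
    missing=0
    for opt in CONFIG_PREEMPT_RT CONFIG_HZ_1000 CONFIG_NO_HZ_FULL; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok       ${opt}"
        else
            echo "MISSING  ${opt}"
            missing=1
        fi
    done
    return "$missing"
}
```

On the target, uname -v should include PREEMPT_RT once the new kernel is running. Note that CONFIG_NO_HZ_FULL only takes effect on CPUs listed in the nohz_full= kernel command-line parameter.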
Measure before and after with cyclictest:
cyclictest --mlockall --smp --priority=80 --interval=200 --distance=0 -l 100000
On a Raspberry Pi 4 with PREEMPT_RT, worst-case latency typically drops from ~1.5 ms to under 150 µs under load.
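To turn the before-and-after measurement into a pass/fail check, the per-thread Max: values that cyclictest prints can be reduced to a single worst-case figure. The sketch below assumes the usual summary line layout (fields such as Min:, Act:, Avg:, and Max: in microseconds); the function names and the 150 µs limit in the usage note are illustrative.

```shell
# Extract the largest "Max:" latency (in microseconds) from cyclictest output
# read on stdin. Exits non-zero if no Max: field is found.
max_latency_us() {
    awk '
        BEGIN { max = 0 }
        match($0, /Max:[ ]*[0-9]+/) {
            # Skip the 4-character "Max:" label, keep the number.
            v = substr($0, RSTART + 4, RLENGTH - 4) + 0
            if (v > max) max = v
            found = 1
        }
        END { if (!found) exit 1; print max }
    '
}

# Compare the worst case against a limit in microseconds.
# Returns 0 if within the limit, 1 if exceeded, 2 if the input had no data.
check_latency() {
    limit="$1"
    worst=$(max_latency_us) || return 2
    echo "worst-case latency: ${worst} us (limit ${limit} us)"
    [ "$worst" -le "$limit" ]
}
```

A typical invocation pipes a quiet run into the check, for example cyclictest --mlockall --smp --priority=80 --interval=200 --distance=0 -l 100000 -q | check_latency 150. Run it while the system is under a representative load, since idle results say little about worst-case behaviour.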
RAM is expensive on constrained targets. Enable CONFIG_CC_OPTIMIZE_FOR_SIZE to compile the kernel with -Os instead of -O2. For the slab allocator, SLUB is the recommended choice. CONFIG_SLOB is smaller but lacks debugging support, and it was removed in Linux 6.4; on newer kernels CONFIG_SLUB_TINY takes over its role for small-memory systems. For very tight targets consider CONFIG_BASE_SMALL=y.
Check your actual kernel size with:
size vmlinux
ls -lh arch/arm64/boot/Image
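Size reductions are easier to track when each configuration change is compared against the previous build. A minimal sketch, assuming the old and new images are both kept on disk; image_size_delta is a hypothetical helper name.

```shell
# Print the size change between two kernel images in bytes and percent.
# Usage: image_size_delta old/Image new/Image
image_size_delta() {
    # Arithmetic expansion strips any padding that wc adds on some systems.
    old_bytes=$(($(wc -c < "$1")))
    new_bytes=$(($(wc -c < "$2")))
    [ "$old_bytes" -gt 0 ] || return 1
    delta=$((new_bytes - old_bytes))
    awk -v o="$old_bytes" -v d="$delta" \
        'BEGIN { printf "%+d bytes (%+.1f%%)\n", d, 100 * d / o }'
}
```

For a per-symbol breakdown of where the bytes went, the kernel tree also includes scripts/bloat-o-meter, which takes the old and new vmlinux files as arguments.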
Kernel command-line parameters that meaningfully reduce boot time on embedded targets:
quiet loglevel=0: suppresses console output (can save 50-100 ms on slow UARTs)
rootfstype=ext4: avoids filesystem probing
fsck.mode=skip: skips fsck on a read-only rootfs
initcall_debug: use during profiling to identify slow init calls, then remove
Combine with systemd-analyze blame (or the equivalent for your init system) to see where time goes after the kernel hands off to userspace.
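With initcall_debug on the command line, the kernel log contains one line per init call of the form "initcall NAME returned N after M usecs". The sketch below ranks those lines by duration; it assumes that log format, and slowest_initcalls is a hypothetical helper name.

```shell
# Rank init calls by duration from a kernel log captured with initcall_debug.
# Reads the log on stdin and prints "<usecs> <function>" for the N slowest.
# Usage: dmesg | slowest_initcalls 10
slowest_initcalls() {
    n="${1:-10}"
    awk '/initcall .* returned .* after [0-9]+ usecs/ {
        for (i = 1; i < NF; i++) {
            if ($i == "initcall") name = $(i + 1)
            if ($i == "after") usecs = $(i + 1)
        }
        print usecs, name
    }' | sort -rn | head -n "$n"
}
```

The top few entries usually point at a driver probe or a subsystem that can be disabled, built as a module, or deferred. Remove initcall_debug again once profiling is done, since the extra logging itself costs boot time.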
For products with optional hardware peripherals, use devicetree overlays rather than building separate kernel images per SKU. Overlays are small compiled fragments (.dtbo files) that the bootloader loads and merges into the base device tree before the kernel starts. In U-Boot, the fdt apply command performs the merge:
load mmc 0:1 ${fdt_addr_r} base.dtb
load mmc 0:1 0x02000000 overlay-uart2.dtbo
fdt addr ${fdt_addr_r}
fdt resize 65536
fdt apply 0x02000000
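Before an overlay reaches the bootloader, it can be compiled and test-applied on the build host, which catches unresolved labels early. This is a minimal sketch, assuming the dtc package (which provides both dtc and fdtoverlay) is installed and that the base DTB was built with symbols enabled (dtc -@); the source file name overlay-uart2.dts and the function name are assumptions for illustration.

```shell
# Compile an overlay source and dry-run it against the base DTB on the host.
# Usage: build_and_check_overlay base.dtb overlay-uart2.dts
build_and_check_overlay() {
    base_dtb="$1"
    overlay_dts="$2"
    overlay_dtbo="${overlay_dts%.dts}.dtbo"
    merged_dtb="${base_dtb%.dtb}-merged.dtb"

    # -@ keeps the symbol and fixup information that overlays rely on.
    dtc -@ -I dts -O dtb -o "$overlay_dtbo" "$overlay_dts" || return 1

    # Apply on the host; a failure here would also fail in the bootloader.
    fdtoverlay -i "$base_dtb" -o "$merged_dtb" "$overlay_dtbo"
}
```

If the host-side merge succeeds, the resulting .dtbo is the file to place on the boot partition for the U-Boot sequence above. The merged DTB is only a verification artifact and does not need to be shipped.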
Kernel tuning is an iterative discipline. Profile first, then remove or optimise specific subsystems. A well-tuned embedded kernel should boot in under two seconds from power-on to first userspace process, consume less than 12 MB of RAM at idle, and — if real-time is required — deliver deterministic scheduling latency. These goals are achievable on commodity hardware with the right configuration discipline.