This article is a guest post written by our interns at RIoT Secure, as part of a multi-part series exploring the evolution of WebAssembly (WASM) and the WebAssembly System Interface (WASI). Each post in this series takes a closer look at the technology’s journey, challenges, and opportunities - with a special focus on what these developments mean for IoT and secure edge computing.
The Utopian Dream vs. Reality: WASM Binary Size in High-Level Languages
In our previous post, we explored the long road to a stable WASI 1.0 standard. With compilers available for C, Rust, Go, Python, and JavaScript, WebAssembly suggests effortless portability. One of the core promises behind WASI is the ability to "write in your favorite language, compile to WASM, and run anywhere." For cloud and server developers, this vision is powerful - resources are abundant, and a few extra megabytes rarely matter. But in the world of IoT and embedded systems, where flash and RAM are limited, every kilobyte counts, and the dream of effortless portability quickly collides with hardware reality.
The 64 KB Memory Constraint
One challenge comes directly from the WASM specification itself: memory in WebAssembly is divided into fixed-size pages of 64 KB. A module’s memory is always a multiple of 64 KB, meaning the smallest possible allocation is one full page. This is fine on desktops, but many embedded devices have less than 64 KB of total memory available. Developers still want to run WASM on these devices (Fitzgerald 2024 [1]), but doing so often requires designing modules that have no dependency on memory, or non-standard adjustments to reduce the page size - breaking some of the portability and tooling assumptions that make WASM attractive in the first place.
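The page arithmetic is easy to sketch in C. The helper names below (`wasm_pages_for`, `wasm_memory_bytes`, `memory_fits`) are illustrative and not part of any WASM toolchain; the only fact taken from the specification is the 64 KB (65,536-byte) page size.

```c
#include <stdint.h>

/* WebAssembly linear memory is allocated in pages of 64 KB (65,536 bytes). */
#define WASM_PAGE_SIZE 65536u

/* Number of whole pages needed to hold `bytes` bytes (rounds up). */
uint32_t wasm_pages_for(uint64_t bytes) {
    return (uint32_t)((bytes + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE);
}

/* Actual memory footprint, in bytes, of a module that declares `pages` pages. */
uint64_t wasm_memory_bytes(uint32_t pages) {
    return (uint64_t)pages * WASM_PAGE_SIZE;
}

/* Returns 1 if the page-rounded memory for `needed_bytes` fits in the RAM
   of a (hypothetical) device, else 0. */
int memory_fits(uint64_t needed_bytes, uint64_t device_ram_bytes) {
    return wasm_memory_bytes(wasm_pages_for(needed_bytes)) <= device_ram_bytes;
}
```

With these helpers, a module that needs only 20 bytes of linear memory still occupies one full page, so `memory_fits(20, 32 * 1024)` returns 0 for a hypothetical device with 32 KB of RAM.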
The Reality: Binary Bloat
To make matters worse, the binaries themselves are often far larger than IoT devices can handle. Our interns compiled simple programs - a "Hello World” and a Fibonacci sequence - across multiple toolchains. "Hello World" shows the cost of console output and runtime startup; Fibonacci isolates pure computation. The results highlight the growing gap between the portability promise and the practical feasibility for IoT.
Hello World
Every programming journey begins with a "Hello World.” It’s the simplest possible program: a single line of output, very minimal dependencies, no complexity. That’s exactly why it makes a perfect benchmark for WebAssembly.
In this way, Hello World becomes more than a trivial example. It serves as a baseline measurement of how much overhead a toolchain introduces. If the smallest possible program is already heavy, then real-world applications with actual logic, data structures, and dependencies will inevitably grow much larger. For IoT developers, where memory is measured in kilobytes rather than gigabytes, that difference can quickly mean the boundary between something that runs and something that simply will not fit.
To see this effect in practice, we compiled the same "Hello World” program across several languages and toolchains:
- source: "C"
- source: "Go"
- source: "Rust"
- source: "JavaScript"
- source: "Python"
    // hello_world.c
    #include <stdio.h>

    int main() {
        printf("Hello World\n");
        return 0;
    }

    // build command (WASI-SDK)
    $ wasi-sdk/bin/clang --version
    clang version 20.1.8-wasi-sdk
    $ wasi-sdk/bin/clang \
        --sysroot=wasi-sdk/share/wasi-sysroot \
        hello_world.c -o hello_world.wasm

    // build command (Emscripten)
    $ emcc --version
    emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.50
    $ emcc \
        -sSTANDALONE_WASM=1 \
        hello_world.c -o hello_world.wasm

    // execute
    $ wasmtime run hello_world.wasm
    Hello World
| Language / Toolchain | WASM Binary Size |
|---|---|
| Python (py2wasm) | 25,471,125 bytes |
| Go (standard toolchain) | 2,312,600 bytes |
| JavaScript (javy) | 980,966 bytes |
| TinyGo | 110,084 bytes |
| C (WASI SDK) | 93,250 bytes |
| Rust (wasm32-wasip2 target) | 86,254 bytes |
| Rust (wasm32-wasip1 target) | 65,103 bytes |
| C (Emscripten) | 11,936 bytes |
If "Hello World" already produces a large binary, it tells us something important: the size is not coming from the program itself, but from the runtime baggage that the toolchain brings along. In C, this often includes the standard C library and startup code that sets up basic console output. Rust and Go add even more, pulling in memory allocators, threading models, and error-handling routines before main() is ever called. Higher-level languages such as Python and JavaScript go further still, embedding entire interpreters or virtual machines inside the binary.
Fibonacci Sequence
While "Hello World" is useful for measuring the cost of system-level features like console output, that is also its limitation: the result always includes the console I/O path. To isolate the impact of language and runtime overhead alone, we turned to another classic benchmark: the Fibonacci sequence.
- source: "C"
- source: "Go"
- source: "Rust"
- source: "JavaScript"
- source: "Python"
    // fibonacci.c
    #include <stdint.h>  // uint32_t
    #include <stdio.h>

    uint32_t fib(uint32_t n) {
        uint32_t a = 0, b = 1;
        for (uint32_t i = 0; i < n; ++i) {
            uint32_t t = a + b;
            a = b;
            b = t;
        }
        return a;
    }

    int main() {
        return fib(10);
    }

    // build command (WASI-SDK)
    $ wasi-sdk/bin/clang --version
    clang version 20.1.8-wasi-sdk
    $ wasi-sdk/bin/clang \
        --sysroot=wasi-sdk/share/wasi-sysroot \
        fibonacci.c -o fibonacci.wasm

    // build command (Emscripten)
    $ emcc --version
    emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.50
    $ emcc \
        -sSTANDALONE_WASM=1 \
        fibonacci.c -o fibonacci.wasm

    // execute
    $ wasmtime run fibonacci.wasm
    $ echo $?
    55
| Language / Toolchain | WASM Binary Size |
|---|---|
| Python (py2wasm) | 25,479,286 bytes |
| Go (standard toolchain) | 1,904,881 bytes |
| JavaScript (javy) | 982,575 bytes |
| Rust (wasm32-wasip2 target) | 69,439 bytes |
| Rust (wasm32-wasip1 target) | 49,117 bytes |
| TinyGo | 21,565 bytes |
| C (WASI SDK) | 20,765 bytes |
| C (Emscripten) | 821 bytes |
This program simply computes a number using arithmetic and looping, with no interaction with the operating system, no calls to stdout, and no external libraries beyond the most basic integer operations. In other words, it is a "pure” algorithm that should, in theory, compile down to a minimal sequence of instructions. By focusing on Fibonacci, we can see how each toolchain handles code generation when system-level dependencies are removed. If the resulting binary is still large, it is a clear sign that the overhead comes not from the algorithm itself, but from the scaffolding the toolchain adds around it.
Why So Large?
When you compile high-level languages to WASM, the jump in binary size isn’t random - it comes from hidden layers of overhead. Our Hello World and Fibonacci tests show it clearly: C, Rust and TinyGo produce compact binaries, while Go, Python, and JavaScript produce binaries of roughly a megabyte or more. The gap is explained by four main sources of weight that are baked into the toolchains themselves.
- Language Runtimes
- Standard Libraries
- Abstraction Layers
- Compiler Defaults and Safety Nets
Languages like Go, Python and JavaScript bring along their own runtime baggage: memory allocators, garbage collectors, panic handlers, and runtime metadata. These features are designed to make software development safe and productive on desktops and servers. In a microcontroller environment, they can make a binary far too large for the constrained program memory.
C is often held up as “close to the metal,” but even C toolchains bring hidden overhead. Standard libraries (libc) and startup code (crt0) are provided by default to ensure compatibility with features like filesystems, locales, and POSIX compliance. On a desktop system, that’s great. On a sensor node that has no filesystem and only transmits bytes over a serial UART, most of that code will never be used. It is possible to disable these defaults with some effort, but doing so reduces the portability of the code.
WASI was designed to make WASM portable by abstracting away system calls, but this comes at a cost: the glue code and shims increase binary size. Some high-level languages also tend to bundle in features “just in case” - threading, async event loops, signal handling - even if the final program doesn’t use them. On a microcontroller with no operating system, this scaffolding is wasted flash space. The result is an illusion of portability that’s hard to realize when resources are constrained.
Finally, the compilers themselves add weight. By default, they include debug metadata, stack traces, and panic handlers. These are useful for developers during debugging, but they serve no purpose in production firmware. Unless the developer explicitly enables aggressive compiler options or runs post-processing tools, the binary will contain large sections of unnecessary data and dead code. The difference between an unoptimized and optimized WASM binary can be hundreds of kilobytes - sometimes the margin between success and failure on an embedded device.
Taken together, these four categories explain why our simple tests produced such different results. The programs themselves were designed to be identical, but the hidden overhead of runtimes, libraries, abstractions, and compiler defaults resulted in very different WASM file sizes. For IoT developers, this isn’t just an academic detail - it’s the difference between a firmware image that fits and one that doesn’t.
Let’s walk through the toolchains one by one - from the leanest options to those that carry the most baggage.
C (Emscripten)
https://github.com/emscripten-core/emscripten
What Happens: Emscripten uses Clang to compile your C/C++ to LLVM IR and then links it into a WebAssembly core module. When you pass -sSTANDALONE_WASM=1, it skips the usual browser/Node.js glue and emits a freestanding .wasm with a _start that runs global constructors and then calls your main (if present). In standalone mode Emscripten arranges imports/exports so the module can run in generic WASM (wasm32-wasi-p1) runtimes.
What It Brings In: In the minimal “pure compute” case (no printf, no malloc, no filesystem), the linker can dead-strip almost everything: no JS shims, no browser runtime, and only the tiny startup needed to reach your code. As soon as you use libc features, Emscripten links in its libc (musl-based) pieces selectively - printf drags formatted I/O, malloc pulls an allocator.
Impact: For small, compute-only programs, Emscripten can be the absolute smallest option because it contributes essentially nothing beyond the call into your function. The trade-off is that as soon as you rely on richer facilities (stdio formatting, FS, threads, embind, etc.) size increases quickly because Emscripten must supply those layers inside the module.
C (WASI SDK)
https://github.com/WebAssembly/wasi-sdk
What Happens: Clang targets wasm32-wasi and links against wasi-libc to produce a WebAssembly core module that starts at _start; the C runtime runs constructors and then invokes your main(argc, argv). The compiler and linker arrange imports/exports so the module can run in generic WASM (wasm32-wasi-p1) runtimes.
What It Brings In: Only the wasi-libc pieces you actually use (e.g., printf brings formatted input and output; malloc brings an allocator). If you want to be ultra-minimal you can bypass libc entirely (-ffreestanding -nostdlib -Wl,--no-entry) and call WASI imports directly, but the default keeps things simple and portable.
Impact: Lean, predictable binaries that stay small while still giving you “real” WASI input and output. You pay only for the libc surface that you exercise; pure compute stays tiny, and even typical main + printf style programs remain compact compared to higher-level stacks.
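To illustrate "pay only for the libc surface that you exercise", here is a minimal sketch that prints a fixed string through the POSIX-style write() call instead of printf, so the formatted I/O code does not need to be linked. It assumes a libc that provides <unistd.h> (wasi-libc is expected to); the helper name `put_text` is hypothetical, and no size measurement for this variant was taken as part of this article.

```c
#include <string.h>
#include <unistd.h>

/* Write `text` to stdout without going through printf's formatter.
   Returns the number of bytes written, or -1 on error.
   Note: a single write() may be partial; a production version would loop. */
long put_text(const char *text) {
    return (long)write(STDOUT_FILENO, text, strlen(text));
}
```

Calling `put_text("Hello World\n")` from main would take the place of the printf call in hello_world.c.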
TinyGo
https://github.com/tinygo-org/tinygo
What Happens: TinyGo compiles Go to WASI with an aggressively trimmed toolchain and whole-program dead-code elimination. It emits a core module and arranges imports/exports so that it runs within a generic WASM (wasm32-wasi-p1) runtime, with a configurable panic strategy and (optional) lightweight scheduling.
What It Brings In: A compact Go runtime (small allocator, minimal scheduler when enabled, reduced reflection) and only the stdlib pieces that you reference and utilize. The full Go runtime, with its garbage collector, is not dragged in unless required features force it.
Impact: Near-C sizes for simple programs while staying in Go. As you add reflection, interfaces, and more of the standard library the binary grows - but it typically remains far smaller than outputs from the standard Go toolchain.
Rust (wasm32-wasip1 and wasm32-wasip2)
https://github.com/bytecodealliance/cargo-wasi
https://doc.rust-lang.org/rustc/platform-support/wasm32-wasip1.html
https://doc.rust-lang.org/rustc/platform-support/wasm32-wasip2.html
What Happens: rustc can target WASI Preview 1 to produce a WebAssembly core module where std maps to WASI (argv/env, stdout/err, proc_exit, etc.). When targeting WASI Preview 2, the build produces a component instead of a raw core module: a small wrapper advertises interfaces (the wasi:cli world for command line interface apps) so the artifact composes cleanly across languages and hosts.
What It Brings In: Both options include Rust’s core/alloc and only the parts of std you use, plus panic/formatting machinery (all tunable with panic=abort, LTO, -Oz). An allocator is linked only if you actually allocate. Preview 2 adds a thin component envelope (metadata/adapters) to the same compiled code.
Impact: Preview 1 tends to yield the smallest Rust binaries and behaves like a conventional command line interface (including numeric exit codes). Preview 2 costs a little extra for the component wrapper but delivers first-class interop via the Component Model; it’s the better choice when you need language-agnostic interfaces and composition.
Go (standard toolchain)
https://go.dev/wiki/WebAssembly
https://go.dev/blog/wasi
What Happens: With the standard toolchain you build a core module using GOOS=wasip1 GOARCH=wasm that runs like a command line interface program; Go’s usual startup runs and then calls main(), and the linker arranges imports/exports so the module can run in generic WASM (wasm32-wasi-p1) runtimes.
What It Brings In: The full Go runtime (GC, goroutine scheduler, stack management, allocator, introspection). -ldflags "-s -w" and -trimpath trim symbols and paths, but the runtime bulk remains.
Impact: Very smooth developer experience with familiar Go semantics, but binaries are typically in the megabytes. Solid for environments where size is a secondary concern; otherwise prefer TinyGo for space-sensitive targets.
JavaScript (javy)
https://github.com/bytecodealliance/javy
What Happens: javy packages your JavaScript code together with the QuickJS engine and emits a core module with imports/exports so the module can run in generic WASM (wasm32-wasi-p1) runtimes. The script becomes the program's entry point.
What It Brings In: A full JavaScript VM (QuickJS) plus minimal scaffolding to drive it via WASI. There’s no Node API surface, so anything beyond console/args requires either WASI or Javy-specific providers.
Impact: Extremely convenient and portable, but every artifact carries a JavaScript interpreter, so even trivial programs land near the megabyte range. Great for flexibility and demos; not ideal for tight flash footprints.
Python (py2wasm)
https://github.com/wasmerio/py2wasm
What Happens: py2wasm (via Nuitka) compiles your script and bundles the CPython runtime and required stdlib into a WASI core module. The result is a fully self-contained Python environment in a single .wasm file with imports/exports so the module can run in generic WASM (wasm32-wasi-p1) runtimes.
What It Brings In: Essentially a full Python distribution: interpreter, object model, garbage collection, parser, and any core modules your program needs. That payload dominates the final size; the actual code is a tiny fraction.
Impact: Impressive portability and easy packaging, but outputs are large (often tens of megabytes). Great for demos or controlled environments; a poor fit for microcontrollers or tight flash budgets.
Why Not Just Write WAT (WebAssembly Text)?
If your top priority is the smallest possible binary, writing WAT (WebAssembly Text) directly is the endgame. You author the exact instructions, imports, and exports - no compiler, no libc, no startup code - so there’s zero toolchain overhead and every byte in the module is under your control. The payoff is ultra-tiny artifacts that nothing else can match. The trade-off: it’s manual, easy to get wrong, and harder to maintain - best used when byte budgets are strict and the logic is simple.
Let's look at Hello World and the Fibonacci Sequence written directly in the WebAssembly Text format:
Hello World
- source: "WebAssembly Text"
    ;; hello_world.wat
    (module
      (import "wasi_snapshot_preview1" "fd_write"
        (func $fd_write (param i32 i32 i32 i32) (result i32)))

      (memory (export "memory") 1)
      (data (i32.const 8) "Hello World\n")

      (func $main (result i32)
        ;; set up iovec[0]
        (i32.store (i32.const 0) (i32.const 8))  ;; buf = &"Hello World\n"
        (i32.store (i32.const 4) (i32.const 12)) ;; len = 12 bytes

        ;; call fd_write(1, &iovec, 1, &nwritten)
        (call $fd_write
          (i32.const 1)   ;; fd 1 = stdout
          (i32.const 0)   ;; pointer to iovec array (offset 0)
          (i32.const 1)   ;; number of iovecs
          (i32.const 20)) ;; where to store bytes written
        drop ;; we don't need the result

        ;; return 0
        (i32.const 0)
      )

      (func (export "_start")
        call $main
        drop
      )
    )

    // build command
    $ wat2wasm hello_world.wat

    // execute
    $ wasmtime run hello_world.wasm
    Hello World
This compiles into a WASM binary of 153 bytes.
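The memory layout that hello_world.wat builds by hand can be mirrored in C to make the offsets easier to check. This is only a model of linear memory in a plain byte array; the function and constant names are made up for illustration, and nothing here calls WASI.

```c
#include <stdint.h>
#include <string.h>

/* Offsets used by hello_world.wat inside linear memory. */
enum { IOVEC_OFFSET = 0, DATA_OFFSET = 8, NWRITTEN_OFFSET = 20 };

/* Store a 32-bit value little-endian, as i32.store does. */
void store_i32(uint8_t *mem, uint32_t offset, uint32_t value) {
    mem[offset + 0] = (uint8_t)(value);
    mem[offset + 1] = (uint8_t)(value >> 8);
    mem[offset + 2] = (uint8_t)(value >> 16);
    mem[offset + 3] = (uint8_t)(value >> 24);
}

/* Load a 32-bit little-endian value, as i32.load does. */
uint32_t load_i32(const uint8_t *mem, uint32_t offset) {
    return (uint32_t)mem[offset]
         | ((uint32_t)mem[offset + 1] << 8)
         | ((uint32_t)mem[offset + 2] << 16)
         | ((uint32_t)mem[offset + 3] << 24);
}

/* Place the message and the single iovec {buf, len} the way the WAT
   module does. `mem` must be at least 24 bytes. Returns the message length. */
uint32_t layout_hello(uint8_t *mem) {
    static const char msg[] = "Hello World\n";
    uint32_t len = (uint32_t)(sizeof msg - 1);   /* 12 bytes, no NUL */
    memcpy(mem + DATA_OFFSET, msg, len);         /* data segment at 8 */
    store_i32(mem, IOVEC_OFFSET, DATA_OFFSET);   /* iovec.buf = 8 */
    store_i32(mem, IOVEC_OFFSET + 4, len);       /* iovec.len = 12 */
    return len;
}
```

The message occupies bytes 8 through 19, which is why the WAT module can safely use offset 20 for the "bytes written" result.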
Fibonacci Sequence
- source: "WebAssembly Text"
    ;; fibonacci.wat
    (module
      (import "wasi_snapshot_preview1" "proc_exit" (func $proc_exit (param i32)))

      (func $fib (param $n i32) (result i32)
        ;; local declarations
        (local $a i32)
        (local $b i32)
        (local $i i32)
        (local $t i32)

        ;; initialize locals
        (local.set $a (i32.const 0)) ;; a = 0
        (local.set $b (i32.const 1)) ;; b = 1
        (local.set $i (i32.const 0)) ;; i = 0

        ;; loop block
        (block $break
          (loop $loop
            ;; loop break condition
            (br_if $break (i32.ge_u (local.get $i) (local.get $n)))

            (local.set $t (i32.add (local.get $a) (local.get $b))) ;; t = a + b
            (local.set $a (local.get $b)) ;; a = b
            (local.set $b (local.get $t)) ;; b = t

            ;; increment loop counter
            (local.set $i (i32.add (local.get $i) (i32.const 1)))
            (br $loop)
          )
        )

        ;; return a
        (local.get $a)
      )

      ;; $main names the function so that _start can call it
      (func $main (export "main") (result i32)
        (call $fib (i32.const 10))
      )

      (func (export "_start")
        (call $proc_exit (call $main))
      )
    )

    // build command
    $ wat2wasm fibonacci.wat

    // execute
    $ wasmtime run fibonacci.wasm
    $ echo $?
    55
This compiles into a WASM binary of 156 bytes.
These examples come close to the lower bound of what a WebAssembly module can be - a raw sequence of instructions with no scaffolding, no unnecessary support functions, and no language-specific runtime. From an IoT perspective, it’s a proof of possibility: binaries this small can absolutely fit into even the most constrained devices.
The tradeoff, of course, is ergonomics. Writing WAT means giving up everything developers rely on in modern languages: type safety, libraries, tooling, even basic syntax conveniences. No one wants to write real firmware by hand in WAT. The challenge is how to achieve binaries this efficient while still working in higher-level languages that make developers productive. That gap is exactly where IoT-focused runtimes and toolchains must innovate.
WAT doesn’t scale - developers would have to give up high-level features like memory safety, type systems, package managers, and debugging tools. Maintaining even a moderately complex project in WAT would be like writing an entire operating system in raw assembly: technically possible, but practically unsustainable. Modern IoT developers expect the ergonomics of languages like Rust, C, Go, JavaScript or Python - they just can’t afford to carry along the megabytes of runtime baggage those toolchains often introduce.
This is the crux of the challenge: binary size grows because developers want modern languages, but microcontrollers can’t afford their overhead. The solution is not to abandon high-level languages, but to create lean runtimes and smarter compilers that strip away everything IoT devices don’t need, while keeping the developer experience intact.
Compiler and WASM Optimizations
Even though runtimes and libraries are the main culprits behind binary bloat, developers are not completely powerless. Each toolchain exposes its own compiler options, and post-processing tools exist that can trim WebAssembly modules, in some cases cutting their size substantially.
The most common toolkit is Binaryen (https://github.com/WebAssembly/binaryen), which provides a collection of optimization tools. Its flagship tool, wasm-opt, can aggressively inline functions, remove dead code, fold constants, and reorder instructions for efficiency. A simple wasm-opt -Oz processing can dramatically shrink Rust binaries, sometimes turning a 400 KB module into something closer to 150 KB.
Another useful utility is wasm-strip, which removes names, debug information, and metadata that are useful during development and debugging but unnecessary in production. The result is a leaner binary that loads faster and takes up less flash. Other tools like wasm-gc can also garbage-collect unused functions and imports, squeezing out even more savings.
Optimizers (especially at -O2, -Os, and -Oz) can assume that pure math on constant inputs is safe to pre-compute. After heavy optimization, a function call may simply be replaced with a pre-computed constant, removing the actual logic. Always inspect the result of optimization to validate that algorithms that need to be dynamic haven't been optimized out, unless that is your intention.
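As a hedged illustration of this effect, the sketch below reuses the Fibonacci function from the benchmark and contrasts a constant call with one whose input the optimizer cannot see through. The names `fib_constant` and `fib_dynamic` are ours, and routing the value through a volatile variable is just one common way to keep the computation alive; whether a particular compiler actually folds the constant case depends on its version and flags.

```c
#include <stdint.h>

/* Same iterative Fibonacci as in the benchmark. */
uint32_t fib(uint32_t n) {
    uint32_t a = 0, b = 1;
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t t = a + b;
        a = b;
        b = t;
    }
    return a;
}

/* Constant input: an optimizer is free to evaluate fib(10) at build time
   and emit the constant 55, leaving no loop behind at this call site. */
uint32_t fib_constant(void) {
    return fib(10);
}

/* Runtime input: reading n back through a volatile object forces a real
   load, so the loop has to survive optimization. */
uint32_t fib_dynamic(uint32_t n) {
    volatile uint32_t runtime_n = n;
    return fib(runtime_n);
}
```

Both functions return the same value for the same input; the difference only shows up in the generated code, which is why inspecting the optimized binary matters when benchmarking size.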
Let's look at how standard WASM tools reduce the size of the WASM binaries using the following options:
    wasm-strip code.wasm -o code-strip.wasm

    wasm-opt -Oz --strip-debug \
        --enable-bulk-memory --enable-nontrapping-float-to-int \
        code.wasm -o code-opt.wasm

    wasm-opt -Oz --strip-debug \
        --enable-bulk-memory --enable-nontrapping-float-to-int \
        code-strip.wasm -o code-combo.wasm

Hello World
| Language / Toolchain | Unoptimized | wasm-strip | wasm-opt | wasm-opt after wasm-strip |
|---|---|---|---|---|
| Python (py2wasm) | 25,471,125 bytes | 11,971,981 bytes | 10,828,436 bytes | 10,828,301 bytes |
| Go (standard toolchain) | 2,312,600 bytes | 2,312,368 bytes | 2,233,446 bytes | 2,233,255 bytes |
| JavaScript (javy) | 980,966 bytes | 980,779 bytes | 969,258 bytes | 969,050 bytes |
| Rust (wasm32-wasip2 target) | 86,254 bytes | N/A * | N/A * | N/A * |
| TinyGo | 110,084 bytes | 108,425 bytes | 108,230 bytes | 107,961 bytes |
| Rust (wasm32-wasip1 target) | 65,103 bytes | 52,058 bytes | 43,350 bytes | 43,038 bytes |
| C (WASI SDK) | 93,250 bytes | 16,503 bytes | 14,322 bytes | 14,052 bytes |
| C (Emscripten) | 11,936 bytes | 11,936 bytes | 10,566 bytes | 10,566 bytes |
| Hand-written WAT | 153 bytes | 153 bytes | 143 bytes | 143 bytes |
* wasm-opt and wasm-strip only operate on WASM core modules (wasm32-wasi-p1), not components (wasm32-wasi-p2)

Fibonacci Sequence
| Language / Toolchain | Unoptimized | wasm-strip | wasm-opt | wasm-opt after wasm-strip |
|---|---|---|---|---|
| Python (py2wasm) | 25,479,286 bytes | 11,979,962 bytes | 10,835,290 bytes | 10,835,155 bytes |
| Go (standard toolchain) | 1,904,881 bytes | 1,904,659 bytes | 1,845,964 bytes | 1,845,773 bytes |
| JavaScript (javy) | 982,575 bytes | 982,295 bytes | 970,867 bytes | 970,566 bytes |
| Rust (wasm32-wasip2 target) | 69,439 bytes | N/A * | N/A * | N/A * |
| Rust (wasm32-wasip1 target) | 49,117 bytes | 38,589 bytes | 31,310 bytes | 30,998 bytes |
| TinyGo | 21,565 bytes | 20,561 bytes | 20,502 bytes | 20,233 bytes |
| C (WASI SDK) | 20,765 bytes | 522 bytes | 539 bytes | 269 bytes |
| C (Emscripten) | 821 bytes | 821 bytes | 577 bytes | 577 bytes |
| Hand-written WAT | 156 bytes | 156 bytes | 123 bytes | 123 bytes |
* wasm-opt and wasm-strip only operate on WASM core modules (wasm32-wasi-p1), not components (wasm32-wasi-p2)

These optimization tools, however, have limits. They act like a diet plan for already bloated binaries, cutting fat but never changing the underlying structure. If the toolchain itself drags in a runtime or interpreter, no amount of stripping can make a multi-megabyte Python or JavaScript binary fit into a 128 KB microcontroller. Even after heavy optimization, many “Hello World” programs from high-level languages remain too large for devices with less than 256 KB of flash.
In other words, optimization tools are valuable - they can polish and trim (even handwritten WAT code) - but for resource constrained environments they are a band-aid, not a cure. The real challenge is upstream: choosing toolchains and runtimes that avoid unnecessary overhead from the start. Even after optimization, the largest binaries stay in megabytes - still too big for most microcontrollers.
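To put numbers on the "band-aid, not a cure" point, the following sketch computes the relative size reduction and checks a binary against a flash budget. The helper names are illustrative, and the flash check deliberately ignores the space the WASM runtime itself would need on the device.

```c
#include <stdint.h>

/* Size reduction in tenths of a percent (per mille), rounded down.
   Returns 0 if the binary did not shrink. */
uint32_t reduction_permille(uint64_t before, uint64_t after) {
    if (before == 0 || after >= before) {
        return 0;
    }
    return (uint32_t)(((before - after) * 1000u) / before);
}

/* Returns 1 if a binary of `size_bytes` fits in `flash_kb` KB of flash
   (1 KB = 1024 bytes here), else 0. */
int fits_in_flash(uint64_t size_bytes, uint32_t flash_kb) {
    return size_bytes <= (uint64_t)flash_kb * 1024u;
}
```

Using the Hello World figures from the tables above: C (WASI SDK) shrinks from 93,250 to 14,052 bytes, a reduction of about 84.9%, while Go (standard toolchain) shrinks from 2,312,600 to 2,233,255 bytes, about 3.4%, and `fits_in_flash(2233255, 256)` still returns 0.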
RIoT Secure Perspective
At RIoT Secure, our approach begins with an IoT-first mindset. Instead of starting from server environments and trying to cut down, we design runtimes that are lightweight from the start, avoiding unnecessary baggage entirely. Compatibility with WASI remains important, but we pursue it through intelligent mapping - ensuring support for Previews 1, 2, and 3 where it makes sense, while discarding abstractions that add size but no value in embedded contexts.
Equally important is selective inclusion. Most IoT devices do not need a full filesystem or multi-threading API, but they do need stable networking, cryptography, and secure update mechanisms. By pulling in only what IoT workloads truly require, we keep binaries small while still delivering the benefits of portability. Finally, we recognize that IoT is not just about getting code to run - it’s about running it securely for years. That means lifecycle management, secure updates, and minimizing the attack surface. Large binaries aren’t just a problem for flash space, they are also harder to secure and audit.
For IoT developers, the message is clear: the promise of WASM portability is compelling, but not at the cost of bloated binaries that won’t fit. At RIoT Secure, our work focuses on bridging this gap - making WASM practical for IoT today, while staying aligned with the evolving WASI standard so that developers don’t have to choose between efficiency and portability.
References
- [1] Fitzgerald, N. (2024, December 17). Making WebAssembly and Wasmtime More Portable. Bytecode Alliance.
This is the second of our multi-part series on WASM and WASI. In our next post - "Zero-Bloat Starter Kits for BRAWL (WASM) - Demos in WAT, C, Rust and TinyGo" - we'll demonstrate running a WASM demo utilizing the LED Matrix of the Arduino UNO R4 WiFi device - with a minimal WASI host API, no libc bloat, no runtimes.