Exploring the Inner Workings of WebAssembly: A Deep Dive into the Binary Format

Introduction 💡

WebAssembly (Wasm) is a binary instruction format designed as a portable, efficient compilation target for a wide range of programming languages. Unlike JavaScript, which engines receive as source text that must be parsed and compiled at run time, WebAssembly is a low-level bytecode for a stack-based virtual machine. This makes it possible to run code written in other languages, such as C and C++, in a web browser.

In this blog post, we will explore the WebAssembly binary format in detail. We will discuss the structure of a WebAssembly module, the types of instructions that can be encoded in WebAssembly, and the encoding of those instructions in the binary format. We will also look at examples of WebAssembly code and examine how they are encoded in the binary format.

Structure of a WebAssembly module

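A WebAssembly module is composed of several sections, each of which serves a different purpose. Every section starts with a one-byte ID, followed by the size of its contents in bytes. The WebAssembly specification defines the following standard sections, numbered here by their section ID (custom sections, which use ID 0, may appear between them):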

  1. Type section

  2. Import section

  3. Function section

  4. Table section

  5. Memory section

  6. Global section

  7. Export section

  8. Start section

  9. Element section

  10. Code section

  11. Data section

  12. Data count section

Let's examine the first ten of these sections in detail.

โœ”๏ธ Type section

The type section lists the function signatures used in the module. Each signature consists of the function's parameter types and result types, and other sections refer to it by its index in this list. The type section is encoded as follows:

typesec ::= x'01' size:u32 vec(functype)
functype ::= x'60' vec(valtype) vec(valtype)

The section ID is 0x01, followed by the size of the section contents in bytes. Each function type starts with the byte 0x60, followed by a vector of parameter types and a vector of result types. A value type is a single byte: 0x7f for i32, 0x7e for i64, 0x7d for f32, and 0x7c for f64. A vec is a u32 element count followed by the elements, and every u32, including section sizes and indices, uses the variable-length LEB128 encoding.
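The variable-length integer encoding used for sizes, counts, and indices is unsigned LEB128: seven payload bits per byte, least significant group first, with the high bit set on every byte except the last. A minimal Python sketch follows; the function names encode_u32 and decode_u32 are illustrative and not part of any library.

```python
def encode_u32(value: int) -> bytes:
    """Encode a non-negative integer below 2**32 as unsigned LEB128."""
    if not 0 <= value < 2**32:
        raise ValueError("value out of range for u32")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low seven bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_u32(data: bytes, pos: int = 0) -> tuple:
    """Decode unsigned LEB128 starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


print(encode_u32(128).hex(" "))  # 80 01
```

Values below 128 fit in a single byte, which is why most counts and indices in small modules look like plain bytes in a hex dump.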

โœ”๏ธ Import section

The import section defines the functions, tables, memories, and globals that are imported into the module from other modules. The import section is encoded as follows:

importsec ::= x'02' size:u32 vec(import)
import ::= modname name importdesc
modname ::= name
name ::= vec(byte) // UTF-8 string
importdesc ::= x'00' typeidx // function import
| x'01' tabletype // table import
| x'02' memtype // memory import
| x'03' globaltype // global import

The section ID is 0x02. modname is the name of the module the item is imported from, and name is the name of the function, table, memory, or global being imported. importdesc starts with a byte that selects the kind of import. For a function import, typeidx is an index into the type section; for the other kinds, tabletype, memtype, and globaltype give the type of the imported table, memory, or global.
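To make the layout of a single import entry concrete, here is a small Python sketch that encodes a function import. The module name "env", the field name "log", and the helper names are hypothetical, chosen only for illustration; names are written as a length-prefixed UTF-8 byte string.

```python
def encode_u32(value: int) -> bytes:
    """Unsigned LEB128, used for lengths and indices."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_name(text: str) -> bytes:
    """A name is its UTF-8 byte length followed by the bytes themselves."""
    raw = text.encode("utf-8")
    return encode_u32(len(raw)) + raw


def encode_func_import(module: str, field: str, type_index: int) -> bytes:
    """Module name, item name, kind byte 0x00 (function), then the type index."""
    return encode_name(module) + encode_name(field) + b"\x00" + encode_u32(type_index)


entry = encode_func_import("env", "log", 0)  # hypothetical import env.log
print(entry.hex(" "))  # 03 65 6e 76 03 6c 6f 67 00 00
```

An export entry has the same shape minus the module name: a name, a kind byte, and an index.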

โœ”๏ธ Function section

The function section declares the functions defined in the module by listing the type of each one. The function bodies themselves live in the code section. The function section is encoded as follows:

funcsec ::= x'03' size:u32 vec(typeidx)

The section ID is 0x03. Each typeidx is an index into the type section, and the i-th entry gives the signature of the i-th function body in the code section.

โœ”๏ธ Table section

The table section defines the tables used in the module. The table section is encoded as follows:

tablesec ::= x'04' size:u32 vec(tabletype)
tabletype ::= elemtype limits
elemtype ::= x'70' // funcref
limits ::= x'00' min:u32 // no maximum
| x'01' min:u32 max:u32 // with maximum

The section ID is 0x04. Each tabletype consists of the element type followed by the limits. The elemtype specifies the type of the elements stored in the table; in WebAssembly 1.0 this is always 0x70, for function references. The limits start with a flag byte that says whether a maximum is present, followed by the initial size and, optionally, the maximum size of the table.

โœ”๏ธ Memory section

The memory section defines the memories used in the module. The memory section is encoded as follows:

memsec ::= x'05' size:u32 vec(memtype)
memtype ::= limits

The section ID is 0x05. A memtype is just a limits value, with no prefix byte of its own. The limits specify the initial and, optionally, the maximum size of the memory, measured in pages of 64 KiB.
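Both table and memory types are built on limits, so a short Python sketch may help. It assumes the layout in the specification: a flag byte (0x00 without a maximum, 0x01 with one) followed by LEB128 integers, with a table type placing the element type byte before its limits. The helper names are illustrative.

```python
def encode_u32(value: int) -> bytes:
    """Unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_limits(minimum: int, maximum=None) -> bytes:
    """Flag byte, then the minimum and, if present, the maximum."""
    if maximum is None:
        return b"\x00" + encode_u32(minimum)
    return b"\x01" + encode_u32(minimum) + encode_u32(maximum)


FUNCREF = 0x70


def encode_table_type(minimum: int, maximum=None) -> bytes:
    """Element type first, then the limits."""
    return bytes([FUNCREF]) + encode_limits(minimum, maximum)


def encode_memory_type(min_pages: int, max_pages=None) -> bytes:
    """A memory type is only its limits, counted in 64 KiB pages."""
    return encode_limits(min_pages, max_pages)


print(encode_memory_type(1, 256).hex(" "))  # 01 01 80 02
```

A memory of 1 to 256 pages therefore takes four bytes: the flag, the minimum, and a two-byte LEB128 maximum.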

โœ”๏ธ Global section

The global section defines the global variables used in the module. The global section is encoded as follows:

globalsec ::= x'06' size:u32 vec(global)
global ::= globaltype init
globaltype ::= valtype mut
init ::= expr
mut ::= x'00' // immutable
| x'01' // mutable

The section ID is 0x06. Each global entry gives the type and initial value of one global variable. The globaltype consists of the value type (for example 0x7f for i32) followed by mut, which specifies whether the variable is mutable or immutable. init is a constant expression, terminated by the end opcode 0x0b, that is evaluated when the module is instantiated.
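As an illustration, the following Python sketch encodes one i32 global whose initializer is a single i32.const instruction. The immediate of i32.const uses signed LEB128, which differs from the unsigned form in how it decides when to stop. The helper names are illustrative; the opcode values 0x41 (i32.const) and 0x0b (end) come from the specification.

```python
def encode_s32(value: int) -> bytes:
    """Signed LEB128, as used by the i32.const immediate."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value out of range for s32")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's shift is arithmetic, so the sign is preserved
        sign_bit_set = bool(byte & 0x40)
        # Stop once the remaining bits are pure sign extension of this byte.
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


I32 = 0x7F
I32_CONST = 0x41
END = 0x0B


def encode_i32_global(initial: int, mutable: bool) -> bytes:
    """Value type, mutability flag, then the constant initializer expression."""
    global_type = bytes([I32, 0x01 if mutable else 0x00])
    init_expr = bytes([I32_CONST]) + encode_s32(initial) + bytes([END])
    return global_type + init_expr


print(encode_i32_global(42, True).hex(" "))  # 7f 01 41 2a 0b
```

Note that 64 needs two bytes in the signed form (c0 00), because bit 6 of the final byte is interpreted as the sign.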

โœ”๏ธ Export section

The export section defines the functions, tables, memories, and globals that are exported from the module. The export section is encoded as follows:

exportsec ::= x'07' size:u32 vec(export)
export ::= name exportdesc
exportdesc ::= x'00' funcidx // function export
| x'01' tableidx // table export
| x'02' memidx // memory export
| x'03' globalidx // global export

The section ID is 0x07. Each export consists of a name and a description. name is a string that specifies the name under which the item is exported. exportdesc starts with a byte that selects the kind of export, followed by the index of the function, table, memory, or global being exported.

โœ”๏ธ Start section

The start section specifies the index of a function that is called automatically when the module is instantiated. The start section is encoded as follows:

startsec ::= x'08' size:u32 funcidx

The section ID is 0x08. funcidx is the index of the start function, which must take no parameters and return no results.

โœ”๏ธ Element section

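The element section defines the initial contents of the module's tables. The element section is encoded as follows:

elemsec ::= x'09' size:u32 vec(elem)
elem ::= tableidx offset vec(funcidx)
offset ::= expr

The section ID is 0x09. Each elem segment gives the index of the table, a constant expression that computes the offset at which the elements are stored, and the indices of the functions whose references are stored in the table. This is the WebAssembly 1.0 layout; later versions of the specification reinterpret the leading integer as a flags field and add further segment forms.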

โœ”๏ธ Code section

The code section contains the function bodies for the functions declared in the function section. The code section is encoded as follows:

codesec ::= x'0a' size:u32 vec(code)
code ::= size:u32 func
func ::= vec(locals) expr
locals ::= n:u32 valtype
expr ::= instr* x'0b'

The section ID is 0x0a. Each code entry starts with the size of the function in bytes, followed by a vector of locals and the function body, expressed as an expr: a sequence of instructions terminated by the end opcode 0x0b. Each locals entry declares n local variables of the same value type, so runs of identically typed locals are stored compactly.
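The size prefix and the grouped locals are easiest to see in code. The Python sketch below encodes one code entry for a hypothetical function that takes two i32 parameters, stores their sum in one declared local, and returns it. The helper names are illustrative; the opcode values are those of local.get (0x20), local.set (0x21), i32.add (0x6a), and end (0x0b).

```python
def encode_u32(value: int) -> bytes:
    """Unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


LOCAL_GET, LOCAL_SET, I32_ADD, END = 0x20, 0x21, 0x6A, 0x0B
I32 = 0x7F


def encode_code_entry(local_groups, instructions: bytes) -> bytes:
    """local_groups is a list of (count, valtype) pairs; returns size + func."""
    func = bytearray(encode_u32(len(local_groups)))
    for count, valtype in local_groups:
        func += encode_u32(count) + bytes([valtype])
    func += instructions
    return encode_u32(len(func)) + bytes(func)


body = bytes([
    LOCAL_GET, 0,  # push parameter 0
    LOCAL_GET, 1,  # push parameter 1
    I32_ADD,       # pop both, push their sum
    LOCAL_SET, 2,  # store into the first declared local (index 2)
    LOCAL_GET, 2,  # push it again as the result
    END,
])
entry = encode_code_entry([(1, I32)], body)
print(entry.hex(" "))  # 0d 01 01 7f 20 00 20 01 6a 21 02 20 02 0b
```

Parameters occupy the first local indices, so the single declared local here is index 2. The size prefix lets a decoder skip a function body without decoding its instructions.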

Example of a WebAssembly module

Let's take an example to see what a WebAssembly module looks like. Consider the following simple program, written in the WebAssembly text format, that calculates the sum of two numbers.

(module
 (func $add (param $a i32) (param $b i32) (result i32)
  (local $sum i32)
  (local.set $sum (i32.add (local.get $a) (local.get $b)))
  (local.get $sum))
 (export "add" (func $add)))

This program defines a function add that takes two integer parameters and returns the sum of those parameters. The function is exported so that it can be accessed from outside the module.

The binary format for this module would be as follows:

00 61 73 6d 01 00 00 00 01 07 01 60 02 7f 7f 01
7f 03 02 01 00 07 07 01 03 61 64 64 00 00 0a 0f
01 0d 01 01 7f 20 00 20 01 6a 21 02 20 02 0b

Let's break down the binary format into its constituent parts and understand how it represents the WebAssembly module.

The first 8 bytes 00 61 73 6d 01 00 00 00 are the module header. The first four bytes 00 61 73 6d are the magic number, a NUL byte followed by the string "asm" in ASCII. The next four bytes 01 00 00 00 are the version of the binary format, 1, as a little-endian 32-bit integer.

The next byte 01 is the ID of the type section, and the byte 07 that follows is the size of the section contents in bytes. The byte 01 after that is the number of function types in the section.

The next byte 60 marks a function type. The three bytes 02 7f 7f give the number of parameters, 2, and their types, both i32. The bytes 01 7f give the number of results, 1, and its type, i32.

The next byte 03 is the ID of the function section, with a size of 02. It declares 01 function, whose signature is type index 00.

The bytes 07 07 start the export section, which is 7 bytes long. It holds 01 export: a name of length 03 with the bytes 61 64 64 ("add"), the export kind 00 for a function, and the function index 00.

The bytes 0a 0f start the code section, which is 15 bytes long and holds 01 function body of 0d (13) bytes. The bytes 01 01 7f declare one group of locals containing one i32, which is $sum at local index 2. The instructions follow: 20 00 and 20 01 (read $a and $b), 6a (i32.add), 21 02 (write $sum), 20 02 (read $sum), and 0b (end).

There is no marker for the end of the module; it simply ends after the last section.
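The same walk can be automated. The Python sketch below checks the header and then steps from section to section using each section's ID and size. The bytes in ADD_MODULE are a hand-assembled encoding of the add module (including its $sum local), so treat them as this sketch's assumption rather than the output of a particular toolchain; the function names are illustrative.

```python
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data count",
}


def read_u32(data: bytes, pos: int) -> tuple:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def list_sections(module: bytes) -> list:
    """Return (section name, content size) pairs in file order."""
    if module[:4] != b"\x00asm":
        raise ValueError("missing \\0asm magic number")
    if int.from_bytes(module[4:8], "little") != 1:
        raise ValueError("unsupported binary format version")
    sections = []
    pos = 8
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_u32(module, pos + 1)
        sections.append((SECTION_NAMES.get(section_id, "unknown"), size))
        pos += size  # skip the contents; a real parser would decode them
    return sections


ADD_MODULE = bytes.fromhex(
    "0061736d01000000"                    # magic number and version
    "01070160027f7f017f"                  # type section
    "03020100"                            # function section
    "070701036164640000"                  # export section
    "0a0f010d01017f200020016a210220020b"  # code section
)

print(list_sections(ADD_MODULE))
# [('type', 7), ('function', 2), ('export', 7), ('code', 15)]
```

Because every section carries its own size, a tool can locate the export section, for example, without understanding any of the sections before it.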

Conclusion 🚀

WebAssembly has emerged as a powerful technology for developing high-performance, platform-independent applications. The WebAssembly binary format plays a critical role in enabling this technology, by providing a compact, efficient, and portable representation of the code that can be executed in a variety of environments.

In this blog post, we have covered the various components of the WebAssembly binary format, including the module header, sections, types, imports, exports, functions, tables, memories, globals, and instructions. We have also provided an example of a WebAssembly module and shown how its binary format can be decoded to understand its structure.

By understanding the WebAssembly binary format, developers can gain deeper insight into how WebAssembly works and can create, manipulate, and optimize WebAssembly modules.
