Introduction
WebAssembly (Wasm) is a binary instruction format designed as a portable and efficient compilation target for a wide range of programming languages. Unlike JavaScript, which is delivered as source text that the engine must parse before it can run, WebAssembly is a low-level bytecode for a stack-based virtual machine. This makes it possible to execute code written in other languages, such as C and C++, in a web browser.
In this blog post, we will explore the WebAssembly binary format in detail. We will discuss the structure of a WebAssembly module, the types of instructions that can be encoded in WebAssembly, and the encoding of those instructions in the binary format. We will also look at examples of WebAssembly code and examine how they are encoded in the binary format.
Structure of a WebAssembly module
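A WebAssembly module starts with an 8-byte header (a magic number and a version) followed by a sequence of sections, each of which serves a different purpose. Every section begins with a one-byte section ID and the size of its contents in bytes, encoded as a u32; the simplified grammars in this post omit the size field. In addition to custom sections, which have ID 0, the following sections are defined in the WebAssembly specification: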
Type section
Import section
Function section
Table section
Memory section
Global section
Export section
Start section
Element section
Code section
Data section
Data count section
Let's examine each of these sections in detail.
Type section
The type section defines the function signatures used in the module. Each signature consists of the function's parameter types and result types. The type section is encoded as follows:
typesec ::= 0x01 vec(functype)
functype ::= 0x60 vec(valtype) vec(valtype)
The typesec section ID is 0x01. Each functype starts with the byte 0x60, followed by a vector of parameter types and a vector of result types. The number types are encoded as 0x7f (i32), 0x7e (i64), 0x7d (f32), and 0x7c (f64). A vec is encoded as a u32 element count followed by the elements, and every u32 uses the variable-length LEB128 encoding. Other sections refer to a signature by its index in this section, called a typeidx.
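As a rough illustration of the type section encoding, the following Python sketch encodes an unsigned LEB128 integer, a vector, and a function type. The helper names (u32, vec, functype) are made up for this post and do not belong to any WebAssembly toolchain.

```python
# Value type bytes for the four number types.
I32, I64, F32, F64 = 0x7F, 0x7E, 0x7D, 0x7C


def u32(n):
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("u32 values must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F  # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def vec(items):
    """Encode a vector: the element count, then the encoded elements."""
    return u32(len(items)) + b"".join(items)


def functype(params, results):
    """Encode a function type: 0x60, parameter types, result types."""
    def encode(types):
        return vec([bytes([t]) for t in types])
    return b"\x60" + encode(params) + encode(results)


# Signature (i32, i32) -> i32
print(functype([I32, I32], [I32]).hex(" "))  # 60 02 7f 7f 01 7f
```

Small values such as the counts above fit in a single byte; a value like 128 needs two bytes (80 01) because only seven bits of each byte carry data.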
Import section
The import section declares the functions, tables, memories, and globals that the module imports from its host environment or from other modules. The import section is encoded as follows:
importsec ::= 0x02 vec(import)
import ::= modname name importdesc
modname ::= name
name ::= vec(byte) // UTF-8 string
importdesc ::= 0x00 typeidx // function import
| 0x01 tabletype // table import
| 0x02 memtype // memory import
| 0x03 globaltype // global import
The importsec section ID is 0x02. The modname operand is the name of the module that provides the import, and the name operand is the name of the function, table, memory, or global being imported. Both are length-prefixed UTF-8 strings. The importdesc operand starts with a byte that selects the kind of import. For a function import it is followed by a typeidx, an index into the type section; for the other kinds it is followed by the tabletype, memtype, or globaltype of the imported item.
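To see how a single import entry is laid out, here is a minimal Python sketch that encodes a function import. The module name "env", the field name "log", and the helper names are hypothetical and chosen only for illustration.

```python
def u32(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def name(text):
    """Encode a name: the UTF-8 byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return u32(len(data)) + data


# Kind bytes that open an importdesc.
FUNC, TABLE, MEMORY, GLOBAL = 0x00, 0x01, 0x02, 0x03


def func_import(module, field, type_index):
    """Encode one function import: modname, name, 0x00, typeidx."""
    return name(module) + name(field) + bytes([FUNC]) + u32(type_index)


# Hypothetical import of env.log with the signature at type index 0.
print(func_import("env", "log", 0).hex(" "))  # 03 65 6e 76 03 6c 6f 67 00 00
```

Note that the length prefix counts bytes, not characters, so a name containing non-ASCII characters has a prefix larger than its character count.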
Function section
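The function section declares the signature of each function defined in the module. It does not contain the function bodies, which are stored in the code section. The function section is encoded as follows:
funcsec ::= 0x03 vec(typeidx)
The funcsec section ID is 0x03. Each typeidx is an index into the type section. The i-th entry of the function section gives the signature of the i-th body in the code section, so both vectors must have the same length.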
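The sketch below, written in Python with invented helper names, wraps a list of type indices into a complete function section. It also shows the framing shared by all sections: the section ID, the size of the contents as a u32, and then the contents.

```python
def u32(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def section(section_id, payload):
    """Frame a section: ID byte, payload size in bytes, payload."""
    return bytes([section_id]) + u32(len(payload)) + payload


def function_section(type_indices):
    """Encode the function section from one type index per function."""
    payload = u32(len(type_indices)) + b"".join(u32(i) for i in type_indices)
    return section(0x03, payload)


# Three functions: the first two share type 0, the third uses type 1.
print(function_section([0, 0, 1]).hex(" "))  # 03 04 03 00 00 01
```

Because functions only reference signatures by index, many functions can share one entry in the type section.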
Table section
The table section defines the tables used in the module. The table section is encoded as follows:
tablesec ::= 0x04 vec(tabletype)
tabletype ::= elemtype limits
elemtype ::= 0x70 // funcref
limits ::= 0x00 u32 // minimum only
| 0x01 u32 u32 // minimum, maximum
The tablesec section ID is 0x04. A tabletype consists of the element type followed by the limits. The elemtype operand specifies the type of elements stored in the table: 0x70 for function references, which is the only option in WebAssembly 1.0. Later versions of the specification also allow 0x6f for external references. The limits operand starts with a flag byte that says whether a maximum is present, followed by the initial size and, optionally, the maximum size of the table, both counted in elements.
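To make the byte layout concrete, here is a small Python sketch that encodes a table type as the element type byte followed by the limits, where the limits start with a flag byte (0x00 for minimum only, 0x01 for minimum and maximum). The function names are invented for this post.

```python
FUNCREF = 0x70  # element type for function references


def u32(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def limits(minimum, maximum=None):
    """Encode limits: flag byte, minimum, and the maximum if present."""
    if maximum is None:
        return b"\x00" + u32(minimum)
    if maximum < minimum:
        raise ValueError("maximum must not be smaller than minimum")
    return b"\x01" + u32(minimum) + u32(maximum)


def tabletype(elemtype, minimum, maximum=None):
    """Encode a table type: element type byte, then limits."""
    return bytes([elemtype]) + limits(minimum, maximum)


print(tabletype(FUNCREF, 1).hex(" "))      # 70 00 01
print(tabletype(FUNCREF, 1, 10).hex(" "))  # 70 01 01 0a
```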
Memory section
The memory section defines the memories used in the module. The memory section is encoded as follows:
memsec ::= 0x05 vec(memtype)
memtype ::= limits
The memsec section ID is 0x05. A memtype is just a limits value, with no type byte of its own. The limits operand specifies the initial and, optionally, the maximum size of the memory, both counted in pages of 64 KiB.
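Since memory sizes are expressed in 64 KiB pages rather than bytes, a short Python sketch may help. It converts a byte count into pages and encodes the resulting memory type as limits (a flag byte, the minimum, and an optional maximum). The helper names and the 100,000-byte figure are assumptions made for the example.

```python
PAGE_SIZE = 64 * 1024  # bytes per WebAssembly memory page


def u32(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def pages_for(num_bytes):
    """Smallest number of pages that can hold num_bytes."""
    return -(-num_bytes // PAGE_SIZE)  # ceiling division


def memtype(min_pages, max_pages=None):
    """Encode a memory type, which is just a limits value."""
    if max_pages is None:
        return b"\x00" + u32(min_pages)
    if max_pages < min_pages:
        raise ValueError("maximum must not be smaller than minimum")
    return b"\x01" + u32(min_pages) + u32(max_pages)


# A memory that starts with room for 100,000 bytes and may grow to 16 pages.
print(pages_for(100_000))                         # 2
print(memtype(pages_for(100_000), 16).hex(" "))   # 01 02 10
```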
Global section
The global section defines the global variables used in the module. The global section is encoded as follows:
globalsec ::= 0x06 vec(global)
global ::= globaltype init
globaltype ::= valtype mut
init ::= expr
mut ::= 0x00 // immutable
| 0x01 // mutable
The globalsec section ID is 0x06. Each global consists of its type and its initial value. The globaltype is a value type (for example 0x7f for i32) followed by the mut byte, which specifies whether the global variable is immutable or mutable. The init operand is a constant expression, terminated by the end opcode 0x0b, that is evaluated when the module is instantiated.
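As an illustration, the Python sketch below encodes an i32 global whose initializer is a single i32.const instruction followed by the end opcode. The operand of i32.const is a signed LEB128 integer, so the sketch includes a signed encoder. The helper names are made up for this post.

```python
I32 = 0x7F               # value type byte for i32
I32_CONST, END = 0x41, 0x0B  # opcodes used in the initializer
IMMUTABLE, MUTABLE = 0x00, 0x01


def s32(n):
    """Encode an integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7  # arithmetic shift: keeps the sign for negative values
        sign_bit = byte & 0x40
        # Stop once the remaining bits are pure sign extension.
        done = (n == 0 and not sign_bit) or (n == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def global_i32(value, mutable):
    """Encode a global: value type, mutability, then the init expression."""
    globaltype = bytes([I32, MUTABLE if mutable else IMMUTABLE])
    init = bytes([I32_CONST]) + s32(value) + bytes([END])
    return globaltype + init


print(global_i32(42, True).hex(" "))   # 7f 01 41 2a 0b
print(global_i32(-1, False).hex(" "))  # 7f 00 41 7f 0b
```

The byte 7f appears twice in the second line with two different meanings: first as the i32 value type, then as the signed LEB128 encoding of -1.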
Export section
The export section defines the functions, tables, memories, and globals that are exported from the module. The export section is encoded as follows:
exportsec ::= 0x07 vec(export)
export ::= name exportdesc
exportdesc ::= 0x00 funcidx // function export
| 0x01 tableidx // table export
| 0x02 memidx // memory export
| 0x03 globalidx // global export
The exportsec section ID is 0x07. Each export consists of a name and an exportdesc. The name is the string under which the item is visible to the host. The exportdesc is a byte giving the kind of export, followed by the index of the function, table, memory, or global being exported.
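The following Python sketch builds a complete export section from a list of (name, kind, index) entries. It assumes the usual section framing of an ID byte followed by the size of the contents; the helper names are invented for this post.

```python
FUNC, TABLE, MEMORY, GLOBAL = 0x00, 0x01, 0x02, 0x03  # export kinds


def u32(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def export(export_name, kind, index):
    """Encode one export: length-prefixed name, kind byte, index."""
    data = export_name.encode("utf-8")
    return u32(len(data)) + data + bytes([kind]) + u32(index)


def export_section(entries):
    """Encode the export section from (name, kind, index) tuples."""
    payload = u32(len(entries)) + b"".join(export(*e) for e in entries)
    return b"\x07" + u32(len(payload)) + payload


# Export function 0 under the name "add".
print(export_section([("add", FUNC, 0)]).hex(" "))
# 07 07 01 03 61 64 64 00 00
```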
Start section
The start section specifies the index of a function that is called automatically when the module is instantiated, after its tables and memories have been initialized. The start section is encoded as follows:
startsec ::= 0x08 funcidx
The startsec section ID is 0x08. The funcidx operand is the index of the start function, which must take no parameters and return no results.
Element section
The element section defines the initial contents of tables. The element section is encoded as follows:
elemsec ::= 0x09 vec(elem)
elem ::= tableidx offset vec(funcidx)
The elemsec section ID is 0x09. In this form, which is the one defined by WebAssembly 1.0, each elem gives the index of the table, a constant expression for the offset at which the elements are stored, and the indices of the functions to be stored in the table. Later versions of the specification add further segment forms.
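Here is a rough Python sketch of a single element segment in the WebAssembly 1.0 layout: table index 0, an offset expression made of i32.const and the end opcode, and a vector of function indices. The helper names are assumptions made for this example.

```python
I32_CONST, END = 0x41, 0x0B  # opcodes used in the offset expression


def u32(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def s32(n):
    """Encode an integer as signed LEB128 (operand of i32.const)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        sign_bit = byte & 0x40
        done = (n == 0 and not sign_bit) or (n == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def elem_segment(offset, func_indices):
    """Encode an element segment that fills table 0 starting at offset."""
    table_index = u32(0)
    offset_expr = bytes([I32_CONST]) + s32(offset) + bytes([END])
    funcs = u32(len(func_indices)) + b"".join(u32(i) for i in func_indices)
    return table_index + offset_expr + funcs


# Place functions 0 and 1 into table slots 0 and 1.
print(elem_segment(0, [0, 1]).hex(" "))  # 00 41 00 0b 02 00 01
```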
Code section
The code section defines the function bodies for the functions in the module. The code section is encoded as follows:
codesec ::= 0x0a vec(code)
code ::= u32 vec(locals) expr
locals ::= u32 valtype
expr ::= instr* 0x0b
The codesec section ID is 0x0a. Each code entry starts with the size of the function body in bytes, followed by a vector of locals and the body itself, an expr made up of a sequence of instructions terminated by the end opcode 0x0b. Each locals entry declares a run of local variables of the same type as a count followed by a value type.
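The run-length grouping of locals is easiest to see in code. The Python sketch below, using helper names invented for this post, groups consecutive locals of the same type and then encodes one code entry as the body size, the grouped locals, the instructions, and the end opcode.

```python
from itertools import groupby

I32, I64 = 0x7F, 0x7E  # value type bytes
END = 0x0B             # opcode that terminates an expression


def u32(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def locals_vec(local_types):
    """Group consecutive equal types into (count, type) entries."""
    groups = [(len(list(run)), t) for t, run in groupby(local_types)]
    return u32(len(groups)) + b"".join(u32(n) + bytes([t]) for n, t in groups)


def code_entry(local_types, instructions):
    """Encode one code entry: body size, locals, instructions, end."""
    func = locals_vec(local_types) + instructions + bytes([END])
    return u32(len(func)) + func


# Three locals collapse into two groups: 2 x i32, then 1 x i64.
print(locals_vec([I32, I32, I64]).hex(" "))  # 02 02 7f 01 7e

# One i32 local and a body that adds the two parameters via that local.
body = bytes.fromhex("20 00 20 01 6a 21 02 20 02")
print(code_entry([I32], body).hex(" "))
# 0d 01 01 7f 20 00 20 01 6a 21 02 20 02 0b
```

The size prefix lets a decoder skip a function body without decoding its instructions.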
Example of a WebAssembly module
Let's take an example to understand what a WebAssembly module looks like. Consider the following simple program, written in the WebAssembly text format, that calculates the sum of two numbers.
(module
  (func $add (param $a i32) (param $b i32) (result i32)
    (local $sum i32)
    (local.set $sum (i32.add (local.get $a) (local.get $b)))
    (local.get $sum))
  (export "add" (func $add)))
This program defines a function add that takes two 32-bit integer parameters and returns their sum. The function is exported so that it can be called from outside the module.
The binary format for this module would be as follows:
00 61 73 6d 01 00 00 00 01 07 01 60 02 7f 7f 01
7f 03 02 01 00 07 07 01 03 61 64 64 00 00 0a 0f
01 0d 01 01 7f 20 00 20 01 6a 21 02 20 02 0b
Let's break down the binary format into its constituent parts and understand how it represents the WebAssembly module.
The first 8 bytes 00 61 73 6d 01 00 00 00 are the module header. The first four bytes 00 61 73 6d are the magic number, a NUL byte followed by the string "asm" in ASCII. The next four bytes 01 00 00 00 are the version of the binary format, 1, stored as a little-endian 32-bit integer.
The next byte 01 is a section ID, which marks the start of the type section. The byte 07 that follows is the size of the section contents in bytes, and the next 01 is the number of function types in the section.
The next byte 60 marks a function type. The next three bytes 02 7f 7f give the number of parameters, 2, and their types, which are both i32. The bytes 01 7f that follow give the number of results, 1, and the result type, which is i32.
The next byte 03 is the section ID of the function section. The byte 02 is its size, 01 is the number of functions defined in the module, and 00 is the index of the function's signature in the type section.
The next byte 07 is the section ID of the export section, followed by its size 07 and the number of exports 01. The bytes 03 61 64 64 are the length-prefixed name "add", 00 marks a function export, and the final 00 is the index of the exported function.
The next byte 0a is the section ID of the code section, followed by its size 0f and the number of function bodies 01. The byte 0d is the size of the function body. The bytes 01 01 7f declare one group of locals containing a single i32, which is $sum. The instructions follow: 20 00 and 20 01 read the parameters $a and $b, 6a is i32.add, 21 02 stores the result in $sum, 20 02 reads $sum back, and 0b is the end opcode that terminates the function body. There is no end-of-module marker; the module simply ends after its last section.
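The same kind of breakdown can be automated. The Python sketch below checks the 8-byte header and then walks the sections of a module by reading each section ID and size. The MODULE bytes are a hand-assembled encoding of the add function with its $sum local, and the function names are invented for this post; treat it as a sketch rather than a full decoder, since it performs no validation of section contents.

```python
MODULE = bytes.fromhex(
    "00 61 73 6d 01 00 00 00 "                              # magic, version
    "01 07 01 60 02 7f 7f 01 7f "                           # type section
    "03 02 01 00 "                                          # function section
    "07 07 01 03 61 64 64 00 00 "                           # export section
    "0a 0f 01 0d 01 01 7f 20 00 20 01 6a 21 02 20 02 0b "   # code section
)

SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data count",
}


def read_u32(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte
            return result, pos
        shift += 7


def list_sections(data):
    """Return (section name, content size) for each section in a module."""
    if data[:4] != b"\x00asm":
        raise ValueError("missing WebAssembly magic number")
    if int.from_bytes(data[4:8], "little") != 1:
        raise ValueError("unsupported binary format version")
    pos, sections = 8, []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_u32(data, pos + 1)
        sections.append((SECTION_NAMES.get(section_id, "unknown"), size))
        pos += size  # skip the contents without decoding them
    return sections


for section_name, size in list_sections(MODULE):
    print(f"{section_name}: {size} bytes")
```

Because every section carries its size, the walker never needs to understand what is inside a section in order to find the next one.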
Conclusion
WebAssembly has emerged as a powerful technology for developing high-performance, platform-independent applications. The WebAssembly binary format plays a critical role in enabling this technology, by providing a compact, efficient, and portable representation of the code that can be executed in a variety of environments.
In this blog post, we have covered the various components of the WebAssembly binary format, including the module header, sections, types, imports, exports, functions, tables, memories, globals, and instructions. We have also provided an example of a WebAssembly module and shown how its binary format can be decoded to understand its structure.
By understanding the WebAssembly binary format, developers can gain a deeper insight into how WebAssembly works and can create, manipulate, and optimize WebAssembly modules.