diff --git a/content/10.zk-stack/10.components/70.compiler/20.specification/10.index.md b/content/10.zk-stack/10.components/70.compiler/20.specification/10.index.md
index 17e3a13f..81f7ffc1 100644
--- a/content/10.zk-stack/10.components/70.compiler/20.specification/10.index.md
+++ b/content/10.zk-stack/10.components/70.compiler/20.specification/10.index.md
@@ -38,3 +38,4 @@ please visit [Toolchain](/zk-stack/components/compiler/toolchain) to understand
 - [System Contracts](/zk-stack/components/compiler/specification/system-contracts)
 - [Exception Handling](/zk-stack/components/compiler/specification/exception-handling)
 - [EVMLA translator](/zk-stack/components/compiler/specification/evmla-translator)
+- [Binary layout, linking and loading](/zk-stack/components/compiler/specification/binary-layout)
diff --git a/content/10.zk-stack/10.components/70.compiler/20.specification/70.binary-layout.md b/content/10.zk-stack/10.components/70.compiler/20.specification/70.binary-layout.md
new file mode 100644
index 00000000..4958704a
--- /dev/null
+++ b/content/10.zk-stack/10.components/70.compiler/20.specification/70.binary-layout.md
@@ -0,0 +1,361 @@
+---
+title: EraVM Binary Layout
+description: What an assembler listing looks like and how it is transformed into a binary file that is sent to the chain.
+---
+
+## Definitions
+
+- A directive is a command issued to the assembler, which is not translated into
+an executable bytecode instruction.
+Directive names start with a period, for example, `.cell`.
+Directives are used to regulate the translation process.
+- An instruction constitutes the smallest executable segment of bytecode.
+In EraVM, each instruction is exactly eight bytes long.
+- A word is a 256-bit unsigned integer in big-endian format.
+
+## Structure of assembly file
+
+This section describes the structure of an EraVM assembly file, a text file
+typically with the extension `.zasm`.
+
+### Data types
+
+- `U256` – word, a 256-bit unsigned integer number, big-endian.
+- `U16` – 16-bit unsigned integer number, big-endian.
+
+### Sections
+
+The source code within an EraVM assembly file is organized into distinct
+sections. The start of a section is denoted by one of the following
+directives:
+
+- `.rodata` – constant, read-only data.
+- `.data` – global mutable data.
+- `.text` – executable code.
+
+Additional sections may be implemented in the future.
+
+The description of any section may be spread across the file:
+
+```asm
+.rodata
+  .cell 0
+.text
+
+.rodata
+  .cell 1
+```
+
+In this example, multiple `.rodata` sections appear, but in the resulting binary
+file they will be merged into a single contiguous region of memory.
+The same principle applies to the other sections.
+
+### Defining data
+
+The `.cell` directive defines data:
+
+```asm
+.rodata
+  .cell 0
+  .cell 23090
+.data
+  .cell 1213
+```
+
+- Note: using `.cell` in the `.data` section is deprecated and will not be supported in future versions of the assembly.
+- The value of a cell is provided as a signed 256-bit decimal number.
+- Negative numbers will be encoded as 256-bit 2’s complement, e.g. `-1` is encoded as `0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff`.
+- An optional `+` sign before positive numbers is allowed, e.g. `.cell +123`.
+- Hexadecimal integer literals are not supported.
+- Symbols (names of labels) are supported, for example:
+
+```asm
+.text
+
+lab1:
+  add r0, r0, r0
+
+lab2:
+  add r0, r0, r0
+
+.rodata
+
+my_cells:
+  .cell @lab1
+  .cell @lab2
+  .cell -1
+```
+
+  Note the `@` prefixing the label name.
+
+  Each `.cell` is 256 bits wide, even though an address such as `@lab1` or `@lab2` is just 16 bits wide.
+  Addresses are padded with zeroes to fit in the word.
+
+### Overall structure
+
+The structure of an assembly file is described as follows:
+
+```ebnf
+ := *
+
+ :=
+  | ".rodata" *
+  | ".data" *
+  | ".text" *
+
+ :=