For this deconstruction, we will be researching fib.so, a hand-rolled sBPF assembly program that calculates the Fibonacci number based upon a u8 input.
The structure of an sBPF program has 4 parts:
- ELF header: the starting point of the file, describing the overall file format, target environment, and offsets for program and section headers.
- Program headers: define the memory segments and their attributes (readable, writable, executable) for runtime execution.
- Sections: contain the actual program content, such as .text for executable code, .data for writable data, and .rodata for read-only data.
- Section headers: metadata explaining each section, used during the linking process and/or for debugging purposes.
Below is the ELF header for fib.so:
ELFHeader {
ei_magic: [127, 69, 76, 70,],
ei_class: 2,
ei_data: 1,
ei_version: 1,
ei_osabi: 0,
ei_abiversion: 0,
ei_pad: [0, 0, 0, 0, 0, 0, 0],
e_type: 3,
e_machine: 247,
e_version: 1,
e_entry: 232,
e_shoff: 872,
e_phoff: 64,
e_flags: 0,
e_ehsize: 64,
e_phentsize: 56,
e_phnum: 3,
e_shentsize: 64,
e_shnum: 8,
e_shstrndx: 7,
}
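As a quick sanity check on the identification fields above, here is a minimal sketch that tests whether the first 20 bytes of a file match what fib.so declares: the ELF magic, a 64-bit class, little-endian data, e_type 3 (ET_DYN) and e_machine 247 (EM_BPF). The function name looks_like_sbpf is hypothetical and not part of any loader; a real loader performs many more checks.

```rust
// Values taken from the fib.so header dump above.
const ELF_MAGIC: [u8; 4] = [0x7f, b'E', b'L', b'F']; // [127, 69, 76, 70]
const ELFCLASS64: u8 = 2; // ei_class: 64-bit
const ELFDATA2LSB: u8 = 1; // ei_data: little-endian
const ET_DYN: u16 = 3; // e_type: shared object
const EM_BPF: u16 = 247; // e_machine

/// Hypothetical helper: true when the start of `bytes` matches the
/// identification fields that fib.so uses.
pub fn looks_like_sbpf(bytes: &[u8]) -> bool {
    // e_type lives at offset 16 and e_machine at offset 18.
    if bytes.len() < 20 {
        return false;
    }
    let e_type = u16::from_le_bytes([bytes[16], bytes[17]]);
    let e_machine = u16::from_le_bytes([bytes[18], bytes[19]]);
    bytes.starts_with(&ELF_MAGIC)
        && bytes[4] == ELFCLASS64
        && bytes[5] == ELFDATA2LSB
        && e_type == ET_DYN
        && e_machine == EM_BPF
}
```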
To create an ELF header of our own from scratch, we must dynamically update the following fields with their correct values/offsets:
- e_entry - the offset of our entrypoint, defined by our entrypoint symbol
- e_shoff - the offset of our section headers
- e_phnum - the number of program headers (min 1, typically 3)
- e_shnum - the number of section headers (min 3: Null, Progbits, Strtab; typically more)
- e_shstrndx - the index of Strtab in our section headers (min 2, as Null and Progbits must precede Strtab)
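The arithmetic behind these fields can be sketched as follows. This assumes the layout fib.so happens to use: .text begins immediately after the program headers with the entrypoint as its first instruction, and the section header table is placed after the last section, rounded up to 8 bytes. Both helper names are hypothetical, and the 8-byte rounding is an assumption inferred from the values in the dump (808 + 59 = 867, while e_shoff is 872).

```rust
// Sizes fixed by the ELF64 format; they match e_ehsize, e_phentsize
// and e_shentsize in the header above.
const EHDR_SIZE: u64 = 64;
const PHDR_SIZE: u64 = 56;

/// Round `value` up to the next multiple of `align` (a power of two).
const fn align_up(value: u64, align: u64) -> u64 {
    (value + align - 1) & !(align - 1)
}

/// Offset of the first byte after the ELF header and program headers.
/// In fib.so this is where .text starts, so it doubles as e_entry.
pub const fn first_section_offset(e_phnum: u64) -> u64 {
    EHDR_SIZE + e_phnum * PHDR_SIZE
}

/// Assumed placement of the section header table (e_shoff): directly
/// after the last section, aligned to 8 bytes.
pub const fn section_header_offset(end_of_last_section: u64) -> u64 {
    align_up(end_of_last_section, 8)
}
```

With 3 program headers the first function yields 232, and with .shstrtab ending at 808 + 59 the second yields 872, matching e_entry and e_shoff above.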
Program headers define how sections of an ELF file are mapped into memory, as well as access control over these memory regions (e.g., read, write, execute). These headers are used by the system loader to set up the program's memory image when executing the binary.
- Readable-Executable Program Header
- Read-only Program Header
- Dynamic Program Header
NOTE: For a program with no dynamically linked symbols (e.g. syscalls), it is possible to reduce this to just a single Read-Execute program header.
The Read-Execute header points to the offset of our .text section, which contains our entrypoint, and also encapsulates our .rodata section:
/// 1. Read-execute program header of fib.so
ProgramHeader {
p_type: PT_LOAD,
p_flags: ProgramFlags(5), // Read-Execute
p_offset: 232,
p_vaddr: 232,
p_paddr: 232,
p_filesz: 232,
p_memsz: 232,
p_align: 4096,
},
In this case, p_filesz and p_memsz matching the p_offset value of 232 is purely a coincidence; the sizes and the offset are unrelated fields. Our .text section is 200 bytes in length and our .rodata section is 32 bytes in length, which just so happens to add up to our offset value of 232.
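To make the size calculation explicit, here is a minimal sketch that derives p_offset and p_filesz from the sections a segment covers. The SectionSpan type and segment_span function are hypothetical names used for illustration, and the sketch assumes the covered sections are contiguous in the file, as they are in fib.so.

```rust
/// File offset and size of one section, as recorded in its section header.
#[derive(Clone, Copy)]
pub struct SectionSpan {
    pub offset: u64,
    pub size: u64,
}

/// Returns (p_offset, p_filesz) for a segment covering `sections`,
/// or None if the slice is empty.
pub fn segment_span(sections: &[SectionSpan]) -> Option<(u64, u64)> {
    // The segment starts at the lowest section offset...
    let start = sections.iter().map(|s| s.offset).min()?;
    // ...and ends at the highest section end.
    let end = sections.iter().map(|s| s.offset + s.size).max()?;
    Some((start, end - start))
}
```

For .text (offset 232, size 200) and .rodata (offset 432, size 32) this gives an offset of 232 and a size of 232, which is the coincidence described above.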
If our program contains any dynamically linked symbols, we must include a read-only program header:
/// 2. Read-only program header of fib.so
let readonly_header = ProgramHeader {
p_type: PT_LOAD,
p_flags: ProgramFlags(4), // Readonly
p_offset: 640,
p_vaddr: 640,
p_paddr: 640,
p_filesz: 168,
p_memsz: 168,
p_align: 4096,
}
The offset 640 points to the start of our .dynsym section, defined in our SHT_DYNSYM Section Header. The .dynsym section is a subset of the .symtab symbol table, containing only the symbols needed for dynamic linking. The segment's 168 bytes span .dynsym (96 bytes), .dynstr (24 bytes), and .rel.dyn (48 bytes).
If our program contains any dynamically linked symbols, we must include a dynamic program header:
/// 3. Dynamic program header of fib.so
let dynamic_header = ProgramHeader {
p_type: PT_DYNAMIC,
p_flags: ProgramFlags(6), // Read-write
p_offset: 464,
p_vaddr: 464,
p_paddr: 464,
p_filesz: 176,
p_memsz: 176,
p_align: 8,
}
The offset 464 points to the start of our .dynamic section, defined in our SHT_DYNAMIC Section Header. The .dynamic section acts as a metadata table for dynamic linking.
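The p_flags values used by the three program headers (5, 4 and 6) are bitmasks of the standard ELF permission bits. A small sketch for decoding them is shown here; describe_p_flags is a hypothetical helper name.

```rust
// Standard ELF segment permission bits.
const PF_X: u32 = 1; // execute
const PF_W: u32 = 2; // write
const PF_R: u32 = 4; // read

/// Render p_flags in the familiar "rwx" notation.
pub fn describe_p_flags(flags: u32) -> String {
    let mut out = String::new();
    out.push(if flags & PF_R != 0 { 'r' } else { '-' });
    out.push(if flags & PF_W != 0 { 'w' } else { '-' });
    out.push(if flags & PF_X != 0 { 'x' } else { '-' });
    out
}
```

Here 5 decodes to read and execute, 4 to read only, and 6 to read and write.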
We must dynamically update the following fields of each program header based upon their relative offsets and sizes in the binary:
- p_offset - Offset of the segment in the file image
- p_vaddr - Virtual address of the segment
- p_paddr - Physical address of the segment
- p_filesz - Size in bytes of the segment in the file image
- p_memsz - Size in bytes of the segment in memory
The bare minimum for sBPF section headers is:
- null Section Header
- .text Section Header
- .shstrtab Section Header
This is because null is required by solana-rbpf, .text contains our executable code, and .shstrtab contains the symbol name of our entrypoint.
Our fib.so includes several other headers, as detailed below.
For absolutely no good reason, our first Section Header must always be null. While not actually a requirement of eBPF, it is a requirement of sBPF, as it inherits the quirks of rBPF, which inherits the quirks of uBPF, which decided to treat the output of the GNU linker (which happens to include a Null Section Header when it packages ELF binaries, again for no reason) as the "standard" implementation of the eBPF specification.
SectionHeader {
sh_name: 0, // \0
sh_type: SHT_NULL,
sh_flags: 0,
sh_addr: 0,
sh_offset: 0,
sh_size: 0,
sh_link: 0,
sh_info: 0,
sh_addralign: 0,
sh_entsize: 0,
}
Our second section header must be our .text section. This contains our executable code.
SectionHeader {
sh_name: 1, // .text
sh_type: SHT_PROGBITS,
sh_flags: 6,
sh_addr: 232,
sh_offset: 232,
sh_size: 200,
sh_link: 0,
sh_info: 0,
sh_addralign: 4,
sh_entsize: 0,
}
TODO: Figure out why sh_addralign is aligned to 4 and not 8 or 1.
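The sh_flags values in these section headers (6 for .text, 2 for .rodata, 3 for .dynamic) are bitmasks of the standard ELF section flags. The following sketch decodes them; describe_sh_flags is a hypothetical helper name, and only the three flags that appear in fib.so are handled.

```rust
// Standard ELF section flag bits (only the ones fib.so uses).
const SHF_WRITE: u64 = 1;
const SHF_ALLOC: u64 = 2;
const SHF_EXECINSTR: u64 = 4;

/// List the names of the flags set in `sh_flags`.
pub fn describe_sh_flags(sh_flags: u64) -> Vec<&'static str> {
    let mut names = Vec::new();
    if sh_flags & SHF_WRITE != 0 {
        names.push("SHF_WRITE");
    }
    if sh_flags & SHF_ALLOC != 0 {
        names.push("SHF_ALLOC");
    }
    if sh_flags & SHF_EXECINSTR != 0 {
        names.push("SHF_EXECINSTR");
    }
    names
}
```

So the .text value of 6 means the section occupies memory at runtime and contains executable instructions.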
Our third section header points to our .rodata section. This contains the read-only data we use when we invoke sol_log_, namely:
"Sorry, u64 maxes out at F(93) :("
This data would need to be padded to a multiple of 8 bytes if it were not already 32 bytes in length.
SectionHeader {
sh_name: 7, // .rodata
sh_type: SHT_PROGBITS,
sh_flags: 2,
sh_addr: 432,
sh_offset: 432,
sh_size: 32,
sh_link: 0,
sh_info: 0,
sh_addralign: 1,
sh_entsize: 0,
}
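If the message were a different length, the padding mentioned above could be sketched like this. pad_to_8 is a hypothetical helper, and padding with zero bytes is an assumption; the source only states that the data must be aligned to 8 bytes.

```rust
/// Pad `data` with zero bytes up to the next multiple of 8.
/// Data whose length is already a multiple of 8 is returned unchanged.
pub fn pad_to_8(mut data: Vec<u8>) -> Vec<u8> {
    let padded_len = (data.len() + 7) & !7;
    data.resize(padded_len, 0);
    data
}
```

The 32-byte message used by fib.so passes through untouched, which is why no padding appears in the binary.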
Our dynamic section header points to the .dynamic section, which contains the following entries:
DT_FLAGS - DF_TEXTREL - indicating text relocation is required
1e00000000000000 0400000000000000
DT_REL - offset of .rel.dyn
1100000000000000 f802000000000000
DT_RELSZ - size of DT_REL table (48 bytes)
1200000000000000 3000000000000000
DT_RELENT - size of the relocation entry (16 bytes)
1300000000000000 1000000000000000
DT_RELCOUNT - relative relocation count (1)
faffff6f00000000 0100000000000000
DT_SYMTAB - offset of .dynsym (640)
0600000000000000 8002000000000000
DT_SYMENT - size of DT_SYMTAB symbol entry (24 bytes)
0b00000000000000 1800000000000000
DT_STRTAB - offset of the string table .dynstr (736)
0500000000000000 e002000000000000
DT_STRSZ - size of the DT_STRTAB (24 bytes)
0a00000000000000 1800000000000000
DT_TEXTREL - one or more relocation entries might request modifications to a non-writable segment
1600000000000000 0000000000000000
DT_NULL
0000000000000000 0000000000000000
SectionHeader {
sh_name: 15, // .dynamic
sh_type: SHT_DYNAMIC,
sh_flags: 3,
sh_addr: 464,
sh_offset: 464,
sh_size: 176,
sh_link: 5,
sh_info: 0,
sh_addralign: 8,
sh_entsize: 16,
}
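Each entry in the .dynamic dump above is 16 bytes (sh_entsize): a little-endian u64 tag followed by a little-endian u64 value. A minimal sketch for decoding the raw bytes follows; parse_dynamic and dyn_u64 are hypothetical helper names.

```rust
/// Read a little-endian u64 from the first 8 bytes of `b`.
fn dyn_u64(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_le_bytes(buf)
}

/// Split a .dynamic section into (tag, value) pairs.
/// Trailing bytes that do not form a full 16-byte entry are ignored.
pub fn parse_dynamic(bytes: &[u8]) -> Vec<(u64, u64)> {
    bytes
        .chunks_exact(16)
        .map(|entry| (dyn_u64(&entry[..8]), dyn_u64(&entry[8..])))
        .collect()
}
```

For example, the bytes 1e00000000000000 0400000000000000 decode to tag 30 (DT_FLAGS) with value 4 (DF_TEXTREL), and 1100000000000000 f802000000000000 decode to tag 17 (DT_REL) with value 760.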
A dynamic symbol table. Each entry looks like this:
pub struct DynSym {
pub st_name: u32, // Symbol name (offset in .dynstr)
pub st_info: u8, // Symbol type and binding
pub st_other: u8, // Symbol visibility
pub st_shndx: u16, // Section header index related to this object
pub st_value: u64, // Symbol value
pub st_size: u64, // Symbol size
}
impl DynSym {
pub const fn st_bind(&self) -> u8 {
self.st_info >> 4
}
pub const fn st_type(&self) -> u8 {
self.st_info & 0x0f
}
pub const fn st_visibility(&self) -> u8 {
self.st_other & 0x03
}
}
This section of our program contains:
null 00000000 00 00 0000 0000000000000000 0000000000000000
e, STB_GLOBAL, header 1 01000000 10 00 0100 e800000000000000 0000000000000000
sol_log_, STB_GLOBAL 03000000 10 00 0000 0000000000000000 0000000000000000
sol_log_64_, STB_GLOBAL 0c000000 10 00 0000 0000000000000000 0000000000000000
SectionHeader {
sh_name: 24, // .dynsym
sh_type: SHT_DYNSYM,
sh_flags: 2,
sh_addr: 640,
sh_offset: 640,
sh_size: 96,
sh_link: 5,
sh_info: 1,
sh_addralign: 8,
sh_entsize: 24,
}
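To connect the raw bytes of the symbol dump above to the struct fields, here is a minimal sketch that decodes one 24-byte entry. ParsedSym, parse_sym and sym_u64 are hypothetical names; the field layout follows the DynSym definition above with little-endian encoding.

```rust
/// Decoded view of one 24-byte .dynsym entry.
#[derive(Debug, PartialEq)]
pub struct ParsedSym {
    pub st_name: u32,
    pub st_info: u8,
    pub st_other: u8,
    pub st_shndx: u16,
    pub st_value: u64,
    pub st_size: u64,
}

/// Read a little-endian u64 from the first 8 bytes of `b`.
fn sym_u64(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_le_bytes(buf)
}

/// Decode a single symbol table entry.
pub fn parse_sym(entry: &[u8; 24]) -> ParsedSym {
    ParsedSym {
        st_name: u32::from_le_bytes([entry[0], entry[1], entry[2], entry[3]]),
        st_info: entry[4],
        st_other: entry[5],
        st_shndx: u16::from_le_bytes([entry[6], entry[7]]),
        st_value: sym_u64(&entry[8..16]),
        st_size: sym_u64(&entry[16..24]),
    }
}
```

Applied to the second entry in the dump, this yields st_name 1 (the name "e"), st_info 0x10 (whose upper nibble, 1, is STB_GLOBAL), st_shndx 1 and st_value 232, the offset of our entrypoint.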
This contains the names of all of our dynamic symbols, separated by zero bytes. It contains:
e sol_log_ sol_log_64_
SectionHeader {
sh_name: 32, // .dynstr
sh_type: SHT_STRTAB,
sh_flags: 2,
sh_addr: 736,
sh_offset: 736,
sh_size: 24,
sh_link: 0,
sh_info: 0,
sh_addralign: 1,
sh_entsize: 0,
}
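The st_name values in .dynsym (1, 3 and 12) are byte offsets into this string table. A minimal sketch of the lookup is shown here, assuming the table starts with a zero byte and every name is zero-terminated; name_at is a hypothetical helper name.

```rust
/// Return the zero-terminated name that starts at `offset` in `strtab`,
/// or None if the offset is out of range, unterminated, or not UTF-8.
pub fn name_at(strtab: &[u8], offset: usize) -> Option<&str> {
    let rest = strtab.get(offset..)?;
    let end = rest.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&rest[..end]).ok()
}
```

With the 24 bytes of .dynstr, offset 1 resolves to e, offset 3 to sol_log_ and offset 12 to sol_log_64_.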
The .rel.dyn section contains a list of relocations and their types. In this case, we have two types:
- 0x08 - a relative relocation
- 0x0a - a syscall relocation
7001000000000000 - Offset 368/0x0170 - points to jump to finalize which calls sol_log_64_
0800000000000000 - R_SBF_64_RELATIVE
9001000000000000 - Offset 400/0x0190 - points to call sol_log_
0a000000 - "Syscall" relocation type
02000000 - Points to .dynsym entry 2, sol_log_
a001000000000000 - Offset 416/0x01a0 - points to call sol_log_64_
0a000000 - "Syscall" relocation type
03000000 - Points to .dynsym entry 3, sol_log_64_
SectionHeader {
sh_name: 40, // .rel.dyn
sh_type: SHT_REL,
sh_flags: 2,
sh_addr: 760,
sh_offset: 760,
sh_size: 48,
sh_link: 4,
sh_info: 0,
sh_addralign: 8,
sh_entsize: 16,
}
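Each relocation entry is 16 bytes (sh_entsize): a little-endian u64 offset followed by a u64 info field whose low 32 bits hold the relocation type and whose high 32 bits hold the symbol index. The sketch below decodes one entry; parse_rel and rel_u64 are hypothetical helper names.

```rust
/// Read a little-endian u64 from the first 8 bytes of `b`.
fn rel_u64(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_le_bytes(buf)
}

/// Decode one .rel.dyn entry into (r_offset, relocation type, symbol index).
pub fn parse_rel(entry: &[u8; 16]) -> (u64, u32, u32) {
    let r_offset = rel_u64(&entry[..8]);
    let r_info = rel_u64(&entry[8..]);
    // Low half: relocation type. High half: index into the symbol table.
    (r_offset, r_info as u32, (r_info >> 32) as u32)
}
```

The second entry in the dump decodes to offset 400, type 0x0a and symbol index 2, while the relative relocation decodes to offset 368, type 0x08 and symbol index 0.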
Contains our section header string table, as follows:
.text .rodata .dynamic .dynsym .dynstr .rel.dyn .shstrtab
SectionHeader {
sh_name: 49, // .shstrtab
sh_type: SHT_STRTAB,
sh_flags: 0,
sh_addr: 0,
sh_offset: 808,
sh_size: 59,
sh_link: 0,
sh_info: 0,
sh_addralign: 1,
sh_entsize: 0,
}
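The sh_name values used throughout the section headers (1, 7, 15, 24, 32, 40 and 49) fall out of how this string table is laid out. As a rough sketch, assuming the table begins with a single zero byte and each name is followed by a zero terminator, it can be built like this; build_strtab is a hypothetical helper name.

```rust
/// Build a string table from `names` and return it together with the
/// offset of each name, in the same order as the input.
pub fn build_strtab(names: &[&str]) -> (Vec<u8>, Vec<u32>) {
    // Offset 0 is reserved for the empty name used by the null header.
    let mut bytes = vec![0u8];
    let mut offsets = Vec::with_capacity(names.len());
    for name in names {
        offsets.push(bytes.len() as u32);
        bytes.extend_from_slice(name.as_bytes());
        bytes.push(0);
    }
    (bytes, offsets)
}
```

Feeding in the seven section names above produces a 59-byte table, matching sh_size, and the same helper reproduces the 24-byte .dynstr from e, sol_log_ and sol_log_64_.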