HRAM0 | Model and Toolchain for Research in Memory Safety

Specification

Last Updated:

Instruction Set

We use the term "register" to refer to units of "working memory". The set of registers is a small finite set of storage locations, labelled r1, r2, r3... etc, and which are completely distinct from locations in memory. The latter is split into a code segment (where instructions are stored) and a data segment which consists of an unbounded number of locations of unbounded size, addressed starting from 0.

The instruction set is as follows:

Operation Mnemonic Opcode Description
Halt HLT 0 Stop
Put PUT <constant>, <register> 1 Put constant value in a register
Add ADD <register>, <register>, <register> 2 Add first two registers and store the result in the third register
Subtract SUB <register>, <register>, <register> 3 Subtract first register from second register and store the result in the third register
Load LOD <register>, <register> 4 Load word from memory (data segment) at address given by first register into second register
Store STO <register>, <register> 5 Store word from first register to memory (data segment) address given by second register
Branch if negative BRN <register>, <constant> 6 If contents of first register is negative, jump to non-negative constant address (code segment) given as the second operand; ELSE next instruction
Call subroutine CAL <constant> 7 Jump to non-negative constant address (code segment) and record address of next instruction which can be returned to with RET (see next)
Return from subroutine RET 8 Return to the instruction following most recent CAL. If no such CAL, then halt.
Allocate (malloc) MAL <register>, <register> 9 Allocate block of memory of size (in words) given by first register and store the address in the second register
Deallocate (free) FRE <register> 10 Free block of memory that starts at the address given by the first register

Indirect branching is not supported so a constant address must be specified as the target of branches (BRN) and function calls (CAL). The target address is an address in the code segment which contains instructions and which is read-only. To load/store data from the data segment, the desired address is first put in a register and this register is passed to the load/store instruction. For example, the code below loads the value at address 2 in the data segment into the register r1, using r0 to hold the address.

put 2, r0
lod r0, r1

The standard encoding of instructions we use favors simplicity over compression, using a distinct integer to encode the opcode and every operand. Therefore, the first instruction in the code above would be encoded as 1, 2, 0 and the second as 4, 0, 1, each occupuing 3 locations in memory. We use the term "word" to refer to such storage locations, albeit no "word" size is specified, meaning their capacity is in theory unbounded and leaving it up to implementations to impose a concrete size.

The formal specification document for the HRAM0 model is available here which includes the formal semantics.

Provisionally, the extension of the instruction set to include the instructions MUL (integer multiplication) and DIV (integer division) is dubbed HRAM0x. This extension requires some non-trivial changes such as an additional halting state triggered on division by zero that is distinct from the ERROR state reserved for memory safety violations.