Universal Screeners

Last Updated: Tuesday 9th August 2022

Construction of Universal Screener

A prerequisite for reading this page is the preceeding article in this series - characterization of memory safety. Recall that a screener is a transformation that is both an instrumentor and one that is strenghtening. Furthermore, a screener is universal if it checks every memory access for safety. The asymptotic complexity in terms of number of memory accesses of a transformed program $P' = T(P)$ is linearly dependent on $m_P(n)$ where $T$ is a universal screener and $P \in \mathbb{P}$ is a valid program where $m_P(n)$ is the (average) number of memory accesses made by program P on inputs of length n. The complexity also depends on another important factor: the data structure used to keep track of valid allocated blocks. Such "bookkeeping" is indispensible and plays an important role in any realization of a screener.

Constructing a universal screener for HHRAM0 programs involves some technicalities, which we discuss in turn. Firstly, we introduce some useful macros to serve our needs.

Preliminaries

We now define two important macros which we make ample use of later. They relate to manipulation of a stack, namely the push and pop operations. Both macros assume the stack pointer is stored in register $\mathsf{r1}$ and that the register $\mathsf{r2}$ contains the constant -1.

push $z$ :
- sto $z, \mathsf{r1}$ // *sp := z
- add $\mathsf{r2}, \mathsf{r1}, \mathsf{r1}$ // sp := sp - 1
pop $z$ :
- sub $\mathsf{r2}, \mathsf{r1}, \mathsf{r1}$ // sp := sp + 1
- lod $\mathsf{r1}, z$ // z := *sp
popx:
- sub $\mathsf{r2}, \mathsf{r1}, \mathsf{r1}$ // sp := sp + 1

Consider the following prepend transformation $\mathsf{TPre}$ :

$\mathbf{TPre}_{P_{\mathsf{pre}}}(P)$ :
- Parss $(\mathbf{c}_{\mathsf{pre}}, \mathbf{d}_{\mathsf{pre}}) \gets P_{\mathsf{pre}}$
- Parse $(\mathbf{c}, \mathbf{d}) \gets P$
- Return $P' := (\mathbf{c}_{\mathsf{pre}} \parallel \mathbf{c}, \mathbf{d}_{\mathsf{pre}} \parallel \mathbf{d})$

Furthermore, consider the following setup program $P_\mathsf{setup}$ :

Program $P_{\mathsf{setup}}$ :

start:
    put T, r2 // where T > k
    mal r2, r3 // system memory
    put S, r2 // where S is stack size
    mal r2, r1
    sto r1, r3 // save stack block starting address in system memory
    add r2, r1, r1 // r1 is now stack pointer
    put -1, r2 // this convention is followed unvaryingly
    add r2, r1, r1 // subtract 1 from stack pointer to correct start of stack

Instrumentation-Friendly Programs

In the discussion to follow, our definitions rely heavily on the notion of program equivalence. However, the astute reader may observe that the equivalences do not hold when employing our default filter function, defined in the previous article. Therefore, we must specify a filter function that is of import here and which is to be implicitly understood as being applicable in all uses of program equivalence for the remainder of this article. This function $f^\ast : \mathcal{P}(\mathbb{Z} \times \mathbb{Z}) \to \mathcal{P}(\mathbb{Z} \times \mathbb{Z})$ is defined thus $$f^\ast(M) = \{(i, M(i)) : i \geq 0, \bigwedge_{j = 0}^{i} M(j)\}$$ and truncates a memory relation to the lower contiguous range of addreses, consisting of the static data together with the program input. As such, the heap; that is, the dynamically-allocated blocks are ignored (note that owing to the fact that $\zeta > 0$ it means a "gap" separates the defined lower memory from the dynamically-allocated blocks). Henceforth, for the remainder of this article, the default filter function is understood to be $f^\ast$ .

A spurious program is one that accesses memory at one or more absolute addresses; that is, an address that is not relative to any of the blocks the program has allocated, excepting of course the region of statically-allocated data and program input. Our instrumentation approach fails with spurious programs because we cannot necessarily preserve their functionality when modifying the program to keep track of safe/unsafe memory regions, since allocating space for a runtime management system affects the program's behavior. A program $P$ is non-spurious iff $\mathsf{TPre}_{P_{\mathsf{mal}, u}}(P) \equiv P$ for any choice of $u$ where $P_{\mathsf{mal}, u}$ is defined thus

Program $P_{\mathsf{mal}, u}$ :

start:
    put u, r0 
    mal u, r0 // alloacate region of size u
    put 0, r0 // restore r0 to 0 (so appended program gets usual environment)

A program is said to be reflective if the program counter (PC) register is an operand of one or more of its instructions. It is difficult for instrumentors to preserve the functionality of reflective programs. Let $\mathsf{nr} : \mathbb{P} \to \{0, 1\}$ be a predicate that determines whether a program is non-reflective. It is easy to see that such a predicate is efficiently computable.

We term the set of programs $\tilde{\mathbb{P}} := \{P \in \mathbb{P} : \mathsf{nr}(P) \; \land \; P \equiv \mathsf{TPre}_{P_{\mathsf{mal}, u}}(P),\; \forall u \in \mathbb{Z}_{+}\} $ instrumentation-friendly programs; that is, programs that are both non-spurious and non-reflective, and we preclude such programs from consideratoin.

Note that the concatentaion operation $\parallel$ is understood to intelligently produces a resultant program such that all branches/calls to targets in the second program (right operand) are translated by the size of the first program (left operand).

TIncRegs Transformation

In this section we construct a transformation that increments all register numbers used in a program by some specified $k$ so that the resulting program does not disturb the lower $k$ registers ( $\mathsf{r0}, \mathsf{r1}, \ldots$ etc.) which are reserved for special purposes such as the stack pointer. We now describe how this transformation works. Firstly, it requires that $k \geq 4$ and the first $k$ registers are setup as follows as a prerequisite for the correctness of the program: $\mathsf{r3}$ is a pointer to a block of memory of at least $k + 1$ words, $\mathsf{r1}$ is the stack pointer and $\mathsf{r2}$ contains the constant -1 (see PUSH/POP macros above). Code is inserted to multiplex the contents of $\mathsf{r\rho - 1}$ (concretely r13 when $\rho = 14$ ) between $k + 1$ different values which are stored in the block of memory pointed to by $\mathsf{r3}$ and swapped in/out as required.

Now we present the $\mathsf{TIncRegs}_k$ transformation.

$\mathbf{TIncRegs}_k(P)$ :

$(\mathbf{c}, \mathbf{d}) \gets P$
$\mathbf{c'} \gets \epsilon$
$\mathsf{sel} \gets 0$
For each instruction $\mathsf{instr} \in P$ :
- $\hat{\mathbf{c}} \gets \epsilon$
- $\mathsf{last} \gets 0$
- $\mathsf{start\_index} \gets 0$
- If $\mathsf{instr.opcode} = \mathsf{PUT}$ or $\mathsf{instr.opcode} = \mathsf{CAL}$ :
- - $\mathsf{start\_index} \gets 1$
- $N \gets \alpha(\mathsf{instr.opcode})$
- If $\mathsf{instr.opcode} = \mathsf{BRN}$
- - $N \gets N - 1$
- For $\mathsf{start\_index} \leq i < N$ :
- - $r \gets \mathsf{instr.operands}[i]$
- - If $r < \rho \; \land \; (r + k) \geq \rho - 1$ :
- - - $t \gets (r + k) - (\rho - 1)$
- - - $r = \rho - 1$
- - - If $t \neq \mathsf{sel}$ :
- - - - Choose some register $s \neq r$ and $s \geq k$
- - - - If $i < \alpha(\mathsf{instr.opcode}) - 1 \; \land \; \mathsf{last} = 1$ :
- - - - Choose some register $r' \neq r$ and $r' \neq \mathsf{instr.operands}[i + 1]$ and $r' \geq k$
- - - - Append the following code to $\mathbf{c'}$
        push r' push s put t, s add s, r3, s sub r2, s, s // skip stack address (s += 1) lod s, r' pop s
- - - - $\hat{\mathbf{c}} \gets \mathsf{Assemble}(\text{pop } r')$
- - - - $r \gets r'$
- - - - Else:
- - - - Append the following code to $\mathbf{c'}$
        push s put sel, s // current selection add s, r3, s sub r2, s, s // skip stack address (s += 1) sto r, s // save current value of r (r is register \rho - 1) put t, s // new selection add s, r3, s sub r2, s, s // skip stack address (s += 1) lod s, r // load newly selected value into r (r is register \rho - 1) pop s
- - - - $\mathsf{last} \gets 1$
- - - - $\mathsf{sel} \gets t$
- - Else if $r < \rho$ :
- - - $r \gets r + k$
- - $\mathsf{instr.operands}[i] \gets r$
- Append $\mathsf{instr}$ to $\mathbf{c'}$
- $\mathbf{c'} \gets \mathbf{c'} \parallel \hat{\mathbf{c}}$
Return $P' := (\mathbf{c'}, \mathbf{d})$

An invariant that TIncRegs must necessarily satisfy is as follows. Let $k \geq 4$ be an integer. For all $P \in \tilde{\mathbb{P}}$ it holds that $$ (\mathsf{TPre}_{P_{\mathsf{setup}}} \circ \mathsf{TIncRegs}_k) (P) \equiv P.$$

Address Manager Definition

1. An address manager is a tuple of polynomial-time algorithms $(\mathsf{Init}, \mathsf{Malloc}, \mathsf{Free}, \mathsf{Access}, \mathsf{Finalize})$ defined as follows:

$\mathsf{Init}:$ Takes as input a program in $\tilde{\mathbb{P}}$ (the program to instrument) and a register number between 0 and $\rho$ (containing stack pointer) and a register number between 0 and $\rho$ (to store pointer to address manager program state) and outputs a tuple containing the address manager context and the instrumented code sequence for initialization.
$\mathsf{Malloc}:$ Takes as input address manager context and outputs the instrumented code sequence for a subroutine that is called for memory allocations. This subroutine takes two parameters, the address of the newly allocated block and the number of words allocated.
$\mathsf{Free}:$ Takes as input address manager context and outputs the instrumented code sequence for a subroutine that is called when memory is freed. This subroutine takes one parameter, the starting address of the block to free.
$\mathsf{Check}:$ Takes as input address manager context and outputs the instrumented code sequence for a subroutine that is called for memory address checks. This subroutine takes two parameters that mark the address range to check, namely the starting address (inclusive) and the ending address. The subroutine returns 0 iff the address range is safe and returns -1 iff the address range is unsafe.
$\mathsf{Finalize}:$ Takes as input address manager context and outputs the instrumented code sequence for a subroutine that is called when the program aborts. This subroutine takes no parameters.

The calling convention we use stipulates that arguments are pushed on to the stack from right to left; that is, when calling a function with two arguments, the second argument is pushed first followed by the second etc. The arguments are guaranteed to be popped by the function. Furthermore, the return value is passed in r0.

Now we define the fundamental condition of validity for an instance of address manager.

2. An instance of address manager $\mathcal{AM}$ satisfies the validity condition if and only if

AM.1: $\mathsf{UScreener}(\mathcal{AM}, \cdot) : \tilde{\mathbb{P}} \to \tilde{\mathbb{P}}$ is a strengthening transformation.

where $\mathsf{UScreener}$ is a screener, which will be described now.

UScreener Construction

We now present our construction of a universal screener that makes use of an instance of address manager. The high-level overview of UScreener is as follows. Firstly, it performs initialization. It calls the algorithms of the address manager instance $\mathcal{AM}$ to get the instrumented code for each core subroutine: malloc, free, check and finalize. This collection of subroutines is aggregated and the offsets for each subroutine are stored so that they may easily be invoked. Then a pass is made over every instruction in $P'$ , which is the result of applying the transformation above $\mathsf{TIncRegs}$ to the input program $P$ . If the instruction is a MAL (malloc) or FRE (free), each is is suffixed with a call to the malloc subroutine or free subroutine respectively. If the instruction is HLT, it is prefaced with a call to a cleanup subroutine (which calls the address manager's finalize subroutine). If the instruction is a load/store, code is added to call the address manager's check subroutine and if it returns -1 specific code to fail gracefully is executed whereas if it returns 0, then the load/store instruction is added. Finally, any other instruction is appended to the resulting code unmodified. The key point to remember is that this is a universal screener and thus, every load/store is checked for safety.

$\mathbf{UScreener}$ ( $\mathcal{AM}$ , P);

$\mathbf{c'} \gets \epsilon$
Parse $(\mathbf{c}, \mathbf{d}) \gets P$
$(\hat{\mathbf{c}}, \epsilon) \gets P_{\mathsf{setup}, T:=7}$
$\mathbf{c'} \gets \mathbf{c'} \parallel \hat{\mathbf{c}}$
$P' \gets \mathsf{TIncRegs}(P, 5)$
$(\mathsf{ctx}, \hat{\mathbf{c}}) \gets \mathcal{AM}.\mathsf{Init}(P', \mathsf{r1}, \mathsf{r4})$
$\mathbf{c'} \gets \mathbf{c'} \parallel \hat{\mathbf{c}}$
$\mathsf{malloc\_sub\_offset} \gets |\mathbf{c'}| + 3$ // branch is to be appended (3 words)
$\mathbf{sub} \gets \mathcal{AM}.\mathsf{Malloc}(\mathsf{ctx})$
$\mathbf{subs} \gets \mathbf{sub}$
$\mathsf{free\_sub\_offset} \gets \mathsf{malloc\_sub\_offset} + |\mathbf{sub}|$
$\mathbf{sub} \gets \mathcal{AM}.\mathsf{Free}(\mathsf{ctx})$
$\mathbf{subs} \gets \mathbf{subs} \parallel \mathbf{sub}$
$\mathsf{check\_sub\_offset} \gets \mathsf{free\_sub\_offset} + |\mathbf{sub}|$
$\mathbf{sub} \gets \mathcal{AM}.\mathsf{Check}(\mathsf{ctx})$
$\mathbf{subs} \gets \mathbf{subs} \parallel \mathbf{sub}$
$\mathsf{finalize\_sub\_offset} \gets \mathsf{check\_sub\_offset} + |\mathbf{sub}|$
$\mathbf{sub} \gets \mathcal{AM}.\mathsf{Finalize}(\mathsf{ctx})$
$\mathbf{subs} \gets \mathbf{subs} \parallel \mathbf{sub}$
$\mathsf{cleanup\_sub\_offset} \gets \mathsf{finalize\_sub\_offset} + |\mathbf{sub}|$
// Below is the code for cleanup subroutine
$\mathbf{subs} \gets \mathbf{subs} \parallel \mathsf{Assemble}($
- cal $\mathsf{finalize\_sub\_offset}$
- lod $\mathsf{r3}, \mathsf{r1}$ // get start of stack block
- fre $\mathsf{r1}$
- fre $\mathsf{r3}$
- ret
)
$\mathsf{postsubs\_offset} \gets \mathsf{malloc\_sub\_offset} + |\mathbf{subs}|$
$\mathbf{c'} \gets \mathbf{c'} \parallel \mathsf{Assemble}($
- brn $\mathsf{r2}, \mathsf{postsubs\_offset}$ // jump over subroutines
)
$\mathbf{c'} \gets \mathbf{c'} \parallel \mathbf{subs}$
For each instruction $\mathsf{instr} \in P'$ :
- If $\mathsf{instr.opcode} = \text{MAL}$ :
- - $\mathbf{c'} \gets \mathbf{c'} \parallel \mathsf{Assemble}($
- - - push $\mathsf{instr.operands}[0]$
- - )
- - Append $\mathsf{instr}$ to $\mathbf{c'}$
- - $\mathbf{c'} \gets \mathbf{c'} \parallel \mathsf{Assemble}($
- - - push $\mathsf{instr.operands}[1]$
- - - cal $\mathsf{malloc\_sub\_offset}$
- - )
- Else If $\mathsf{instr.opcode} = \text{FRE}$ :
- - Append $\mathsf{instr}$ to $\mathbf{c'}$
- - $\mathbf{c'} \gets \mathbf{c'} \parallel \mathsf{Assemble}($
- - - push $\mathsf{instr.operands}[0]$
- - - cal $\mathsf{free\_sub\_offset}$
- - )
- Else If $\mathsf{instr.opcode} \in \{\text{LOD},\text{STO}\}$ :
- - $j \gets 0$
- - If $\mathsf{instr.opcode} = \text{STO}:$
- - - $j \gets 1$
- - $\mathbf{c'} \gets \mathbf{c'} \parallel \mathsf{Assemble}($
- - - sub $\mathsf{r2}, \mathsf{instr.operands}[j], r0$
- - - push r0 // end address is start address + 1 (exclusive)
- - - push $\mathsf{instr.operands}[j]$ // start address
- - - cal $\mathsf{check\_sub\_offset}$ // check returns -1 if unsafe
- - - brn $\mathsf{r0}, \mathsf{exit\_gracefully}$
- - )
- - Append $\mathsf{instr}$ to $\mathbf{c'}$
- Else If $\mathsf{instr.opcode} = \text{HLT}$ :
- - $\mathbf{c'} \gets \mathbf{c'} \parallel \mathsf{Assemble}($
- - - cal $\mathsf{cleanup\_sub\_offset}$
- - )
- - Append $\mathsf{instr}$ to $\mathbf{c'}$
- Else:
- - Append $\mathsf{instr}$ to $\mathbf{c'}$
$\mathbf{c'} \gets \mathbf{c'} \parallel \mathsf{Assemble}($
- exit_gracefully:
- - cal $\mathsf{cleanup\_sub\_offset}$
- - hlt
)
Return $(\mathbf{c}',\mathbf{d})$

Instances of Address Manager

An address manager can be realized in different ways depending on various efficiency trade-offs provided by the underlying data structure used and its associated algorithms. Firstly, we list the complexity measures that we use, all of which have average-case and worst-case variants:

$\mathsf{al}_P(n)$ : the number of blocks allocated by program $P$ as a function of input size $n$ .
$s_P(n)$ : space complexity; a measure of the amount of memory allocated by $P$ (the sum of the size of every allocated block).
$m_P(n)$ ; number of memory accesses (loads/stores) made by program $P$ .

We use $m_P(n)$ as a rough measure of the runtime complexity as it correlelates highly with the latter, denoted by $t_P(n)$ .

Data Structure	Safety Check Time	Space Overhead
Trie	$O(\log{s_P(n)})$	$O(\mathsf{al}_P(n))$
Shadow Memory / Hash Table	$O(1)$	$O(s_P(n))$
Binary Search Tree	$O(\log{\mathsf{al}_P(n)})$	$O(\mathsf{al}_P(n))$
Linked List	$O(\mathsf{al}_P(n))$	$O(\mathsf{al}_P(n))$

The best choice overall that balances memory consumption with low runtime for checks is via a binary search tree.