Probabilistic Correctness and Pointer Tagging
Last Updated:Here we attempt to give an account of memory safety with probabilistic correctness. In fact by relaxing the correctness conditions of a screener to guarantee memory safety with all but some small error probability, the doors are opened to a plethora of opportunities to boost performance and shrink the memory overhead. We then examine a case-in-point in the form of a screener based on pointer tagging. While pointer tagging generally requires hardware support for a certain addressing feature - a feature not natively supported by HRAM0 - we describe a suitable adaptation.
Families of Programs with Probabilistic Correctness
Many of the definitions pertaining to program families in our article on classification of programs can also be generalized to appropriately permit probabilistic correctness. For the following definition to make formal sense, we require that there is some nondeterminism in the programs. We capture this by extending the program input to incorporate random coins of length $\chi(\kappa)$
where $\kappa$
is the "problem size". The "data" portion of the input is then of length $n(\kappa) - \chi(\kappa)$
. We denote the randomness distribution by $\mathcal{R}$
and write $\mathsf{exec}(P, x; r)$
to signify the execution of $\mathsf{exec}(P, x \parallel r)$
where $r \stackrel{\$}{\gets} \mathcal{R}$
is first sampled from the randomness distribution and incorporated as part of the program input.
$\mathcal{F}$
with probabilistic correctness $q \in [0, 1]$
is a subset of admissible programs $\mathcal{F} \subseteq \mathbb{P}$
equipped with a pair $(\mathcal{X}, \mathcal{Y})$
, an input distribution $\mathcal{X}$
and output distribution $\mathcal{Y}$
respectively, such that for all programs $P \in \mathcal{F}$
, it holds that for an input $x \stackrel{\$}{\gets} \mathcal{X}$
sampled from the input distribution and randomness $r \stackrel{\$}{\gets} \mathcal{R}$
, $P$
produces the expected output for input $x$
with probability $q$
i.e. $\mathsf{Pr}\big[p_\mathcal{Y}(x, \mathsf{exec}(P, x; r)) \to 1\big] \geq q$
where the probability is taken over choice of $x$
and $r$
.
$p$
-Strenghtening Transformations
$\mathcal{X}$
be an input distribution on $\mathbb{Z}^\ast$
, and let $p \gg 1/2 \in [0, 1]$
. A transformation $T$
is said to be p-strengthening if for all $P \in \mathbb{P}$
the following two properties are satisfied.$\mathbf{S.1}$
:$\mathsf{Pr} \big[\mathsf{exec}(T(P), \mathbf{x}; r) \neq \bot \;\;|\;\; \mathsf{exec}(P, \mathbf{x}; r) = \bot \big] = p\;\; \forall \mathbf{x} \in \mathcal{X}$
where the probability is taken over the choice of randomness$r \stackrel{\$}{\gets} \mathcal{R}$
$\mathbf{S.2}$
:$T(P) \preceq P$
(non-weakening)
Observe that in S.1 the probability is $p$
for every input in the input space. This is a stronger assurance than the probabilistic correctness of Definition~1 where the probability ranges over $\mathcal{X}$
. This guarantees that in Definition~2 safety is equiprobable for highly unlikely inputs as for highly likely inputs.
$p$
-screener is a transformation $T : \mathbb{P} \to \mathbb{P}$
that is- an instrumentor
$p$
-strengthening
We now explore a topic that readily gives an instantiation of a $p$
-screener.
Pointer Tagging
We borrow some notation and terminology from this arXiv preprint. Moreover, the following description is based on Section 3 ("Memory Tagging") in the aforementioned preprint (p. 3).
- Every block of memory of length
$\mathsf{TG}$
(tagging granularity) bytes aligned on a$\mathsf{TG}$
-byte boundary is associated with a tag of$\mathsf{TS}$
(tag size) bits. Such a block is called a granule. - The upper
$\mathsf{TS}$
bits of every pointer contains a tag. - The tags are stored in shadow memory. A program is instrumented to prefix every load and store with a probabilistic safety check that compares the tag in shadow memory with the tag contained in the pointer.
- A tag is chosen during memory allocation, thereby associating the memory chunk being allocated with this tag, setting the upper
$\mathsf{TS}$
bits to this tag and returning the resulting pointer. - Every load and store triggers an exception on mismatch between the pointer tag and tag stored in shadow memory.
- Temporal and spatial errors are detected with a probability of approximiately
$(2^{\mathsf{TS}} - 1)/2^{\mathsf{TS}}$
.
The above pertains to a real-world implementation in the Hardware ASAN (Address Sanitizer) tool for the AArch64 (ARM-64) architecture which leverages a hardware feature in AArch64 called Top-Byte Ignore (TBI) that when enabled ignores the top byte in an address when accessing memory thereby allowing this byte to be used to store a tag.
Applying the above to HRAM0 is not immediately straightforward since HRAM0 is a an abstract model that is independent of units of storage and representation of integers. One of the concessions we are forced to make is to impose an upper bound on the memory usage of supported programs; that is, those programs that are deemed instrumentable. Recall the set $\tilde{\mathbb{P}}$
, the set of instrumentation-friendly programs, from the previous article. We must further restrict this set as follows. A supported program accesses a virtual heap of bounded size $\mathsf{VS}$
(virtual size). Define the set $$\stackrel{\circ}{\mathbb{P}} := \{P \in \tilde{\mathbb{P}} : s_P(n) \leq \mathsf{VS} \; \forall n \in \mathbb{N}\}$$
where $s_P(n)$
is the worst-case space complexity of program $P$
with input size $n$
. The simplification presented here does not allow $\mathsf{VS}$
to grow as a function of input size $n$
but this can be viewed as a natural extension.
This overall approach to pointer tagging can be encapsulated as a probabilistic implementation of the Address Manager construct from the previous article. We now give a specific high-level description of this Address Manager implementation. Note that the following is an approximate account that ignores some specific details of HRAM0 memory management.
Recall that an an instance of Address Manager is a tuple of algorithms $(\mathsf{Init}, \mathsf{Malloc}, \mathsf{Free}, \mathsf{Access}, \mathsf{Finalize})$
.
$\mathsf{Init}$
: Returns instrumented initialization code that runs- malloc
$\mathsf{VS} / \mathsf{TG}$
words of shadow memory ->$\mathsf{SB}$
(shadow (memory) base) - set
$\mathsf{VB} \gets \mathsf{SB} + (\mathsf{VS} / \mathsf{TG})$
(virtual (memory) base)
- malloc
$\mathsf{Malloc}$
: Returns instrumented code for malloc subroutine (with arguments$p$
(pointer) and$s$
(size)) that performs the following steps:- generate random integer tag in the range
$0$
-$(2^{\mathsf{TS}} - 1)$
- compute
$z \gets (p - \mathsf{VB}) / \mathsf{TG}$
- for every
$0 \leq j < s / \mathsf{TG}$
-
- store tag at
$\mathsf{SB} + z + j$
- store tag at
- return combine(tag, p)
- generate random integer tag in the range
$\mathsf{Free}$
: Returns instrumented code for free subroutine (with argument$p$
(pointer)) that performs the following steps:- extract tag from
$p$
and remove tag from$p$
- compute
$z \gets (p - \mathsf{VB}) / \mathsf{TG}$
- while
$(\mathsf{SB} + z)$
= tag -
- store 0 at
$\mathsf{SB} + z$
- store 0 at
-
$z \gets z + 1$
- extract tag from
$\mathsf{Access}$
: Returns instrumented code for check subroutine (with argument$p$
(pointer)) that performs the following steps:- extract tag from
$p$
and remove tag from$p$
- compute
$z \gets (p - \mathsf{VB}) / \mathsf{TG}$
- If
$(\mathsf{SB} + z)$
= tag -
- Return 0
- Else
-
- Return -1
- extract tag from
$\mathsf{Finalize}$
: Returns instrumented code for finalize subroutine that:- frees shadow memory
Since our measures of space complexity as described in the previous article count the number of "words" allocated and since the size of each "word" in HRAM0 is unbounded, it follows that $\mathsf{TS}$
may be set arbitrarily large without increasing memory cost. The time complexity of safety checks does however grow at least linearly with $\mathsf{TS}$
(because HRAM0 does not have bitwise operations nor multiplication/division). Asymtotically, the above address manager has the same safety check time and space overhead as the non-probabilistic shadow memory / hash table based approach listed in the table in the previous article. In practice, the space overhead of the above probabilistic address manager is a factor of $\mathsf{TG}$
smaller than the space overhead of the former.