Published: 14th November 2023 DOI: 10.4204/EPTCS.393 ISSN: 20752180 
Preface Alessandro Coglio and Sol Swords  
Extended Abstract:
Classical LU Decomposition in ACL2 Carl Kwan  1 
Extended Abstract:
CHERI Concentrate in ACL2 Carl Kwan, Yutong Xin and William D. Young  6 
Extended Abstract:
A BoundFinding Tool for Arithmetic Terms Sol Swords  11 
A Formalization of Finite Group Theory: Part II David M. Russinoff  16 
A Formalization of Finite Group Theory: Part III David M. Russinoff  33 
A Case Study in Analytic Protocol Analysis in ACL2 Max von Hippel, Panagiotis Manolios, Kenneth L. McMillan, Cristina NitaRotaru and Lenore Zuck  50 
Advances in ACL2 Proof Debugging Tools Matt Kaufmann and J Strother Moore  67 
Using Counterexample Generation and Theory Exploration to Suggest Missing Hypotheses Ruben Gamboa, Panagiotis Manolios, Eric Smith and Kyle Thompson  82 
Formal Verification of ZeroKnowledge Circuits Alessandro Coglio, Eric McCarthy and Eric W. Smith  94 
Verification of GossipSub in ACL2s Ankit Kumar, Max von Hippel, Panagiotis Manolios and Cristina NitaRotaru  113 
Proving Calculational Proofs Correct Andrew T. Walter, Ankit Kumar and Panagiotis Manolios  133 
ACL2 Proofs of Nonlinear Inequalities with Imandra Grant Passmore  151 
Verification of a Rust Implementation of Knuth's Dancing Links using ACL2 David S. Hardin  161 
This volume contains the proceedings of the Eighteenth International Workshop on the ACL2 Theorem Prover and Its Applications (ACL22023), a twoday workshop held at the University of Texas at Austin and online, on November 1314. ACL2 workshops occur at approximately 18month intervals, and they provide a major technical forum for users of the ACL2 theorem proving system to present research related to the ACL2 theorem prover and its applications.
ACL2 is an industrialstrength automated reasoning system, the latest in the BoyerMoore family of theorem provers. The 2005 ACM Software System Award was awarded to Boyer, Kaufmann, and Moore for their work in ACL2 and the other theorem provers in the BoyerMoore family.
The proceedings of ACL22023 include ten long papers and three extended abstracts. Each submission received three reviews. The workshop also included several "rump session" talks — short unpublished presentations that discussed ongoing research — as well as two invited talks from Jim Grundy of Amazon Web Services and Eric Smith of Kestrel Institute.
As program cochairs we are grateful for the other leaders of this workshop: the organizing chairs, Matt Kaufmann and Mayank Manjrekar; the arrangements chair, Rob Sumners; and the registration chair, David Rager. We appreciate the great community of researchers who continue to develop exciting projects and tools using ACL2, as well as contributing papers to the workshop. We also wish to thank the program committee for helping the editorial process proceed very smoothly.
This workshop would not have been possible without the support of a large number of people. We thank those who authored, submitted, and presented papers. We also wish to thank the Program Committee for their diligence in reviewing the papers in a timely manner and for further discussions after the reviews. We are very grateful to the invited speakers for agreeing to provide their unique and extensive perspectives to the workshop participants.
Alessandro Coglio and Sol Swords, program chairs
November 2023
We present an ACL2 formalization and verification of a classical LU decomposition algorithm. LU decomposition plays a crucial role in numerical computations, linear algebra algorithms, and applications that involve matrix operations or solving systems of linear equations. However, theorem proving with matrices can be challenging because typical implementations of matrix algorithms often involve the heavy use of indexing, among other issues. Our approach to formalizing LU decomposition in ACL2 adopts a methodical and constructive derivation process that is more amenable to formal verification. We also provide an ACL2 implementation for Gaussian elimination as one of several interesting consequences of our formalization.
The broad application of LU decomposition to linear algebra, and thus science as a whole, motivates our formalization in ACL2. We are interested in building reliable numerical linear algebra software for use in the design of critical systems. To our knowledge, this is the first formalization of an LU decomposition algorithm.
Linear algebra algorithms are used widely in scientific computing, data analysis, policy and decision making, the design of reliable infrastructure, etc. Their broad applications to critical systems make linear algebra methods a prime target for formal verification. However, computational linear algebra algorithms have seen few theorem prover formalizations. On one hand, these algorithms are often expressed in terms of operations on the indexed elements of a matrix. The use of indexing can obscure the design goals of the matrix algorithm, reduce human readability, and make reasoning for theorem provers difficult, among other drawbacks. On the other hand, many theorem provers tend to have limited support for the efficient computation of operations involving numeric values. This limits the utility of formalizing particularly computational linear algebra methods in such theorem provers.
Targeting the need for reliable but efficient linear algebra computation, we present a formalization of an LU decomposition algorithm and verify it in ACL2. LU decomposition, also known as LU factorization, is a fundamental linear algebra algorithm that decomposes a given matrix into upper and lower triangular factors. It is a crucial step in methods that aim to efficiently solve systems of linear equations, invert matrices, and compute matrix determinants.
As part of our formalization, we also verify the derivation of the LU decomposition algorithm. The algorithm we verify is sometimes known as "classical" or "rightlooking", and many mathematical texts introduce it as a sequence of Gauss transformation multiplications [1, 2, 8, 9]. Since we are using ACL2, we need to translate into recursion and induction. One typical derivation partitions a given matrix into a form that suggests recursing along the principal submatrices. However, algorithms following this derivation tend to remain heavily reliant on loops or indexing to express matrix operations. We take the more ACL2 natural approach.
An LU decomposition of a matrix $A$ are lower and upper triangular matrices $L$ and $U$, respectively, such that $A=LU$. We also make the requirement that $L$ is unit lower triangular, i.e. $L$ has $1$s on the diagonal. In the interest of memory optimization, this extra requirement makes it possible to overwrite the upper part of $A$ with the upper part of $U$ and the strictly lower part of $A$ with the strictly lower part of $L$ during the algorithm. Partition $A=LU$ as follows:
$ \begin{equation} \left( \begin{array}{cc} \alpha_{11} & a_{12}^T\\ \hline a_{21} & A_{22} \end{array} \right) = A = LU = \left( \begin{array}{cc} 1 & \\ \hline \ell_{21} & L_{22} \end{array} \right) \left( \begin{array}{cc} \upsilon_{11} & u_{12}^T \\ \hline & U_{22} \end{array} \right)\ . \label{eq:lu} \end{equation} $  (1) 
Regarding notation: lowercase Greek letters are field scalars; lowercase Latin letters are vectors; uppercase Latin letters are matrices; and assume that any posed variables are "conformal", e.g. if $A$ is $m\times n$, then $a_{21}$ is $(m1)\times 1$ and $a_{12}^T$ is $1\times (n1)$. Moving forward, we will drop the "bars" in partitions for simplicity. Instead of looping through the rows or columns of $A$, the partition suggests we recurse along the principal submatrices. We want (1) to hold after performing the algorithm, i.e.
$\begin{align*} \alpha_{11}=\upsilon_{11}\ , && a_{21} = \upsilon_{11}\ell_{21}\ , && a_{12}^T = u_{12}^T\ , && A_{22} = \ell_{21}u_{21}^T + L_{22} U_{22}\ . \end{align*} $ 
Since $A$ is given, $u_{12}^T$ and $\upsilon_{11}$ are obvious. Solving for the remaining components of $L$ and $U$ forces
$ \ell_{21} = a_{21} \alpha_{11}^{1}\ , \label{eq:basecase1} $  (2)  $ L_{22}U_{22} = A_{22}  a_{21}\alpha_{11}^{1}a_{12}^T\ . $  (3) 
This suggests an algorithm which requires merely updating $a_{21}$ and $A_{22}$. Indeed, consider Algorithm 1. Since the algorithm is recursive, we need to handle a few base cases. If the matrix is empty, then we just return an empty matrix. In the case that the matrix is a column vector, then Equation (2) indicates we should return $a_{21} \alpha_{11}^{1}$. Similarly, if we are given a row vector, then no updates are necessary. Otherwise, we can update the components of $A$ suggested by Equations (2) and (3) and proceed with the recursion.
The ACL2 implementation of Algorithm 1 is Program 1.
Descriptions of the ACL2 linear algebra functions used in this paper are
in Table 1. We use existing ACL2 primitive matrix operations and
functions in our formalization [3]. However, we also define both
new and alternative operations that are necessary in order to implement and
verify the LU decomposition algorithm. In cases where we define alternative
operations, we prove they are equivalent to those already in ACL2. We defer the
discussion of most of these new functions to the future, but those that appear
in this paper are out*
, getU
, and getL
. It is
also important to know that getL
fills the diagonal with $1$s.
Algorithm 1 Classical LU Decomposition procedure lu $(A\in\mathbb R^{m\times n})$ partition $A= \begin{pmatrix} \alpha_{11} & a_{12}^T\\ a_{21} & A_{22} \end{pmatrix} $ if $m=0$ or $n=0$ then return $\begin{pmatrix} \end{pmatrix}$ else if $n=1$ then return $\begin{pmatrix}\alpha_{11}\\ a_{21}\alpha_{11}^{1}\end{pmatrix}$ else if $m=1$ then return $A$ else $a_{21} := a_{21}\alpha_{11}^{1}$ $A_{22} := A_{22}  a_{21}a_{12}^T$ return $\begin{pmatrix}\alpha_{11} & a_{12}^T\\ a_{21} & \textrm{LU}(A_{22})\end{pmatrix}$ 
Program 1 ACL2 implementation of Algorithm 1(define lu ((A matrixp)) ... (b* (((unless (matrixp A)) (mempty)) ((if (memptyp A)) A) (alph (car (colcar A))) ((if (zerop alph)) (mzero (rowcount A) (colcount A))) ((if (memptyp (colcdr A))) (rowcons (list alph) (sm* (/ alph) (rowcdr A)))) ((if (memptyp (rowcdr A))) A) (a21 (colcar (rowcdr A))) (a12 (rowcar (colcdr A))) (A22 (colcdr (rowcdr A))) (a21 (sv* (/ alph) a21)) (A22 (m+ A22 (sm* 1 (out* a21 a12))))) (rowcons (rowcar A) (colcons a21 (lu A22))))) 
With the exception of some extra edge cases, Program 1 directly implements Algorithm 1. In order for the decomposition to succeed, we require only that $\alpha_{11}\neq0$ and $L_{22}U_{22} = A_{22}  a_{21}\alpha_{11}^{1}a_{12}^T$. The ACL2 theorem for this requirement is Program 2. This requirement is interesting because, given that $\alpha_{11}$ is nonsingular, it reduces the condition for a matrix to be LU decomposable into a condition about a smaller matrix, which is reminiscent of some "induction step". Indeed, the RHS of Equation (3) is the Schur complement of $\alpha_{11}$ in $A$ and is a key tool in statistics, probability, numerical analysis, and matrix analysis. While mathematical references commonly describe the conditions for the algorithm to succeed in terms of the leading principal submatrices, the proof that these conditions are sufficient reduces to an induction step that depends on Equation (3) [8].
In order to avoid writing an ACL2 statement about all the leading
principal submatrices of a matrix, we recurse along the Schur complements,
i.e. ensure $\alpha_{11}$ is nonzero and recurse on $S :=
A_{22}  a_{21}\alpha_{11}^{1}a_{12}^T$. This recognizes matrices
with nonsingular leading principal submatrices. Indeed, consider
Program 3. The function
nonsingularleadingprincipalsubmatricesp
only returns
nil
on a matrix if it encounters an $\alpha_{11}$ that is zero.
One way to think
about this condition is that if no zeros appear after $k$ recursive
steps, then the $k$th leading principal submatrix is nonsingular
because its determinant is nonzero. Instead of reasoning with
determinants, which are rarely useful in numerical
algorithms [9], Schur complements provide a more concise ACL2
condition for success and presents opportunites for further rich ACL2
explorations in numerical linear algebra.
To prove the correctness of the classical LU decomposition for a square matrix, we want to assume
that the matrix has nonsingular leading principal submatrices and induct along their Shur complements.
In ACL2, these are both satisfied by assuming nonsingularleadingprincipalsubmatricesp
from Program 3 in the hypothesis. Indeed, as seen in Program 4, including such a hypothesis enables
ACL2 to prove the desired theorem without any userprovided hints. ACL2 automatically inducts according
to a scheme suggested by nonsingularleadingprincipalsubmatricesp
, the induction step for which enables the use of lucorrectnessinductionstep
from Program 2.
Table 1 ACL2 linear algebra functions

Program 2 ACL2 LU "induction step" correctness(defthm lucorrectnessinductionstep (b* ((alph (car (colcar A))) (LU (lu A)) (L (getL LU)) (U (getU LU)) (a21 (colcar (rowcdr A))) (a12 (rowcar (colcdr A))) (A22 (colcdr (rowcdr A))) (a21 (sv* (/ alph) a21)) (A22 (m+ A22 (sm* 1 (out* a21 a12))))) (implies (and (matrixp A) (not (memptyp (rowcdr A))) (equal (colcount A) (rowcount A)) (not (zerop alph)) (equal (m* (getL (lu A22)) (getU (lu A22))) A22)) (equal (m* L U) A))) ...) 
(define nonsingularleadingprincipalsubmatricesp ((A matrixp)) :measure (and (rowcount A) (colcount A)) (b* (((unless (matrixp A)) nil) ((if (memptyp A)) t) (alph (car (colcar A))) ((if (zerop alph)) nil) ((if (or (memptyp (rowcdr A)) (memptyp (colcdr A)))) t) ;; Compute S = A22  out*(a21/alph,a12) (a21 (colcar (rowcdr A))) (a12 (rowcar (colcdr A))) (A22 (colcdr (rowcdr A))) (a21/a (sv* (/ alph) a21)) (S (m+ A22 (sm* 1 (out* a21/a a12))))) (nonsingularleadingprincipalsubmatricesp S)) ...)
(defthm lucorrectness (b* ((LU (lu A)) (L (getL LU)) (U (getU LU))) (implies (and (equal (colcount A) (rowcount A)) (nonsingularleadingprincipalsubmatricesp A)) (equal (m* L U) A))))
There are many applications to our formalization. One such application is that
we can now calculate the determinant of an LU decomposable matrix $A$ since
$\det(A)=\det(L)\det(U)=\det(U)$ is simply the product of the diagonal of $U$.
Another consequence of Algorithm 1 is that the computed $U$ is the row
echelon form of $A$. In particular, this means that
(defun ge (A) (getU (lu A)))
is the ACL2 formalization of Gaussian
elimination, which is one step away from an ACL2 formalization of
GaussJordan elimination. GaussJordan can be used to find matrix inverses
or even solve systems of linear equations such as $Ax=b$.
In practice, it tends to be better to use LU decomposition over forming an inverse. A linear system $Ax=b$ can be solved by forming $L$ and $U$, and then solving $Ly=b$ and $Ux = y$ via forwards and backwards substitution, respectively. This is faster than using GaussJordan to form an inverse [8]. We have formalized backwards and forwards substitution in ACL2, the discussion for which we defer to the future; with LU decomposition, they form an ACL2 method for solving systems of linear equations.
Previously, we formalized the CauchySchwarz inequality and other vectorrelated ideas from linear algebra and convex optimization [6, 5, 4, 7]. Here we show that our formalization of LU decomposition in ACL2 leads to a rich selection of research directions. We look forward to formalizing more methods for solving systems of linear equations. In addition, there are other decompositions (QR, rankreducing, spectral, etc.), optimization methods (leastsquares, gradient descent, etc.), and matrix analysis ideas (norms, tensors, etc.) to explore. These directions all serve as important methods in pure and applied mathematics, science, and engineering, and we expect to formalize verified and efficient computational linear algebra methods in the future.
We present an ACL2 formalization of CHERI Concentrate, a practical compression format for capabilities. Capabilities are unforgeable tokens of authority that grant and describe access to a region of memory; CHERI Concentrate enables the efficient storage and encoding / decoding of capabilities in reallife reliable hardware. As part of our formalization, we verify the correctness of the encoding and decoding functions, which specify the bounds in memory a program is permitted to access. These encode / decode functions are complicated and their correctness is not obvious to a human reader. This along with the use of capabilities in purportedly secure hardware is why formal verification is important. To our knowledge, this is the first executable formalization of CHERI Concentrate in ACL2.
We present a formalization of CHERIstyle capabilities in ACL2. CHERI (Capability Hardware Enhanced RISC Instructions) is a set of architectural features that extends conventional hardware InstructionSet Architectures (ISAs) with capabilities that enable finegrained memory protection and highly scalable software compartmentalization [7]. Capabilities are hardware descriptions for permissions that can be used to access data, code, and objects in a controlled and protected manner. Existing ISAs extended with CHERIstyle capabilities include 64bit MIPS, 32bit RISCV, 64bit RISCV, and 64bit Armv8A [6]. More recently, Arm has shipped a CHERIenabled Morello prototype processor, SoC, and board [1,2]. The work we present here is part of an ongoing effort to model CHERI capabilities in ACL2 for y86 and, eventually, x86 ISAs.
In CHERI systems, any accesses to memory are managed by capabilities. In particular, a capability will store in memory the upper and lower bounds of a region that a program is permitted to access. The capabilities that interest us are 128bit data structures that can be stored in memory or registers. However, in a 64bit system, this means that the bounds, which are two 64bit addresses, are encoded in a 128bit capability; this is in addition to other information that a capability must describe (e.g. permissions). Storing all pertinent information uncompressed can require 256 bits, which enacts a heavy toll on memory and quickly saturates a system's cache and datapaths [6].
CHERI Concentrate is a format that compresses capabilities to 128 bits on 64bit architectures and reduces L2 cache misses by 50% to 70% compared to 256bit capabilities [8]. This format compresses the bounds to 27 bits compared to 128 bits uncompressed. However, such a reduction in space requires a complicated encoding algorithm—sufficiently complicated that it is not clear from their definitions whether the encode / decode functions perform the desired tasks. In fact, initial iterations of the encoding algorithm were erroneous and required the use of HOL4, a higherorder logic proof assistant, to guide the development of the compression algorithm. The complexity of CHERI Concentrate warrants the use of formal methods, but HOL4 has limited support for execution.
In addition to formalizing CHERI capabilities, we verify the correctness of the CHERI Concentrate encoding and decoding algorithms in ACL2. By performing our formalization in ACL2, we can take advantage of ACL2's capacity for reasoning and execution efficiency. Moreover, given the size and complexity of the CHERI ISA in general, future CHERI models in ACL2 would rely on our formalization of CHERI Concentrate every time a capability is called. Indeed, we are currently building an ACL2 model of the y8664 ISA with CHERIstyle capabilities with the intent of verifying machine code programs involving capabilities. Since our y8664 model symbolically simulates the execution of machine code, our formalization of CHERI Concentrate needs to be executable in order to be useful. Using the ACL2 y8664 model extended with capabilities in CHERI Concentrate format, we can verify machine code programs involving capabilities, though this is beyond the scope of this paper and we intend on sharing these results at a later date. To our knowledge, this is the first executable formalization of CHERI Concentrate in ACL2.
Space limitations allow only a brief description here of 128bit CHERI capabilities and the CHERI Concentrate format. For details, an interested reader may refer to the CHERI ISA [6]. One method of motivating the introduction of CHERI capabilities is to consider them as fat pointers, i.e. a pointer extended with additional metadata such as bounds and permissions that may restrict the pointer's behaviour. In principle, a pointer contains an address that can be dereferenced in order to access a region of memory indicated by the address. However, in the interest of memory protection, the metadata associated with a fat pointer may store permissions to restrict how memory may be accessed and bounds to restrict the region of memory that may be accessed. CHERI design principles further restrict how capabilities may be constructed and updated, requiring additional metadata not typically found in traditional fat pointers.
We make a distinction between architectural capabilities and capabilities in memory. Architectural capabilities contain softwareaccessible fields that are not reflected explicitly by, but can still be inferred from, a given capability's inmemory representation. In our model, architectural capabilities contain the following fields: 1bit validity tag, 16bit permissions (perms), 18bit object type (otype), 64bit offset, 64bit base, and 64bit length. The base is the lower bound of the memory region dereferenceable by a capability. Adding length to base gives the upper bound (a.k.a. top) of the memory region dereferenceable by a capability. Adding offset to base gives the address when the capability is used as a pointer.
System constraints necessitate compressing the information provided by an architectural capability into the format described by CHERI Concentrate, which is then stored in memory. As part of our formalization, we develop and verify ACL2 functions that allow a user to easily convert between architectural and memoryresident capabilities. The proofs of these conversion functions amount to verifying that converting from architectural capabilities to memoryresident capabilities and then back to architectural capabilities recovers the information that was originally in the architectural capabilities, and viceversa.
A visual representation of memoryresident capabilities in CHERI Concentrate format can be seen in Figure 1. In the diagram, $a$ is a 64bit address, $p$ represents 16 bits for the permissions, $otype$ represents 18 bits for the object type, and 27 bits are used to encode bounds relative to the address. The compression uses a floatingpoint representation to encode the bounds. Regarding notation for the bounds: $B$ and $T$ are 14bit values that form the base and top of a capability's bounds relative to its address (the top two bits of $T$ are omited during encoding but deduced from $B$ during decoding); $I_E$ is an internal exponent bit that, when set, indicates the lower three bits of $B$ and $T$ ($B_E$ and $T_E$, respectively) represents an exponent at the expense of three bits of precision; $E$ is a 6bit exponent that determines the position at which $B$ and $T$ are inserted into $a$.
The bulk of our verification efforts is devoted to proving the encoding and decoding functions correct.
As part of the compression process, CHERI Concentrate specifies the algorithms by which the bounds are
encoded into $B$ and $T$, and how $B$ and $T$ can be decoded.
To illustrate the complexity of these algorithms, consider Listing 1, which is the ACL2 implementation
of the decoding algorithm decodecompression
. We omit the similarly complicated function
encodecompression
for brevity.
Figure 1: CHERI Concentrate format



Listing 1: Decoding function in ACL2
(define decodecompression ((c compressionp) (addr (nbp addr *xlen*))) ... (b* ((IE (nbash (+ *mw* *tw*) 1 c)) ((mv E TS BL Lcarry Lmsb) (if (= IE 0) (varssmallseg c) (varslargeseg c))) (BLH (+ (nbash *tw* ( *mw* *tw*) BL) Lcarry Lmsb)) (TLH (nbash 0 ( *mw* *tw*) BLH)) (TL (logior (ash TLH *tw*) TS)) ((mv CT CB) (corrections addr TL BL E)) (AT (ash addr ( (+ *mw* E)))) (top (nbash 0 (1+ *xlen*) (ash (logior (ash (+ AT CT) *mw*) TL) E))) (base (nbash 0 *xlen* (ash (logior (ash (+ AT CB) *mw*) BL) E))) (top (if (and (< E 51) (< 1 ( (nbash 63 2 top) (nbash 63 1 base)))) (nbash 0 *xlen* top) top))) (bounds top base)))
The function encodecompression
compresses a base $b_0$ and length $\ell_0$, that together form a top
$t_0=b_0+\ell_0$, into a format reflected by the portion of Figure 1 that pertains to $I_E$, $T$, and $B$.
The function decodecompression
reconstructs bounds relative to an address from the compression to obtain a
base $b_1$ and top $t_1$ that may have been subjected to some rounding.
Again, the interested reader can find motivating
details in the CHERI ISA and the original CHERI Concentrate
paper [6,8]. The properties of these
functions that we have verified are:
The last two properties state the conditions under which decoding a compression will recover the exact bounds that were originally encoded, which is important because compression may result in loss of precision. These last two properties contain stronger conclusions than the other properties and were not proven in the original CHERI Concentrate paper.
Our approach to proving these properties makes heavy use of the symbolic simulation framework GL with casesplitting [5,3]. While discharging these proofs by introducing traditional ACL2 rewrite rules would of course be feasible, we find that it takes ACL2 over 90 minutes to verify the returns specifier of a capability constructor without userprovided hints. This suggests that verification of the encode / decode properties would be better suited to more automatic methods. Moreover, since these are essentially bittwiddling proofs, bitblasting would be an amenable approach.
Listing 2: Theorem of encode / decode property 1 in ACL2
(defglparamthm decodeencodebboundlen>2^12 :hyp (and (validaddrp addr base len) (validblp base len) (<= (expt 2 *tw*) len)) :concl (<= (bounds>base (decodecompression (encodecompression len base) addr)) base) :parambindings `((((low 12) (high 16)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 16) (high 20)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 20) (high 24)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 24) (high 28)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 28) (high 32)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 32) (high 36)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 36) (high 40)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 40) (high 44)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 44) (high 48)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 48) (high 52)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 52) (high 56)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 56) (high 60)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 60) (high 64)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))) (((low 64) (high 65)) ,(gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))) :paramhyp (and (<= (expt 2 low) len) (< len (expt 2 high))) :covbindings (gl::autobindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
GL supports model checking with both BDDs as ACL2 symbolic objects and external SAT solvers. We use BDDs as the backend engine for GL. While case splits were necessary for some of the larger proofs, we found that GL BDDs were sufficient for our verification needs. As such, we did not utilize any external reasoning engine nor FGL, the successor to GL [4]. Moreover, GL's BDD proof procedure is verified in ACL2, alleviating any extra soundness concerns that may arise from calling external tools.
The ACL2 proofs of the encode / decode properties 56 are straightforward applications of GL's default
symbolic simulation event defglthm
.
In contrast, properties 14 all required parameterized case splitting. Listing 2 is the ACL2 theorem of the
encode / decode property 1. As indicated by the :parambindings
and :paramhyp
, we split the theorem
into 4bit "intervals", effectively
performing individual symbolic simulations for when the length $\ell_0$ satisfies $2^{12}\leq \ell_0<2^{16}$ and
$2^{16}\leq \ell_0<2^{20}$ and so on. Case splits for properties 24 are the same. We found that case splitting
into smaller bit "intervals" would discharge the proofs faster at the expense of verbose userprovided hints.
Case splitting being an effective approach suggests that the complexity of encoding and decoding does not scale with the
length of the memory regions. Indeed, this corroborates with the design principles of CHERI Concentrate: "encoding
efficiency, minimize delay of pointer arithmetic, and eliminate additional loadtouse delay" [8].
On the other hand, properties 56 required no case splitting largely because the encode / decode logic is greatly simplified
when $\ell_0<2^{12}$, and alignment requirements reduce the number of variable bits.
Arm is deploying CHERIenabled devices to industry security specialists at companies such as Google and Microsoft; they describe CHERI hardware capabilities as "a fundamentally more secure building block for software" [2]. It is critical that functions controlling memory accesses in these secure systems are correct; we have demonstrated that ACL2 is a excellent tool for modelling CHERI capabilities and verifying CHERI functions. While other formal methods have been part of the toolbox in developing CHERIextended ARM, RISC, and MIPS ISAs, the application of CHERI to the Intel x8664 ISA remains a sketch and has not yet been implemented. This presents an opportunity, which we intend to pursue, to perform ACL2 formal modelling and verification in the CHERIx8664 space.
Acknowledgements. This work was supported in part by SRI International and the Defense Advanced Research Projects Agency of the United States Department of Defense.
We describe a tool for establishing and proving upper and lower bounds for ACL2 arithmetic terms. The core algorithm used in this tool computes bounds of arithmetic terms using abstract interpretation. While this core algorithm is not new, we combine it with phased simplification using ACL2's rewriter, case splitting, and use of existing rules and user hints to provide bounds for subterms. Users may first expand and simplify a term as needed, split into cases if necessary, then find bounds for the simplified term using hypotheses, linear rules, type reasoning, and the core abstract interpretation alorithm. Finally, the same steps are retraced in order to prove the theorem that these bounds hold of the original term.
Verification of arithmetic algorithms sometimes involves proving upper and lower bounds for intermediate values, and in many cases it isn't clear what those bounds need to be. Generally we'd prefer to prove the narrowest bounds possible in order to have the best chance of succeeding in the rest of the proof. For example, an iterative approximation algorithm for square root might converge as long as each partial remainder is smaller than some threshold, but the verifier (and, indeed, the designer) may not know exactly what that threshold is. With some study, it may be possible to determine the bound that is needed to allow the rest of the proof to go through. But it is often easier and more pragmatic to prove the best bound you can with some limited level of effort and move on to see whether this bound is sufficient for the rest of the proof.
To support this pragmatic approach, we provide a tool
defbounds
that quickly determines and proves upper and lower
bounds for an ACL2 arithmetic term. The tool allows rewriting the
input term and optionally splitting into cases, then running a core
algorithm that attempts to find bounds for the resulting term by a
form of abstract interpretation very similar to that of Moore's Tau
bounders and associated Ainni tool [4]. The core bounding
algorithm is verified so that it can be used within an ACL2
metafunction. Initially, the defbounds
tool first emulates the given
sequence of rewriting and case splitting steps that the prover will
ultimately perform, then applies the bounding algorithm to find the
bounds on the resulting rewritten term. It then proves these bounds
by emitting a defthm
form with a series of hints replicating
the same rewriting and case splitting steps, followed by the
application of this metafunction to complete the proof.
It might be logical to ask why we don't use ACL2's builtin nonlinear
arithmetic procedure [1] to find bounds. This
idea is appealing because this nonlinear arithmetic procedure is quite
powerful, but unfortunately it has no builtin ability to find a bound
that it can prove. We could search for such a bound by trial and
error, but this is complicated by the fact that this nonlinear
procedure can be quite slow when several multiplications are
involved. We have encountered problems that nonlinear arithmetic fails
to solve after 20 minutes on a modern CPU, even when our lesscomplete
bounding algorithm can prove them. On the other hand,
defbounds
simply produces the best bounds that it can prove
with a linear pass over the input term; when needed, greater accuracy
can sometimes be attained by casesplitting to consider subranges of
values for certain subterms.
A simple example of the operation of defbounds follows. We define a
function foo
and try to derive bounds for it given bounds for its
input x
:
(includebook "centaur/misc/defbounds" :dir :system) (defund foo (x) ( (* x x) (* 3 x))) (defbounds foobounds (foo x) :hyp (and (rationalp x) (<= 2 x) (<= x 4)) :simphints ((:intheory (enable foo))))
The above event first runs the boundfinding routine and derives
bounds $[8, 10]$ for (foo x)
, then emits a defthm
event to prove them, producing a theorem called foobounds
.
These bounds aren't very precise (the actual range is $[2, 4]$),
because the two subterms (* x x)
and ( (* 3 x))
are
considered independently and their respective bounds are $[4, 16]$ and
$[12, 6]$. Somewhat better bounds can be obtained by instead
bounding the factorization (* x ( x 3))
, as follows:
(defthmd myfactor (equal (+ ( (* 3 x)) (* x x)) (* x ( x 3)))) ;; this rule would reverse the application of myfactor (local (intheory (disable distributivity))) (defbounds foobetterbounds (foo x) :hyp (and (rationalp x) (<= 2 x) (<= x 4)) :simphints ((:intheory (enable foo)) (:intheory (enable myfactor))))
This now instead has two simplifcation phases, first allowing
foo
to open and subsequently to apply
myfactor
. Here the bounds for the two factors are $[2, 4]$
and $[1, 1]$, so the computed bounds for the term are $[4, 4]$. A
further option for narrowing the computed range is discussed in
Section 4.
The core algorithm used by defbounds
is very similar to
Ainni [4] in that it recurs through the
arithmetic operators of a term, gets the bounds for the subterms, if
possible, and calls a specialized routine for each arithmetic operator
to determine the bounds of the result given the bounds of the
operands. A minor but useful refinement is that we add special
treatment for square forms: we provide an automatic lower bound of 0
for terms matching (* x x)
, and additionally treat terms
matching (* x x y)
as (* (* x x) y)
so that we get
this benefit for the squared portion.
On the other hand, we omit, for now, Ainni's support for
bitwise operators and mod
; we also treat all bounds as weak
bounds rather than distinguishing between less than and less than or
equal. Another difference is in the methods that we use to
determine bounds in the base case where the term is not an arithmetic operator.
Ainni is aimed at the particular problem of
simplifying reads of writes in a byteaddressed memory, so it only
supports calls of a function R (read bytes from a memory address) as
its base case, both by looking for typealist entries (assumptions)
that establish known bounds for such terms and by using basic facts
about the R function (i.e., a read of $n$ bytes is between 0 and
$2^{8n}1$). Our core algorithm uses linear rules, typeset reasoning,
and user suggestions (checked by backchaining) to find candidate
bounds for all subterms. We briefly describe each of these sources of
bounds.
First, we consider userprovided suggestions. The user interface for this allows suggestions of the following forms:
(< a 3) ;; upperbound the term A by 3 (>= (foo b) 5) ;; lowerbound the term (FOO B) by 5 (:free (c) (<= (bar c) 10)) ;; upperbound any call of BAR by 10
These are processed into a triple consisting of an LHS pattern
(which may have free and/or fixed variables), an RHS term, and a direction,
which together have the meaning ``for subterms matching LHS, try to
prove them (less/greater) than RHS.'' In this case ``matching'' means
that a term unifies with LHS. The RHS may then contain some or all of
the variables of LHS, which are then replaced with their matches in
the term. This can be used as a bound as long as the result of this
substitution is a ground term evaluating to a rational number
and we can verify its
correctness. We use the mfcrelievehyp
utility to check the
validity of the suggested bound; this tries to prove this inequality
by backchaining in ACL2's rewriter, as though it appeared in the
hypothesis of a rewrite rule. The result of mfcrelievehyp
can be trusted in the context of the the algorithm's proof of
correctness due to ACL2's metaextract feature
[3][2].
Second, we call ACL2's typeset reasoning function mfcts
to
obtain a typeset for the term. This typeset is an integer in which
each bit position signifies a certain type of object; an unset (0) bit
in a given position signifies that the term cannot be of the
corresponding type. Among these basic types corresponding to bits in
the typeset are negative integer, 0, 1, integer greater than 1,
negative ratio, and positive ratio. If the only possible types for the
term are among these, then we can sometimes determine bounds on that
basis; if e.g., the only bits set in our typeset are for 0 and
negativeratio, then this gives us an upper bound of 0. Again, the
metaextract feature allows us to assume that the typeset returned by
mfcts
is correct so that this determination can be trusted.
Third, we consider enabled linear arithmetic theorems where either
side of the inequality matches our term. For each such theorem, we
check that the other side of the inequality is a ground term that
reduces to a rational number after applying the unifying substitution,
that this number is a better bound than the current best one found,
and that the hypotheses of the theorem can be relieved by
backchaining, i.e. mfcrelievehyp
. Since metaextract
allows us to assume the correctness of existing linear lemmas and of
mfcrelievehyp
, this is sufficient to allow this bound to be
trusted once found.
Fourth, if the term is an application of any of the supported arithmetic operators, then we try to get bounds in the Ainni style, by recursively bounding the arguments and applying that operator's bounding routine to those inputs. The least upper bound and greatest lower bound produced by all four methods is then returned.
While the core boundfinding algorithm described above is powerful, by
itself it has a couple of deficiencies that the defbounds
tool addresses with extra features.
First, the bounding algorithm isn't very convenient to use by itself.
A very basic issue is that
while it will prove bounds of an arithmetic term, often we want to
state a theorem about a function that expands to an arithmetic term
instead of the term itself. Another issue is that a given arithmetic
term might not be in the best form for finding bounds. As a simple
example, suppose we want bounds for $ac + bc$ and while we have bounds
for $a$, $b$, and $c$, we have better bounds for $d = a + b$ than our
bounding algorithm can determine from the bounds for $a$ and $b$
separately. Then it is better to rewrite this expression to $dc$
before applying the boundfinding algorithm. The defbounds
tool supports this sort of need by allowing the user to provide a
series of ACL2 hint objects, each giving instructions for a phased
simplification step. For example, a defbounds
form might
include the following argument, describing two simplification phases
in sequence in which first myfunction
is expanded and then
$ac+bc$ is rewritten to $dc$:
:simphints ((:expand ((myfunction x y))) (:intheory (enable rewriteacplusbctodc)))
A second deficiency is due to the bounding algorithm's separate
consideration of correlated subterms. The example in Section
2 shows how this may make the computed bounds
imprecise; in that case the subterms (* x x)
and ( (* 3 x))
were considered separately with their ranges simply
summed, producing bounds $[8, 10]$ where the actual range is $[2,
4]$. We saw that expressing the same function in a different
formin this case factored form rather than distributedproduced
much better results, though still not perfectly precise. Another way to address this problem
is to make the bounds more accurate by separately considering
various ranges for the subexpressions. For example, if we split the
range of $x$ into two segments $[2, 3]$ and $[3, 4]$, we get much
better results on the original (distributed, not factored) formulation.
For $x \in [2, 3]$, the algorithm concludes that the
bounds for the two subexpressions $x^2$ and $3x$ are $[4, 9]$ and $[6,
9]$, giving $[5, 3]$ for the full expression; and for $x \in [3,
4]$, the bounds for the two subexpressions are $[9, 16]$ and $[9,
12]$, giving $[3, 7]$. Taking the union of those two ranges, this
gives bounds $[5, 7]$still somewhat imprecise but greatly
improved over the result without the case split. On the factored form, we get
$[3, 0]$ as the range when $x$ is in $[2, 3]$ and $[0, 4]$ when $x$ is in $[3, 4]$, so the bounds computed for the full domain are $[3, 4]$; again somewhat more precise than the bounds computed for the factored form without a casesplit.
Given the speed of the core algorithm, it can be practical to split
into many such cases if doing so increases the accuracy of the result.
For example, the above problem may be split into 128 subranges with
the defbounds
event still completing in a few seconds. For
the nonfactored form this results in computed bounds $[131/64,
259/64]$, which is only $65/64$ the size of the actual range, and
for the factored form the computed bounds are $[129/64, 4]$, even
slightly better.
This can be done simply by adding the following
argument to the defbounds
form:
:cases ((:rangesfromtoby x 2 4 1/64))
The defbounds
tool is in active use at Intel, especially in
a successful verification of an RTL implementation of floating point
square root. It is available under an MIT license from the public ACL2
books repository and its userlevel documentation is available at the
ACL2 documentation webpage [5].