|
Barretenberg
The ZK-SNARK library at the core of Aztec
|
Warning: This document is intended to provide an overview of the ECCVM in barretenberg. It is not a complete specification and does not cover all edge cases or optimizations. The source code should be consulted for a complete understanding of the implementation.
The ECCVM efficiently proves the correct execution of accumulated elliptic curve operations on the BN-254 curve. It does this by witnessing the correct execution into a table of numbers (in the same field as the base field of the elliptic curve) and applying polynomial constraints, multiset equality-checks, and lookup arguments.
\(\newcommand{\fq}{\mathbb{F}_q}\) \(\newcommand{\fr}{\mathbb{F}_r}\) \(\newcommand{\zr}{\mathbb{Z}/r\mathbb{Z}}\) \(\newcommand{\zq}{\mathbb{Z}/q\mathbb{Z}}\) \(\newcommand{\NeutralElt}{\mathcal{O}}\)
We have the following facts:
In general, \(\zq\) and \(\zr\) refer to the additive abelian groups; we use \(\fq\) and \(\fr\) when we require the multiplicative structure. We do not strictly abide by this convention (common in cryptography), but it helps disambiguate usage.
We also use the following constants:
Finally, the terminology pc stands for point-counter. (In particular, it does not stand for "program counter".)
In a nutshell, the ECCVM is a simple virtual machine to facilitate the verification of native elliptic curve computations. Given an op_queue of BN-254 operations, the ECCVM compiles the execution of these operations into an execution trace representation over \(\fq\) (the field of definition / base field of BN-254). This field is also the scalar field of Grumpkin.
In a bit more detail, the ECCVM is a compiler that takes a sequence of operations (in BN-254) and produces a table of numbers (in \(\fq\)), such that the correct evaluation of the sequence of operations precisely corresponds to polynomial constraints vanishing on the rows of this table. Moreover, these polynomial constraints are independent of the specific sequence of operations. As our tables of numbers have elements in \(\fq\), the native field of the circuit is \(\fq\). When we prove these constraints, we use the Grumpkin curve \(C\).
The core complication comes from the efficient handling of scalar multiplications. Due to MSM optimizations, we effectively produce three tables, where each table has its own set of multivariate polynomials, such that correct evaluation corresponds to those polynomials vanishing row-wise. These tables "communicate" via strict lookup arguments and multiset-equality checks.
Earlier documentation exists. While it does not exactly match the current codebase, it is a helpful guide; this document is an updated explication.
We first specify the allowable operations; the OpQueue is roughly a list of operations on a fixed elliptic curve, including a running accumulator which propagates from instruction to instruction. It may be seen as a finite state machine processing simple elliptic curve operations with a single memory register.
At any moment we have an accumulated value \(A\), and the potential operations are: add, mul, eq, reset, eq_and_reset. There are four selectors \(q_{\text{add}}, q_{\text{mul}}, q_{\text{eq}}, q_{\text{reset}}\), so all operations except eq_and_reset correspond to a unique selector being on. Given an operation, we have an associated opcode value:
EccOpCode | Op Code Value |
|---|---|
add | \(\texttt{1000} \equiv 8\) |
mul | \(\texttt{0100} \equiv 4\) |
eq_and_reset | \(\texttt{0011} \equiv 3\) |
eq | \(\texttt{0010} \equiv 2\) |
reset | \(\texttt{0001} \equiv 1\) |
On the level of selectors,
\[ \texttt{opcode_value}=8\,q_{\text{add}} + 4\,q_{\text{mul}} + 2\,q_{\text{eq}} + q_{\text{reset}}. \]
add takes a point \(P\) and updates \(A \leftarrow A + P\).mul takes \(P\) and \(s \in \fr\) and updates \(A \leftarrow A + sP\).eq takes \(P\) and "checks" \(A == P\).reset sets \(A \leftarrow \NeutralElt\).eq_and_reset takes \(P\), checks \(A == P\), and then sets \(A \leftarrow \NeutralElt\).Decomposing scalars is an important optimization for (multi)scalar multiplications, especially when many scalars are 128-bit.
Both \(\fr\) and \(\fq\) have primitive cube roots of unity (their orders are \(\equiv 1 \pmod{3}\)). Fix \(\beta \in \fq\) a primitive cube root of unity. It induces an order-6 automorphism \(\varphi\) of BN-254:
\[ \varphi: (x,y) \mapsto (\beta x, -y). \]
As \(E(\fq) \cong \zr\), and the natural map \(\fr \rightarrow \mathrm{End}_{\mathrm{ab.gp}}(\zr)\) is an isomorphism, \(\varphi\) corresponds to \(\zeta \in \fr\) satisfying
\[ \zeta^2 - \zeta + 1 = 0. \]
In particular, \(\lambda := -\zeta\) is a cube root of unity in \(\fr\) and satisfies \(\lambda^2 + \lambda + 1 = 0\).
Given \(s \in \zr\), we can write \(s = z_1 - \lambda z_2 = z_1 + \zeta z_2\) with "small" \(z_i\). Consider the lattice \(L := \ker\big( \mathbb{Z}^2 \to \zr\big)\), \((a,b)\mapsto a + \zeta b\). A fundamental domain around the origin lies inside a box with side length \(B := \frac{\sqrt{3r}}{2} < 2^{128}\), hence \(z_i\) fit in 128 bits. See split_into_endomorphism_scalars method in the field module for details.
An operation in the OpQueue may be entered into a table as follows:
| op | X | Y | z_1 | z_2 | mul_scalar_full |
Here, op is the value of the operation, \((X, Y)\) are the affine coordinates of \(P\), mul_scalar_full stands for "full scalar if the operation is `mul`" (so is an element of \(\fr\)), and z_1 and z_2 are a decomposition of mul_scalar_full as explained above. In particular, z_1 and z_2 may each be represented by 128 bits.
The column representation is naturally equivalent to the representation as a VM operation.
From the perspective of the ECCVM, the ECCOpQueue just contains a list of ECCVMOperations, i.e., it is just an Input Trace. It is worth noting that the ECCOpQueue class indeed contains more moving parts, to link together the ECCVM with the rest of the Goblin protocol.
As explained, the ECCOpQueue corresponds to a one-register finite state machine whose primitives are a set of operations on our elliptic curve.
From this perspective, the goal of the ECCVM is to compile the execution of this state machine. The ECCVM takes in an ECCOpQueue, which corresponds to the execution of a list of operations in BN-254, and constructs three tables, together with a collection of multivariate polynomials for each table, along with some lookups and multiset constraints. (The number of variables of a polynomial associated with a table is precisely the number of columns of that table.) Then the key claim is that if (1) the polynomials associated to each table vanish on every row, (2) the lookups are satisfied, and some multi-set equivalences hold (which mediate between tables), then the tables corresponds to the correct execution of the ECCOpQueue, i.e., to the correct execution of the one-register elliptic curve state machine.
Breaking abstraction, the reason we choose this model of witnessing the computation is that it is straightforward to SNARK.
In trying to build the execution trace of ECCOpQueue, the mul opcode is the only one that is non-trivial to evaluate, especially efficiently. One straightforward way to encode the mul operation is to break up the scalar into its bit representation and use a double-and-add procedure. We opt for the Straus MSM algorithm with \(w=4\), which requires more precomputing but is significantly more efficient.
Before we dive into the Straus algorithm, here is the high-level organization. We go "row by row" in the ECCOpQueue; if the instruction is not a mul, the Transcript table handles it. If it is a mul operation, it is automatically part of an MSM (potentially one of length 1), and we defer evaluation to the Straus mechanism (which involves two separate tables: an MSM table and a Precomputed table). Eventually, at the end of an MSM (i.e., if an op is a mul and the next op is not), the Transcript Columns will pick up the claimed evaluation from the MSM tables and continue along their merry way.
To do this in a moderately efficient manner is involved; we include logic for skipping computations when we can. For instance, if we have a mul operation with the base point \(P=\NeutralElt\), then we will have a column that bears witness to this fact and skip the explicit scalar multiplication. Analogously, if the scalar is 0 in a mul operation, we also encode skipping the explicit scalar multiplication. This does not merely allow us to save work; it dramatically simplifies the actual MSM computations (especially recursively), by throwing out circumstances when there can be case logic. However, this, together with the delegation of work to multiple tables, itself required by the Straus algorithm, nonetheless results in somewhat complicated column structure.
However, at least some of this complexity is forced on us; in Barretenberg, we represent the \(\NeutralElt\) of an elliptic curve in Weierstrass form as \((0, 0)\) for efficiency. (Note that \(\NeutralElt\) is always chosen to be the point-at-infinity and in particular it has no "affine representation". Note further that \((0, 0)\) is indeed not a point on our elliptic curve!) These issues are worth keeping in mind when examining the ECCVM.
Recall, our high-level goal is to compute
\[\sum_{i=0}^{m-1} s_i P_i,\]
where \(s_i\in \fr\) and \(P_i\) are points on BN-254, i.e., we want to evaluate a multi-scalar multiplication of length \(m\). We set \(w=4\), as this is our main use-case. (In the code, this is represented as static constexpr size_t NUM_WNAF_DIGIT_BITS = 4;.) We have seen about that, setting \(P'_i:=\varphi(P_i) = \lambda P_i\), we may write \(s_iP_i = z_{i, 1}P_i - z_{i, 2}P'_i\), where \(z_{i,j}\) has no more than 128 bits. We therefore assume that our scalars have no greater than 128 bits.
The first thing to specify is our windowed non-adjacent form (wNAF). This is an optimization for computing scalar multiplication. Moreover, the fact that we are working with an elliptic curve in Weierstrass form effectively halves the number of precomputes we need to perform.
Warning: our implementation is not what is usually called wNAF. To avoid confusion, we simply avoid discussion on traditional (w)NAF.
Here is the key mathematical claim: for a 128-bit positive number \(s\), we can uniquely write:
\[s = \sum_{j=0}^{31} a_j 2^{4j} + \text{skew},\]
where
In our implementation, we force \(a_{31}>0\) to guarantee that \(s\) is positive. Note that the exponent in the range of the digits \(a_j\) is determined by \(w=\texttt{NUM_WNAF_DIGIT_BITS} = 4\). The existence of the skew bit is to ensure that we can represent even numbers.
The above decomposition is referred to in the code as the wNAF representation. Each \(a_i\) is referred to either as a wNAF slice or digit.
We will come shortly to the algorithm, but as for the motivation: in our implementation, the neutral point of the group (i.e., point-at-infinity) poses some technical challenges for us. We work with the affine representation of elliptic curve points, and \(\NeutralElt\) certainly has no natural affine-coordiante representation. We choose to internally represent it as \((0, 0)\) (not a point on our curve!) and handle it with separate logic. It is therefore advantageous to avoid having to extraneously perform operations involving \(\NeutralElt\), especially when we are implementing the recursive ECCVM verifier.
Here is the problem: efficiently compute
\[\sum_i s_i P_i,\]
where the \(s_i\) are 128-bit numbers and \(P_i\) are points in BN-254. (Recall that we reduce to the case of 128-bit scalars by decomposing, as explained above.)
To do this, we break up our computation into steps.
For each \(s_i\), we expand it in wNAF form: \(s_i = \sum_{j=0}^{31} a_{i, j} 2^{4j} + \text{skew}_i\).
For every \(P_i\), precompute and store the multiples:
\[\{-15P_i, -13P_i, \ldots, 13P_i, 15P_i\}\]
as well as \(2P_i\). Note that, \(E\) is represented in Weierstrass form, \(nP\) and \(-nP\) have the same affine \(y\)-coordinate and the \(x\)-coordinates differ by a sign.
Here are the static variables we need.
NUM_WNAF_DIGITS_PER_SCALAR=32.NUM_WNAF_DIGIT_BITS = 4.ADDITIONS_PER_ROW = 4. This says that we can do 4 primitive EC additions per "row" of the virtual machine.We picture this algorithm as follows. We build a table, the \(i^{\text{th}}\) row of which is the wNAF expansion of \(s_i\) in most-significant to least-significant order. This means that the first column corresponds to the most significant digit ( \(a_{-, 31}\)).
We work column by column (this is the \(j\)-loop); for every vertical chunk of 4 elements, we accumulate (i.e., add to an accumulator \(A\)) looked up values corresponding to the digit/base-point pair. In the pseudo-code, we have an index \(31-j\) because we want to proceed in decreasing order of significant digits. (Looking forward, a "row" of the MSM table in the ECCVM can handle 4 such additions.) We do this until we exhaust the column. We then multiply the accumulator by \(16\) (as long as we are not at the last digit) and go to the next column. Finally, at the end we handle the skew digit.
We have three tables that mediate the computation. As explained above, all of the computations are easy except for scalar multiplications. We process the computation and chunk what looks like scalar multiplications into MSMs. Here is the brief outline.
transcript_builder. The transcript columns organize and process all of the computations except for the scalar multiplications. In particular, the Transcript Columns do not bear witness to the intermediate computations necessary for MSMs. However, they still "access" the results of these computations.precomputed_tables_builder. The precomputed columns are: for every \(P\) that occurs in an MSM (which was syntactically pulled out by the transcript_builder), we compute/store \(\{P, 3P, \ldots, 15P, 2P\}\).msm_builder actually computes/constrains the MSMs via the Straus algorithm.A final note: apart from three Lagrange columns, all columns are either 1. part of the input trace; or 2. witness columns committed to by the Prover.
In the following tables, each column has a defined "value range". If the range is not \(\fq\), the column must be range constrained, either with an explicit range check or implicitly through range constraints placed on other columns that define relations over the target column.
\(\newcommand{\transcriptmsminfinity}{{\mathrm{transcript\_msm\_infinity}}}\) \(\newcommand{\transcriptaccumulatornotempty}{{\mathrm{transcript\_accumulator\_not\_empty}}}\) \(\newcommand{\transcriptadd}{{\mathrm{transcript\_add}}}\) \(\newcommand{\transcriptmul}{{\mathrm{transcript\_mul}}}\) \(\newcommand{\transcripteq}{{\mathrm{transcript\_eq}}}\) \(\newcommand{\transcriptresetaccumulator}{{\mathrm{transcript\_reset\_accumulator}}}\) \(\newcommand{\transcriptmsmtransition}{{\mathrm{transcript\_msm\_transition}}}\) \(\newcommand{\transcriptpc}{{\mathrm{transcript\_pc}}}\) \(\newcommand{\transcriptmsmcount}{{\mathrm{transcript\_msm\_count}}}\) \(\newcommand{\transcriptmsmcountzeroattransition}{{\mathrm{transcript\_msm\_count\_zero\_at\_transition}}}\) \(\newcommand{\transcriptpx}{{\mathrm{transcript\_Px}}}\) \(\newcommand{\transcriptpy}{{\mathrm{transcript\_Py}}}\) \(\newcommand{\transcriptbaseinfinity}{{\mathrm{transcript\_base\_infinity}}}\) \(\newcommand{\transcriptzone}{{\mathrm{transcript\_z1}}}\) \(\newcommand{\transcriptztwo}{{\mathrm{transcript\_z2}}}\) \(\newcommand{\transcriptzonezero}{{\mathrm{transcript\_z1zero}}}\) \(\newcommand{\transcriptztwozero}{{\mathrm{transcript\_z2zero}}}\) \(\newcommand{\transcriptop}{{\mathrm{transcript\_op}}}\) \(\newcommand{\transcriptaccumulatorx}{{\mathrm{transcript\_accumulator\_x}}}\) \(\newcommand{\transcriptaccumulatory}{{\mathrm{transcript\_accumulator\_y}}}\) \(\newcommand{\transcriptmsmx}{{\mathrm{transcript\_msm\_x}}}\) \(\newcommand{\transcriptmsmy}{{\mathrm{transcript\_msm\_y}}}\) \(\newcommand{\transcriptmsmintermediatex}{{\mathrm{transcript\_msm\_intermediate\_x}}}\) \(\newcommand{\transcriptmsmintermediatey}{{\mathrm{transcript\_msm\_intermediate\_y}}}\) \(\newcommand{\transcriptaddxequal}{{\mathrm{transcript\_add\_x\_equal}}}\) \(\newcommand{\transcriptaddyequal}{{\mathrm{transcript\_add\_y\_equal}}}\) \(\newcommand{\transcriptbasexinverse}{{\mathrm{transcript\_base\_x\_inverse}}}\) \(\newcommand{\transcriptbaseyinverse}{{\mathrm{transcript\_base\_y\_inverse}}}\) \(\newcommand{\transcriptaddlambda}{{\mathrm{transcript\_add\_lambda}}}\) \(\newcommand{\transcriptmsmxinverse}{{\mathrm{transcript\_msm\_x\_inverse}}}\) \(\newcommand{\transcriptmsmcountattransitioninverse}{{\mathrm{transcript\_msm\_count\_at\_transition\_inverse}}}\)
| column name | builder name | value range | computation | description |
|---|---|---|---|---|
| Populated in the first loop | ||||
| \(\transcriptmsminfinity\) | transcript_msm_infinity | \(\{0, 1 \}\) | msm_output.is_point_at_infinity(); | are we at the end of an MSM and is the output the point at infinity? |
| \(\transcriptaccumulatornotempty\) | accumulator_not_empty | \(\{0, 1 \}\) | row.accumulator_not_empty = !state.is_accumulator_empty;, final_row.accumulator_not_empty = !updated_state.is_accumulator_empty; | not(is the accumulator either empty or point-at-infinity?) |
| \(\transcriptadd\) | q_add | \(\{0, 1 \}\) | is opcode | |
| \(\transcriptmul\) | q_mul | \(\{0, 1 \}\) | is opcode | |
| \(\transcripteq\) | q_eq | \(\{0, 1\}\) | is opcode | |
| \(\transcriptresetaccumulator\) | q_reset_accumulator | \(\{0, 1 \}\) | does opcode reset accumulator? | |
| \(\transcriptmsmtransition\) | msm_transition | \(\{0, 1\}\) | msm_transition = is_mul && next_not_msm && (state.count + num_muls > 0); | are we at the end of an msm? i.e., is current transcript row the final mul opcode of a MSM |
| \(\transcriptpc\) | pc | \(\fq\) | updated_state.pc = state.pc - num_muls; | decreasing point counter. Only takes into count mul operations, not add operations. |
| \(\transcriptmsmcount\) | msm_count | \(\fq\) | updated_state.count = current_ongoing_msm ? state.count + num_muls : 0; | Number of muls so far in the current MSM (NOT INCLUDING the current step) |
| \(\transcriptmsmcountzeroattransition\) | msm_count_zero_at_transition | \(\{0, 1\}\) | ((state.count + num_muls) == 0) && entry.op_code.mul && next_not_msm; | is the number of scalar muls we have completed at the end of our "MSM block" zero? (note that from the definition, if this variable is non-zero, then msm_transition == 0.) |
| \(\transcriptpx\) | base_x | \(\fq\) | (input trace) \(x\)-coordinate of base point \(P\) | |
| \(\transcriptpy\) | base_y | \(\fq\) | (input trace) \(y\)-coordinate of base point \(P\) | |
| \(\transcriptbaseinfinity\) | base_infinity | \(\{0, 1\}\) | is \(P=\NeutralElt\)? | |
| \(\transcriptzone\) | z1 | \([0,2^{128})\) | (input trace) first part of decomposed scalar | |
| \(\transcriptztwo\) | z2 | \([0,2^{128})\) | (input trace) second part of decomposed scalar | |
| \(\transcriptzonezero\) | z1_zero | \(\{0, 1\}\) | is z1 zero? | |
| \(\transcriptztwozero\) | z2_zero | \(\{0, 1\}\) | is z2 zero? | |
| \(\transcriptop\) | op_code | \(\in \fq\) | entry.op_code.value(); | 8 q_add + 4 q_mul + 2 q_eq + q_reset |
| Populated after converting from projective to affine coordinates | ||||
| \(\transcriptaccumulatorx\) | accumulator_x | \(\fq\) | x-coordinate of accumulator \(A\) | |
| \(\transcriptaccumulatory\) | accumulator_y | \(\fq\) | y-coordinate of accumulator \(A\) | |
| \(\transcriptmsmx\) | msm_output_x | \(\fq\) | if we are at the end of an MSM, (output of MSM) + offset_generator() = (msm_output_x, msm_output_y), else 0 | |
| \(\transcriptmsmy\) | msm_output_y | \(\fq\) | if we are at the end of an MSM, (output of MSM) + offset_generator() = (msm_output_x, msm_output_y), else 0 | |
| \(\transcriptmsmintermediatex\) | transcript_msm_intermediate_x | \(\fq\) | if we are at the end of an MSM, (output of MSM) = (transcript_msm_intermediate_x, transcript_msm_intermediate_y), else 0 | |
| \(\transcriptmsmintermediatey\) | transcript_msm_intermediate_y | \(\fq\) | if we are at the end of an MSM, (output of MSM) = (transcript_msm_intermediate_x, transcript_msm_intermediate_y), else 0 | |
| \(\transcriptaddxequal\) | transcript_add_x_equal | \(\{0, 1\}\) | (vm_x == accumulator_x) or (vm_infinity && accumulator_infinity); | do the accumulator and the point we are adding have the same \(x\)-value? (here, the two point we are adding is either part of an add instruction or the output of an MSM). 0 if we are not accumulating anything. |
| \(\transcriptaddyequal\) | transcript_add_y_equal | \(\{0, 1\}\) | (vm_y == accumulator_y) or (vm_infinity && accumulator_infinity); | do the accumulator and the point we are adding have the same \(y\)-value? 0 if we are not accumulating anything. |
| \(\transcriptbasexinverse\) | base_x_inverse | \(\fq\) | if adding a point to the accumulator and the \(x\) values are not equal, the inverse of the difference of the \(x\) values. (witnesses transcript_add_x_equal == 0 | |
| \(\transcriptbaseyinverse\) | base_y_inverse | \(\fq\) | if adding a point to the accumulator and the \(y\) values are not equal, the inverse of the difference of the \(y\) values. (witnesses transcript_add_y_equal == 0 | |
| \(\transcriptaddlambda\) | transcript_add_lambda | \(\fq\) | if adding a point into the accumulator, contains the lambda gradient: the slope of the line between \(A\) and \(P\) | |
| \(\transcriptmsmxinverse\) | transcript_msm_x_inverse | \(\fq\) | used to validate transcript_msm_infinity correct; if the former is zero, this is the inverse of the \(x\) coordinate of the (non-shifted) output of the MSM | |
| \(\transcriptmsmcountattransitioninverse\) | msm_count_at_transition_inverse | \(\fq\) | used to validate transcript_msm_count_zero_at_transition |
In the above table, we have a reference what the transcript columns are. Here, we provide a natural-language summary of witness-generation, which in turn directly implies what the constraints should look like. Some of the apparent complexity comes from the fact that, for efficiency, we do operations in projective coordinates and then normalize them all at the end. (This requires fewer field-inversions.)
We start our top row with \(\transcriptmsmcount = 0\) and \(\transcriptaccumulatornotempty = 0\). This corresponds to saying "there are no active multiplications in our MSM" and "the accumulator is \f$\NeutralElt\f$".
We process each op.
If the op is an add, we process the addition as follows. We have an accumulated value \(A\) and a point \(P\) to add. If \(\transcriptbaseinfinity = 1\), we don't need to do anything: \(P=\NeutralElt\). Similarly, if \(\transcriptaccumulatornotempty = 0\), then we just (potentially) need to change \(\transcriptaccumulatornotempty\), \(\transcriptaccumulatorx\) and \(\transcriptaccumulatory\). Otherwise, we need to check \(\transcriptaddxequal\): the formula for point addition requires dividing by \(\Delta x\), and in particular is not well-constrained either when adding points that are negative of each other or adding the same point to itself. (These two cases may be easily distinguished by examining \(\transcriptaddyequal\)). If we are not in this case, we need the help of of \(\transcriptaddlambda\), which is the slope between the points \(P\) and \(A\). (This slope will happily not be \(\infty\), as we have ruled out the only occasions it had to be.)
The value \(A\leftarrow A + P\) will of course involve different \(\transcriptaccumulatorx\) and \(\transcriptaccumulatory\), but may also cause \(\transcriptaccumulatornotempty\) to flip.
We emphasize: we do not modify \(\transcriptpc\) in this case. Indeed, that variable is only modified based on the number of small scalar muls we are doing.
If the op is eq, we process the op as follows. We have an accumulated value \(A\) and a point \(P\). Due to our non-uniform representation of \(\NeutralElt\), we must break up into cases.
If our op is eq_reset, we do the same as for eq, but we also set \(\transcriptaccumulatornotempty\leftarrow 0\).
If our op is a mul, with scalars z1 and z2, the situation is more complicated. Now we have to update auxiliary wires. As explained, every mul operation is understood to be part of an MSM.
mul op. The value of this column at the next row increments by \(2 - \transcriptzonezero - \transcriptztwozero\).muls) for efficiency reasons, as it allows for cheaper commitments.op is not a mul, and the total number of active mul operations (which is \(\transcriptmsmcount + (2 - \transcriptzonezero - \transcriptztwozero)\)) is non-zero, set the \(\transcriptmsmtransition = 1\). Else, set \(\transcriptmsmcountzeroattransition = 1\). Either way, the current mul then represents the end of an MSM. This is where \(\transcriptmsmcountattransitioninverse\) is used.OFFSET.The size of the non-zero part of the table is the length of the OpQueue + 1 (we have shiftable columns). We have organized our wire values so that zero-padding is compatible with the polynomial constraints. (See e.g. the decreasing point counter.)
As the set of precomputed columns is small, we include the code snippet.
As discussed in Decomposing Scalars, WLOG our scalars have 128 bits and we may expand them in \(w=4\) wNAF:
\[s = \sum_{j=0}^{31} a_j 2^{4j} + \text{skew},\]
where
Given a wNAF digit \(\in \{-15, -13, \ldots, 15\}\), we \(\text{compress}\) it via the map:
\[\text{compress}\colon d\mapsto \frac{d+15}{2},\]
which is of course a bijection \(\{-15, -13, \ldots, 15\}\rightarrow \{0,\ldots, 15\}\). (This compression is helpful for indexing later: looking forward, the values \([-15P, -13P, \ldots, 15P]\) will be stored in an array, so if we want to looking up \(kP\), where \(k\in \{-15, -13, \ldots, 15\}\), we can go to the \(\text{compress}(k)\) index of our array associated to \(P\).)
\(\newcommand{\precomputesonehi}{{\mathrm{precompute\_s1hi}}}\) \(\newcommand{\precomputesonelo}{{\mathrm{precompute\_s1lo}}}\) \(\newcommand{\precomputestwohi}{{\mathrm{precompute\_s2hi}}}\) \(\newcommand{\precomputestwolo}{{\mathrm{precompute\_s2lo}}}\) \(\newcommand{\precomputesthreehi}{{\mathrm{precompute\_s3hi}}}\) \(\newcommand{\precomputesthreelo}{{\mathrm{precompute\_s3lo}}}\) \(\newcommand{\precomputesfourhi}{{\mathrm{precompute\_s4hi}}}\) \(\newcommand{\precomputesfourlo}{{\mathrm{precompute\_s4lo}}}\) \(\newcommand{\precomputeskew}{{\mathrm{precompute\_skew}}}\) \(\newcommand{\precomputepointtransition}{{\mathrm{precompute\_point\_transition}}}\) \(\newcommand{\precomputepc}{{\mathrm{precompute\_pc}}}\) \(\newcommand{\precomputeround}{{\mathrm{precompute\_round}}}\) \(\newcommand{\precomputescalarsum}{{\mathrm{precompute\_scalar\_sum}}}\) \(\newcommand{\precomputetx}{{\mathrm{precompute\_tx}}}\) \(\newcommand{\precomputety}{{\mathrm{precompute\_ty}}}\) \(\newcommand{\precomputedx}{{\mathrm{precompute\_dx}}}\) \(\newcommand{\precomputedy}{{\mathrm{precompute\_dy}}}\) \(\newcommand{\preselect}{{\mathrm{precompute\_select}}}\)
The following is one row in the Precomputed table; there are NUM_WNAF_DIGITS_PER_SCALAR / WNAF_DIGITS_PER_ROW == 32/4 = 8 rows. The row index is i. (This number is is also witnessed as round.)
| column name | builder name | value range | computation | description |
|---|---|---|---|---|
| \(\precomputesonehi\) | s1 | \([0, 4)\) | first two bits of \(\text{compress}(a_{31 - 4i})\) | |
| \(\precomputesonelo\) | s2 | \([0, 4)\) | second two bits of \(\text{compress}(a_{31 - 4i})\) | |
| \(\precomputestwohi\) | s3 | \([0, 4)\) | first two bits of \(\text{compress}(a_{31 - (4i + 1)})\) | |
| \(\precomputestwolo\) | s4 | \([0, 4)\) | second two bits of \(\text{compress}(a_{31 - (4i + 1)})\) | |
| \(\precomputesthreehi\) | s5 | \([0, 4)\) | first two bits of \(\text{compress}(a_{31 - (4i + 2)})\) | |
| \(\precomputesthreelo\) | s6 | \([0, 4)\) | second two bits of \(\text{compress}(a_{31 - (4i + 2)})\) | |
| \(\precomputesfourhi\) | s7 | \([0, 4)\) | first two bits of \(\text{compress}(a_{31 - (4i + 3)})\) | |
| \(\precomputesfourlo\) | s8 | \([0, 4)\) | second two bits of \(\text{compress}(a_{31 - (4i + 3)})\) | |
| \(\precomputeskew\) | skew | \(\{0,1\}\) | skew bit | |
| \(\precomputepointtransition\) | point_transition | \(\{0,1\}\) | are we at the last row corresponding to this scalar? | |
| \(\precomputepc\) | pc | \(\fq\) | value of the point counter of this EC operation | |
| \(\precomputeround\) | round | \(\fq\) | "row" of the computation, i.e., i. | |
| \(\precomputescalarsum\) | scalar_sum | \(\fq\) | sum up-to-now of the digits | |
| \(\precomputetx\), \(\precomputety\) | precompute_accumulator | \(E(\fq)\subset \fq\times \fq\) | \((15-2i)P\) | |
| \(\precomputedx\), \(\precomputedy\) | precompute_double | \(E(\fq)\subset \fq\times \fq\) | \(2P\) | |
| \(\preselect\) | \(\{0,1\}\) | if 1, evaluate Straus precomputation algorithm at current row |
First, let us recall the structure of ScalarMul.
Note that, with respect to the decomposition in wnaf, wnaf_digits[i]= \(a_{31-i}\). Indeed, the order of the array wnaf_digits is from highest-order to lowest-order.
Given a ScalarMul, it is easy to construct the 8 rows of the Precomputed Table. As explained, WNAF_DIGITS_PER_ROW = 4; hence the NUM_WNAF_DIGITS_PER_SCALAR = 32 digits in may be broken up into 8 rows, where each row corresponds to 4 wNAF digits, each of which is in \(\{-15, -13, \ldots, 13, 15\}\).
wnaf_digits[4i], wnaf_digits[4i+1], wnaf_digits[4i+2], and wnaf_digits[4i+3], compress from \(\{-15, -13, \ldots, 13, 15\}\rightarrow \{0,\ldots 15\}\) via the monotonic map \(z\mapsto \frac{z+15}{2}\). Then our compressed digits are in the latter range.i == 7) for the current scalar, else 0. This tracks if the next row in the table corresponds to a new ScalarMul.ScalarMul.pc.i.wnaf_digits[0], ..., wnaf_digits[4i+3]; \(\precomputescalarsum\) is precisely the value of the \(4i\)-digit number corresponding to this string of digits.ScalarMul.)The constraints are straightforward.
scalar is \(\precomputescalarsum - \precomputeskew\).\[(\textbf{shift}(\precomputetx), \textbf{shift}(\precomputety)) = (\precomputedx, \precomputedy) + (\precomputetx, \precomputety),\]
where the latter addition of course happens on \(E\).ScalarMul), so if the next block is to be well-formed, the next round element better be \(0\). Note that this is compatible with zero-padding.For every non-trivial short scalar mul, we fill in \(8\) non-trivial rows to the precomputed table. Here, non-trivial means: \(P\neq \NeutralElt\) and \(z\neq 0\), where \(z\) is the short (128-bit) scalar we are multiplying by. This means that for \(m\) (non-trivial) short scalar mul operations, we add \(8m\) rows to the precomputed table.
This table is the most algorithmically involved.
\(\newcommand{\msmpc}{{\mathrm{msm\_pc}}}\) \(\newcommand{\msmsizeofmsm}{{\mathrm{msm\_size\_of\_msm}}}\) \(\newcommand{\msmcount}{{\mathrm{msm\_count}}}\) \(\newcommand{\msmround}{{\mathrm{msm\_round}}}\) \(\newcommand{\msmtransition}{{\mathrm{msm\_transition}}}\) \(\newcommand{\msmadd}{{\mathrm{msm\_add}}}\) \(\newcommand{\msmdouble}{{\mathrm{msm\_double}}}\) \(\newcommand{\msmskew}{{\mathrm{msm\_skew}}}\) \(\newcommand{\msmxone}{{\mathrm{msm\_x1}}}\) \(\newcommand{\msmyone}{{\mathrm{msm\_y1}}}\) \(\newcommand{\msmxtwo}{{\mathrm{msm\_x2}}}\) \(\newcommand{\msmytwo}{{\mathrm{msm\_y2}}}\) \(\newcommand{\msmxthree}{{\mathrm{msm\_x3}}}\) \(\newcommand{\msmythree}{{\mathrm{msm\_y3}}}\) \(\newcommand{\msmxfour}{{\mathrm{msm\_x4}}}\) \(\newcommand{\msmyfour}{{\mathrm{msm\_y4}}}\) \(\newcommand{\msmaddone}{{\mathrm{msm\_add1}}}\) \(\newcommand{\msmaddtwo}{{\mathrm{msm\_add2}}}\) \(\newcommand{\msmaddthree}{{\mathrm{msm\_add3}}}\) \(\newcommand{\msmaddfour}{{\mathrm{msm\_add4}}}\) \(\newcommand{\msmsliceone}{{\mathrm{msm\_slice1}}}\) \(\newcommand{\msmslicetwo}{{\mathrm{msm\_slice2}}}\) \(\newcommand{\msmslicethree}{{\mathrm{msm\_slice3}}}\) \(\newcommand{\msmslicefour}{{\mathrm{msm\_slice4}}}\) \(\newcommand{\msmlambdaone}{{\mathrm{msm\_lambda1}}}\) \(\newcommand{\msmlambdatwo}{{\mathrm{msm\_lambda2}}}\) \(\newcommand{\msmlambdathree}{{\mathrm{msm\_lambda3}}}\) \(\newcommand{\msmlambdafour}{{\mathrm{msm\_lambda4}}}\) \(\newcommand{\msmcollisionxone}{{\mathrm{msm\_collision\_x1}}}\) \(\newcommand{\msmcollisionxtwo}{{\mathrm{msm\_collision\_x2}}}\) \(\newcommand{\msmcollisionxthree}{{\mathrm{msm\_collision\_x3}}}\) \(\newcommand{\msmcollisionxfour}{{\mathrm{msm\_collision\_x4}}}\) \(\newcommand{\msmaccumulatorx}{{\mathrm{msm\_accumulator\_x}}}\) \(\newcommand{\msmaccumulatory}{{\mathrm{msm\_accumulator\_y}}}\)
| column name | builder name | value range | computation | description |
|---|---|---|---|---|
| \(\msmpc\) | pc | \(\fq\) | counter over all half-length (128 bit) scalar muls used to compute the required MSMs. constant on a given MSM, refers more precisely to the number of completed scalar muls up until the current MSM. in particular, this skips values, unlike transcript_pc. | |
| \(\msmsizeofmsm\) | msm_size | \(\fq\) | the number of points that will be scaled and summed | |
| \(\msmcount\) | msm_count | \(\fq\) | row.msm_count = static_cast<uint32_t>(offset); | number of wNAF-multiplications processed so far in this round. |
| \(\msmround\) | msm_round | \([0, 32]\) | current "round" of MSM, in \(\{0, \ldots, 32\}\), which corresponds to the wNAF digit being processed. (final round deals with the skew bit.) | |
| \(\msmtransition\) | msm_transition | \(\{0, 1\}\) | (digit_idx == 0) && (row_idx == 0) | is 1 if the current row starts the processing of a different MSM, else 0 . Note this not the same as the description of transcript_msm_transition |
| \(\msmadd\) | q_add | \(\{0, 1\}\) | 1 if we are adding points in the Straus MSM algorithm at current row | |
| \(\msmdouble\) | q_double | \(\{0, 1\}\) | 1 if we are doubling accumulator in the Straus MSM algorithm at current row | |
| \(\msmskew\) | q_skew | \(\{0, 1\}\) | 1 if we are incorporating skew points in the Straus MSM algorithm at current row | |
| \(\msmxone\) | add_state[0].point.x | \(\fq\) | \(x\)-coordinate of first potential point (corresponding to add_state[0]) to add in Straus MSM round | |
| \(\msmyone\) | add_state[0].point.y | \(\fq\) | \(y\)-coordinate of first potential point (corresponding to add_state[0]) to add in Straus MSM round | |
| \(\msmxtwo\) | add_state[1].point.x | \(\fq\) | \(x\)-coordinate of second potential point (corresponding to add_state[1]) to add in Straus MSM | |
| \(\msmytwo\) | add_state[1].point.y | \(\fq\) | \(y\)-coordinate of second potential point (corresponding to add_state[1]) to add in Straus MSM | |
| \(\msmxthree\) | add_state[2].point.x | \(\fq\) | x-coordinate of third potential point (corresponding to add_state[2]) to add in Straus MSM round | |
| \(\msmythree\) | add_state[2].point.y | \(\fq\) | y-coordinate of third potential point (corresponding to add_state[2]) to add in Straus MSM round | |
| \(\msmxfour\) | add_state[3].point.x | \(\fq\) | x-coordinate of fourth potential point (corresponding to add_state[3]) to add in Straus MSM round | |
| \(\msmyfour\) | add_state[3].point.y | \(\fq\) | y-coordinate of fourth potential point (corresponding to add_state[3]) to add in Straus MSM round | |
| \(\msmaddone\) | add_state[0].add | \(\{0, 1\}\) | are we adding msm_x1/msm_y1 (resp. add_state[0]) into accumulator at current round? | |
| \(\msmaddtwo\) | add_state[1].add | \(\{0, 1\}\) | are we adding msm_x2/msm_y2 (resp. add_state[1]) into accumulator at current round? | |
| \(\msmaddthree\) | add_state[2].add | \(\{0, 1\}\) | are we adding msm_x3/msm_y3 (resp. add_state[2]) into accumulator at current round? | |
| \(\msmaddfour\) | add_state[3].add | \(\{0, 1\}\) | are we adding msm_x4/msm_y4 (resp. add_state[3]) into accumulator at current round? | |
| \(\msmsliceone\) | add_state[0].slice | \([0, 15]\) | wNAF slice value (a.k.a. digit) for first point (corresponds to odd number in \(\{-15, -13, \ldots, 15\}\) via the monotonic bijection) | |
| \(\msmslicetwo\) | add_state[1].slice | \([0, 15]\) | wNAF slice value (a.k.a. digit) for second point | |
| \(\msmslicethree\) | add_state[2].slice | \([0, 15]\) | wNAF slice value (a.k.a. digit) for third point | |
| \(\msmslicefour\) | add_state[3].slice | \([0, 15]\) | wNAF slice value (a.k.a. digit) for fourth point | |
| \(\msmlambdaone\) | add_state[0].lambda | \(\fq\) | if add_state[0].add==1 (eqiv. if msm_add1 == 1), slope of the line between the two points being added. else 0. | |
| \(\msmlambdatwo\) | add_state[1].lambda | \(\fq\) | if add_state[1].add==1 (eqiv. if msm_add2 == 1), slope of the line between the two points being added. else 0. | |
| \(\msmlambdathree\) | add_state[2].lambda | \(\fq\) | if add_state[2].add==1 (eqiv. if msm_add3 == 1), slope of the line between the two points being added. else 0. | |
| \(\msmlambdafour\) | add_state[3].lambda | \(\fq\) | if add_state[3].add==1 (eqiv. if msm_add3 == 1), slope of the line between the two points being added. else 0. | |
| \(\msmcollisionxone\) | add_state[0].collision_inverse | \(\fq\) | if add_state[0].add == 1, the difference of the \(x\) values of the accumulator and the point being added. used to ensure incomplete ecc addition exceptions not triggered if msm_add1 = 1 | |
| \(\msmcollisionxtwo\) | add_state[1].collision_inverse | \(\fq\) | if add_state[1].add == 1, the difference of the \(x\) values of the accumulator and the point being added. | |
| \(\msmcollisionxthree\) | add_state[2].collision_inverse | \(\fq\) | if add_state[2].add == 1, the difference of the \(x\) values of the accumulator and the point being added. | |
| \(\msmcollisionxfour\) | add_state[3].collision_inverse | \(\fq\) | if add_state[3].add == 1, the difference of the \(x\) values of the accumulator and the point being added. | |
| \(\msmaccumulatorx\) | accumulator_x | \(\fq\) | (accumulator_x, accumulator_y) = \(A\) is the accumulated point |
| \(\msmaccumulatory\) | accumulator_y | \(\fq\) |
We have already given a high-level summary of the Straus algorithm. Let us get into the weeds!
The function signature is the following:
In other words, compute_rows takes in a vector of MSMs (each of which is a vector of ScalarMuls), together with the total number of non-zero mul operations we compute and the (easy-to-compute) a priori size bound num_msm_rows, and returns a vector of MSMRows and two vectors, which will represent our point-counts (i.e., will be fed into the lookup argument).
Before we get into the content, note that we may assume that no point is \(\NeutralElt\) in any of the MSMs. Indeed, this is due to checks done by the Transcript Columns. However, it is in principle possible that some of the scalars are \(0\); we do not force \(\transcriptzonezero = 0 \Rightarrow \transcriptzone != 0\)
Each row (after the first row) in the MSM table will belong to one of the MSMs we are assigned to compute in msms. For an msm of size m, the number of rows that will be added in the MSM table is:
\[(\texttt{NUM-WNAF-DIGITS-PER-SCALAR + 1})\lceil \frac{m}{\texttt{ADDITIONS-PER-ROW}}\rceil + (\texttt{NUM-WNAF-DIGITS-PER-SCALAR} - 1) = 33\frac{m}{4} + 31.\]
There is one other quirk we should explicate before entering the algorithm. In general, the logic for affine elliptic curve addition can have cases: when the \(x\) coordinates match up. (Doubling cannot have cases for points on our affine elliptic curve because there is no \(\fq\)-rational \(2\)-torsion.) Moreover, in general our logic must branch if either our base or the accumulator is \(\NeutralElt\). As we have indicated several times above, for optimization, we represent \(\NeutralElt\) as \((0, 0)\) in the code. It is advantageous to avoid this branching logic. We do so by relaxing completeness. In particular, we start off the accumulator of every MSM with a fixed offset_generator. This is a fixed point of \(E\) that we may consider pseudo-random (though it is fixed and indeed hardcoded). Then we decree that for our MSM to be valid, in the course of the Straus algorithm, whenever I accumulate \(A\leftarrow A + P\), the \(x\)-coordinates of \(A\) and \(P\) differ. This condition of being valid may be witnessed by the prover providing the inverse of the difference of the \(x\)-coordinates every time it is necessary.
This indeed breaks completeness, inasmuch as there are valid EccOpQueues which will not be able to be compiled into a valid execution trace. However, this is vanishingly unlikely, in the course of any normal operations.
Finally, we may describe the algorithm. We implicitly organize our data in the following type of table (as indicated in the Straus Section). Each row of our table corresponds to a scalar multiplication: the elements of the row are the wNAF digits (including the skew bit). In other words, the columns of our table correspond to wNAF digits. Our algorithm will proceed column by column, from most significant to least significant digit, processing one vertical chunk of four elements after another. To emphasize: this table syntactically encoding our MSM is not what we refer to as the MSM table of the VM, which rather witnesses the correct execution of the MSM.
point_table_read_counts[0] and point_table_read_counts[1] to track the positive and negative lookups corresponding to \(nP\), where \(n\in \{-15, -13, \ldots, 13, 15\}\). Each table will have size total_number_of_muls * 8 (since POINT_TABLE_SIZE/2 = 8).point_table_read_counts based on msm[point_idx].wnaf_digits[digit_idx]. Update read counts based on skew as well.We deviate from the witness generation algorithm here. In the code, in order to minimize the number of field divisions, we first compute in projective coordinates, then batch-normalize back to affine to fill in the values affine values. Here we just specify the values of the various columns in a more naive way.
msm to be offset_generator. (This allows us to avoid case-logic in EC addition.)digit-position (a.k.a. column of my syntactic MSM table) in \(0..31\):msm. If so, set \(\msmtransition = 1\).ADDITIONS_PER_ROW points per row:offset_generator, for a non-first row of an MSM this involves processing the previous row of the MSM table.AddState):Suppose we have an MSM of short scalars of size \(m\). Then the number of rows we add to the MSM table of the VM is:
\[(\texttt{NUM-WNAF-DIGITS-PER-SCALAR + 1})\lceil \frac{m}{\texttt{ADDITIONS-PER-ROW}}\rceil + (\texttt{NUM-WNAF-DIGITS-PER-SCALAR} - 1) = 33\frac{m}{4} + 31.\]
Indeed, there are \(\lceil \frac{m}{\texttt{ADDITIONS-PER-ROW}}\rceil\) add-rows per digit, and there are \(\texttt{NUM-WNAF-DIGITS-PER-SCALAR + 1}\) digits per scalar (where the last digit is the skew digit). Finally, the last term comes from the doublings.
Note that in the regime where we have a few long MSMs, this is asymptotic to \(8.25m\), which is comparable to the \(8m\) we get from the precomputed columns. On the other hand, if we have many very short MSMs, the size of this table dominates what was produced by the precomputed columns.
As explained in the introduction, we sometimes treat these three sets of disjoint columns as three separate tables. There must be a mechanism to ensure that they "communicate" with each other. We do not use bare copy-constraints; instead, we use three multisets equality checks. (These were formerly called "strict lookup arguments", where every write had to have precisely one corresponding read.) The goal of these section is to sketch how these constraints, together with the lookups, fully piece together the ECCVM. We emphasize that this is merely a sketch; for full details, please see the set relation.
The basic structure: each term corresponds to two multisets. (One could refer to these as an input multiset and an output multiset, but this directionality is purely psychological and we avoid it.) One table contributes to one of the multisets, another table constributes to the other multiset, and the term is satisfied if the two multisets are equal.
This facilitates communication between the Precomputed table and the MSM table. Recall that pc stands for point-counter. round refers to the wNAF digit-place being processed and wnaf_slice is the compressed digit (i.e., it is a way of representing the actual wNAF digit). Recall that the skew digit's place corresponds to round == 32. This multiset check ensures that at every round, for a given point, the wNAF digit computed by the Precomputed table is actually being used by the MSM table.
This facilitates communication between the Precomputed table (and more specifically, the PointTable) and the Transcript table. More precisely, it ensures that the Precomputed table has done the wNAF decomposition correctly for the scalar corresponding to the point at position pc.
This facilitates communication between the MSM table and the Transcript table. More precisely, this links the output of the MSM (that is performed by the MSM table) to what is written in the Transcript table. We also ensure that msm-size is correctly inputed into the MSM table.
Unlike the multisets (a.k.a. "strict lookup arguments"), the lookups here are more conventional. For every non-trivial point \(P\), there is a lookup table (computed by the Precomputed table) that contains (pc, compressed_slice, (2 * (compressed_slice) - 15)[P]), where compressed_slice is in the range {0, ..., 15}. The MSM table will look up the relevant value as it goes through the Straus algorithm. For full details, please see lookup relation.