Add support for parallelization in the tensor-commitment #263
A small experiment for SIS
Also includes a small experiment, which lives in an isolated test file. The experiment outlines an idea to speed up the SIS hashes down to 300ns per field element. It is however incomplete, in the sense that it does not implement everything: only the parts that are relevant performance-wise have been implemented.
The experiment is deeply tied to the parameter choices that we use for SIS (recall d = 2 and β = 8). We do several things. Instead of representing "slots" in the hashing key with 2 field elements (each internally represented as 4 uint64), we represent them using 6 uint64 limbs, as follows.
Thus, each uint64 in the key is prefixed with 20 spare bits. Multiplying these key elements by a small number reduces this margin to 17 bits. As a result, we can perform all our field operations using plain uint64 arithmetic. This is much faster than field arithmetic even though there are "more" additions to do, because the approach avoids modular reductions entirely.
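The headroom bookkeeping can be sketched as follows. This is a minimal illustration, not the actual experiment code: the limb width (44 data bits, 20 spare) matches the margins described above, but `mulAccSmall` and the 6-limb layout details are hypothetical.

```go
package main

import "fmt"

// A key slot is stored as 6 uint64 limbs, each holding only 44 bits of
// data, leaving 20 spare bits at the top of every word.
const limbBits = 44
const limbMask = (uint64(1) << limbBits) - 1

// mulAccSmall multiplies a 6-limb slot by a small coefficient c
// (c <= 8, i.e. 3 bits, matching beta = 8) and accumulates into acc.
// Each product fits in 44+3 = 47 bits, leaving 17 bits of margin, so
// many terms can be accumulated with plain uint64 arithmetic and no
// modular reduction on the hot path.
func mulAccSmall(acc *[6]uint64, slot [6]uint64, c uint64) {
	for i := 0; i < 6; i++ {
		acc[i] += slot[i] * c // no reduction needed here
	}
}

func main() {
	slot := [6]uint64{limbMask, 1, 2, 3, 4, 5} // first limb is maximal
	var acc [6]uint64
	mulAccSmall(&acc, slot, 8) // largest coefficient for beta = 8
	fmt.Println(acc[0] == 8*limbMask) // still far below 2^64
}
```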
Another notable improvement stems from how we split large field elements into small chunks. The approach takes advantage of how gnark stores field elements internally (i.e. the fact that they are stored as [4]uint64) to minimize the number of CPU operations required to obtain a limb. This is however done at the cost of increasing the number of limbs from 85 to 88.
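The idea can be sketched as a word-aligned decomposition. This is a hedged illustration, not the PR's code: the limb width below (3 bits, i.e. log2(β)) and the `decompose` helper are assumptions; the point is that extracting limbs independently from each uint64 word costs one shift and one mask per limb, at the price of a few unused bits per word and hence slightly more limbs overall.

```go
package main

import "fmt"

// Hypothetical word-aligned decomposition of a [4]uint64 field element.
// No limb ever straddles a 64-bit word boundary, so each limb is a
// single shift-and-mask; the leftover bits of each word are wasted.
const limbSize = 3                 // illustrative: log2(beta) with beta = 8
const limbsPerWord = 64 / limbSize // 21 limbs per word, 1 bit unused

func decompose(z [4]uint64) []uint64 {
	limbs := make([]uint64, 0, 4*limbsPerWord)
	for _, w := range z {
		for i := 0; i < limbsPerWord; i++ {
			limbs = append(limbs, (w>>(limbSize*i))&(1<<limbSize-1))
		}
	}
	return limbs
}

func main() {
	// 0b101_011_110 packs three 3-bit limbs: 110, 011, 101 (low to high).
	z := [4]uint64{0b101_011_110, 0, 0, 0}
	limbs := decompose(z)
	fmt.Println(limbs[0], limbs[1], limbs[2]) // prints 6 3 5
}
```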
Parallelization for the tensor commitment
Extend the `Append` method of the tensor commitment to optionally take several polynomials at once. This comes at the cost of a breaking change: the tensor commitment no longer takes a SIS hasher as a parameter but instead takes a constructor that returns "fresh" hashers. This was necessary for thread-safety.