[Enhancement] Add Merkle Sum tree functionality (SMT Wrapper) #13

Merged
merged 51 commits into from
Jun 29, 2023
Commits
4f72b71
Add encoding functions for sum nodes
h5law Jun 6, 2023
2d7a905
Add get/update functions for SMST
h5law Jun 6, 2023
0e093e6
Add get/update/delete functionality to SMST
h5law Jun 6, 2023
9c08a4b
Fix bugs
h5law Jun 6, 2023
6d1ab23
Add fully featured proof system
h5law Jun 9, 2023
b6488d0
Add options to SMST and change option functions to recieve a treespec
h5law Jun 10, 2023
b82be70
Complete SMST integration with tests
h5law Jun 12, 2023
57a5632
Use 8 byte hex encoding for sum
h5law Jun 12, 2023
477a01f
Use hex to uint helper function
h5law Jun 12, 2023
a0638e8
Add SMST documentation
h5law Jun 12, 2023
147fac3
Comment fix in example
h5law Jun 12, 2023
43eb83c
Better rand usage in tests
h5law Jun 12, 2023
6383d20
remove hex dependency with binary puts
h5law Jun 14, 2023
86f84c5
Remove helper function for hex string conversion
h5law Jun 14, 2023
bf128b9
Update MerkleSumTree.md
h5law Jun 14, 2023
742739f
Update MerkleSumTree.md
h5law Jun 14, 2023
e2f1915
Update MerkleSumTree.md
h5law Jun 14, 2023
d4699ba
Update MerkleSumTree.md
h5law Jun 14, 2023
a63f1e3
Update MerkleSumTree.md
h5law Jun 14, 2023
e036dfe
Update MerkleSumTree.md
h5law Jun 14, 2023
f9cefb1
Update MerkleSumTree.md
h5law Jun 14, 2023
b4250c3
Update MerkleSumTree.md
h5law Jun 14, 2023
61556d5
Update MerkleSumTree.md
h5law Jun 14, 2023
527c735
Update MerkleSumTree.md
h5law Jun 14, 2023
b05d0e4
Update MerkleSumTree.md
h5law Jun 14, 2023
9cf2f50
Update MerkleSumTree.md
h5law Jun 14, 2023
6c4d1a3
Update MerkleSumTree.md
h5law Jun 14, 2023
56189ac
Update MerkleSumTree.md
h5law Jun 14, 2023
be6f3fd
Replace references to hex with binary encoding
h5law Jun 14, 2023
20019cf
Update toc
h5law Jun 14, 2023
430ad24
Seperate sum encoding
h5law Jun 14, 2023
0e5b9f0
s/sumLength/sumSize/ge
h5law Jun 14, 2023
636359d
Implement SMST with TreeSpec option
h5law Jun 14, 2023
eae0ade
Improve documentation
h5law Jun 15, 2023
3de7cba
Fix SMST options
h5law Jun 15, 2023
19862a0
Fix double sum addition to leaf nodes
h5law Jun 15, 2023
85f1af6
Add retrieval test cases for SMST
h5law Jun 15, 2023
c82cf45
Initial helpers
h5law Jun 20, 2023
0e793dc
Add hashSize helper
h5law Jun 20, 2023
a78b98b
Add digest leaf helper
h5law Jun 20, 2023
b5d43ab
Add hashNode helpers
h5law Jun 20, 2023
feee39e
Add serialize helper
h5law Jun 20, 2023
ba01891
add resolve helper
h5law Jun 20, 2023
3374c47
Fix diagram sum
h5law Jun 22, 2023
f3e760e
Address comments
h5law Jun 22, 2023
0c87d4b
Rename weight to sum
h5law Jun 23, 2023
c2674f5
Address comments
h5law Jun 25, 2023
2a87b70
Simplify delete returns
h5law Jun 25, 2023
31a148f
Add comments to test utils
h5law Jun 25, 2023
0f5f464
Pass proofs as pointers
h5law Jun 25, 2023
9a41773
Update docs with Nil value info
h5law Jun 29, 2023
246 changes: 246 additions & 0 deletions MerkleSumTree.md
@@ -0,0 +1,246 @@
# Sparse Merkle Sum Tree (SMST) <!-- omit in toc -->

- [Overview](#overview)
- [Implementation](#implementation)
  - [Sum Encoding](#sum-encoding)
  - [Digests](#digests)
  - [Visualisations](#visualisations)
    - [General Tree Structure](#general-tree-structure)
    - [Binary Sum Digests](#binary-sum-digests)
- [Sum](#sum)
- [Example](#example)

## Overview

Merkle sum trees function very similarly to regular Merkle trees. The primary difference is that each leaf node in a Merkle sum tree includes a `sum` in addition to its value. This makes the tree's total sum easy to calculate: the sum of any branch is the sum of its children, so the sum of the root node is the sum of the entire tree. Like a regular Merkle tree, the Merkle sum tree supports efficient verification of its contents, including membership and non-membership proofs for individual elements.

Merkle sum trees are well suited to blockchain applications, where they can track account balances and, with them, the total balance across all accounts. They are also useful in proof-of-reserve systems, where one needs to prove that an element is a component of the total sum, alongside a verifiable total sum of all elements.

## Implementation

The implementation of the Sparse Merkle Sum Tree (SMST) follows, in principle, the same design as the [Plasma Core Merkle Sum tree][plasma core docs]. The main differences from the existing SMT implementation are outlined below; the most significant is how node data is encoded within the tree to accommodate the sum.

In practice, the SMST is a wrapper around the SMT with a new field added to the `TreeSpec`, `sumTree bool`, which determines whether the SMT uses its regular encoding or that of the sum tree.

_Note_: The Plasma Core Merkle Sum tree uses a 16-byte hex string to encode the sum, whereas this SMST implementation uses an 8-byte binary representation of the `uint64` sum.

The majority of the code relating to the SMST can be found in:

- [smst.go](./smst.go) - main SMT wrapper functionality
- [hasher.go](./hasher.go) - SMST encoding functions
- [types.go](./types.go) - SMST interfaces and node serialisation/hashing functions

### Sum Encoding

The sum for any node is encoded in a fixed-size byte array (`[8]byte`), which is large enough to represent any `uint64` value in binary form. The Go `encoding/binary` package is used to encode the sum into a byte array `sumBz` with `binary.BigEndian.PutUint64(sumBz[:], sum)`.
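
As a quick illustration of this encoding, the sketch below round-trips a sum through an `[8]byte` array using only the standard library. The helper names `encodeSum` and `decodeSum` are hypothetical and are not part of this library's API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sumSize is the fixed width of an encoded sum: 8 bytes, enough for any uint64.
const sumSize = 8

// encodeSum (hypothetical helper) writes sum into a fixed-size big-endian array.
func encodeSum(sum uint64) [sumSize]byte {
	var sumBz [sumSize]byte
	binary.BigEndian.PutUint64(sumBz[:], sum)
	return sumBz
}

// decodeSum (hypothetical helper) reads a big-endian uint64 back from 8 bytes.
func decodeSum(sumBz []byte) uint64 {
	return binary.BigEndian.Uint64(sumBz)
}

func main() {
	sumBz := encodeSum(10)
	fmt.Printf("%x\n", sumBz[:])    // 000000000000000a
	fmt.Println(decodeSum(sumBz[:])) // 10
}
```

Big-endian order keeps the most significant byte first, so the encoded bytes read naturally when printed as hex.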

To include the sum in a leaf node, the SMST initialises the underlying SMT with the `WithValueHasher(nil)` option so that the SMT does **not** hash any values. The SMST then hashes the value itself, using whichever `ValueHasher` it was given on initialisation, and appends the sum bytes to the end of the hashed value.

```mermaid
graph TD
subgraph KVS[Key-Value-Sum]
K1["Key: foo"]
K2["Value: bar"]
K3["Sum: 10"]
end
subgraph SMST[SMST]
SS1[ValueHasher: SHA256]
subgraph SUM["SMST.Update()"]
SU1["ValueHash = ValueHasher(Value)"]
SU2["sumBytes = binary(Sum)"]
SU3["ValueHash = append(ValueHash, sumBytes...)"]
end
end
subgraph SMT[SMT]
SM1[ValueHasher: nil]
subgraph UPD["SMT.Update()"]
U1["valueHash = value"]
U2["SMT.nodeStore.Set(Key, valueHash)"]
end
end
KVS -->|Key, Value, Sum| SMST
SMST -->|Key, ValueHash| SMT
```

### Digests

The digest for any node in the SMST is calculated in much the same way as in the regular SMT. The main differences are that the sum is included in the digest `preimage`, meaning the hash of any node's data covers **BOTH** its data _and_ its sum, and that the sum is also appended to the hash, producing digests of the form:

`digest = [node hash]+[8 byte sum]`

The digests for each node type are therefore computed as follows:

- **Inner Nodes**
  - Prefix: `[]byte{1}`
  - `sumBytes = binary(leftChild.sum+rightChild.sum)`
  - `digest = hash([]byte{1} + leftChild.digest + rightChild.digest + sumBytes) + sumBytes`
- **Extension Nodes**
  - Prefix: `[]byte{2}`
  - `sumBytes = binary(child.sum)`
  - `digest = hash([]byte{2} + pathBounds + path + child.digest + sumBytes) + sumBytes`
- **Leaf Nodes**
  - Prefix: `[]byte{0}`
  - `sumBytes = binary(sum)`
  - `digest = hash([]byte{0} + path + valueHash) + sumBytes`
  - **Note**: as mentioned above, the `valueHash` already has the `sumBytes` appended to it before insertion into the underlying SMT
- **Lazy Nodes**
  - The prefix of the actual node type is stored in the persisted digest, as determined above
  - `digest = persistedDigest`

This means that with a hasher such as `sha256.New()`, whose hash size is 32 bytes, the digest of any node is 40 bytes long (a 32-byte hash followed by the 8-byte sum).

### Visualisations

The following diagrams are representations of how the tree and its components can be visualised.

#### General Tree Structure

None of the nodes differ structurally from those in the regular SMT, but node digests now include their sum as described above, and the sum is included in each leaf node's value. For the purposes of visualisation, the sum is shown on every node as an extra field.

```mermaid
graph TB
subgraph Root
A1["Digest: Hash(Hash(Path+H1)+Hash(H2+(Hash(H3+H4)))+Binary(20))+Binary(20)"]
A2[Sum: 20]
end
subgraph BI[Inner Node]
B1["Digest: Hash(H2+(Hash(H3+H4))+Binary(12))+Binary(12)"]
B2[Sum: 12]
end
subgraph BE[Extension Node]
B3["Digest: Hash(Path+H1+Binary(8))+Binary(8)"]
B4[Sum: 8]
end
subgraph CI[Inner Node]
C1["Digest: Hash(H3+H4+Binary(7))+Binary(7)"]
C2[Sum: 7]
end
subgraph CL[Leaf Node]
C3[Digest: H2]
C4[Sum: 5]
end
subgraph DL1[Leaf Node]
D1[Digest: H3]
D2[Sum: 4]
end
subgraph DL2[Leaf Node]
D3[Digest: H4]
D4[Sum: 3]
end
subgraph EL[Leaf Node]
E1[Digest: H1]
E2[Sum: 8]
end
Root-->|0| BE
Root-->|1| BI
BI-->|0| CL
BI-->|1| CI
CI-->|0| DL1
CI-->|1| DL2
BE-->EL
```

#### Binary Sum Digests

The following diagram shows the structure of the node digests within the tree in a simplified manner. Again, nodes do not store a `sum` field; for visualisation purposes the sum is shown on every node except the leaf nodes, where it is shown as part of the value.

```mermaid
graph TB
subgraph RI[Inner Node]
RIA["Root Hash: Hash(D7+D6+Binary(18))+Binary(18)"]
RIB[Sum: 18]
end
subgraph I1[Inner Node]
I1A["D7: Hash(D1+D5+Binary(11))+Binary(11)"]
I1B[Sum: 11]
end
subgraph I2[Inner Node]
I2A["D6: Hash(D3+D4+Binary(7))+Binary(7)"]
I2B[Sum: 7]
end
subgraph L1[Leaf Node]
L1A[Path: 0b0010000]
L1B["Value: 0x01+Binary(6)"]
L1C["H1: Hash(Path+Value)"]
L1D["D1: H1+Binary(6)"]
end
subgraph L3[Leaf Node]
L3A[Path: 0b1010000]
L3B["Value: 0x03+Binary(3)"]
L3C["H3: Hash(Path+Value)"]
L3D["D3: H3+Binary(3)"]
end
subgraph L4[Leaf Node]
L4A[Path: 0b1100000]
L4B["Value: 0x04+Binary(4)"]
L4C["H4: Hash(Path+Value)"]
L4D["D4: H4+Binary(4)"]
end
subgraph E1[Extension Node]
E1A[Path: 0b01100101]
E1B["Path Bounds: [2, 6)"]
E1C[Sum: 5]
E1D["H5: Hash(PathBounds+Path+D2+Binary(5))"]
E1E["D5: H5+Binary(5)"]
end
subgraph L2[Leaf Node]
L2A[Path: 0b01100101]
L2B["Value: 0x02+Binary(5)"]
L2C["H2: Hash(Path+Value)"]
L2D["D2: H2+Binary(5)"]
end
RI -->|0| I1
RI -->|1| I2
I1 -->|0| L1
I1 -->|1| E1
E1 --> L2
I2 -->|0| L3
I2 -->|1| L4
```

## Sum

The `Sum()` method returns the tree's current total sum as a `uint64`.

## Example

```go
package main

import (
	"crypto/sha256"
	"fmt"

	"github.com/pokt-network/smt"
)

func main() {
	// Initialise a new key-value store to store the nodes of the tree
	// (Note: the tree only stores hashed values, not raw value data)
	nodeStore := smt.NewSimpleMap()

	// Initialise the tree
	tree := smt.NewSparseMerkleSumTree(nodeStore, sha256.New())

	// Update tree with keys, values and their sums
	_ = tree.Update([]byte("foo"), []byte("oof"), 10)
	_ = tree.Update([]byte("baz"), []byte("zab"), 7)
	_ = tree.Update([]byte("bin"), []byte("nib"), 3)

	sum := tree.Sum()
	fmt.Println(sum == 20) // true

	// Generate a Merkle proof for "foo"
	proof, _ := tree.Prove([]byte("foo"))
	root := tree.Root() // We also need the current tree root for the proof

	// Verify the Merkle proof for "foo"="oof" where "foo" has a sum of 10
	if valid := smt.VerifySumProof(proof, root, []byte("foo"), []byte("oof"), 10, tree.Spec()); valid {
		fmt.Println("Proof verification succeeded.")
	} else {
		fmt.Println("Proof verification failed.")
	}
}
```

[plasma core docs]: https://plasma-core.readthedocs.io/en/latest/specs/sum-tree.html
7 changes: 6 additions & 1 deletion README.md
@@ -26,6 +26,7 @@ Note: **Requires Go 1.18+**
- [Verification](#verification)
- [Database](#database)
- [Data Loss](#data-loss)
- [Sparse Merkle Sum Tree](#sparse-merkle-sum-tree)
- [Example](#example)

## Overview
@@ -48,7 +49,7 @@ The SMT has 4 node types that are used to construct the tree:
- Prefixed `[]byte{0}`
- `digest = hash([]byte{0} + path + value)`
- Lazy Nodes
- Prefix of the actual node type is stored in the digest
- Prefix of the actual node type is stored in the persisted digest as determined above
- `digest = persistedDigest`

### Inner Nodes
@@ -300,6 +301,10 @@ When changes are commited to the underlying database using `Commit()` the digest

In the event of a system crash or unexpected failure of the program utilising the SMT, if the `Commit()` function has not been called, any changes to the tree will be lost. This is due to the underlying database not being changed **until** the `Commit()` function is called and changes are persisted.

## Sparse Merkle Sum Tree

This library also implements a Sparse Merkle Sum Tree (SMST); its documentation can be found [here](./MerkleSumTree.md).

## Example

```go
33 changes: 20 additions & 13 deletions bulk_test.go
@@ -2,18 +2,21 @@ package smt

import (
"bytes"
crand "crypto/rand"
"crypto/sha256"
"math/rand"
"testing"

"github.com/stretchr/testify/require"
)

type opCounts struct{ ops, inserts, updates, deletes int }
type bulkop struct{ key, val []byte }
type (
opCounts struct{ ops, inserts, updates, deletes int }
bulkop struct{ key, val []byte }
)

// Test all tree operations in bulk.
func TestBulkOperations(t *testing.T) {
rand.Seed(1)

cases := []opCounts{
// Test more inserts/updates than deletions.
{200, 100, 100, 50},
@@ -35,18 +38,21 @@ func bulkOperations(t *testing.T, operations int, insert int, update int, delete
max := insert + update + delete
var kv []bulkop

r := rand.New(rand.NewSource(1))
for i := 0; i < operations; i++ {
n := rand.Intn(max)
if n < insert { // Insert
keyLen := 16 + rand.Intn(32)
keyLen := 16 + r.Intn(32)
key := make([]byte, keyLen)
rand.Read(key)
_, err := crand.Read(key)
require.NoError(t, err)

valLen := 1 + rand.Intn(64)
valLen := 1 + r.Intn(64)
val := make([]byte, valLen)
rand.Read(val)
_, err = crand.Read(val)
require.NoError(t, err)

err := smt.Update(key, val)
err = smt.Update(key, val)
if err != nil {
t.Fatalf("error: %v", err)
}
@@ -56,11 +62,12 @@ func bulkOperations(t *testing.T, operations int, insert int, update int, delete
continue
}
ki := rand.Intn(len(kv))
valLen := 1 + rand.Intn(64)
valLen := 1 + r.Intn(64)
val := make([]byte, valLen)
rand.Read(val)
_, err := crand.Read(val)
require.NoError(t, err)

err := smt.Update(kv[ki].key, val)
err = smt.Update(kv[ki].key, val)
if err != nil {
t.Fatalf("error: %v", err)
}
@@ -69,7 +76,7 @@ func bulkOperations(t *testing.T, operations int, insert int, update int, delete
if len(kv) == 0 {
continue
}
ki := rand.Intn(len(kv))
ki := r.Intn(len(kv))

err := smt.Delete(kv[ki].key)
if err != nil && err != ErrKeyNotPresent {