Author: Kostiantyn Stepaniuk <kostia@thirdhash.com>
Status: Proposed
Created: 2018-10-09
Download UTXO set instead of the whole blockchain during initial sync to reduce the bandwidth and the sync time.
Currently, to start a full node it requires to download the whole chain.
At the time of writing, Bitcoin's chain weights 200GB, and they generate
the block every 10 minutes. In our case, we create the block every 16 seconds
which will lead to significantly larger chain size. Besides consuming
a lot of space and a long time to sync the node, downloading the whole chain
causes also high bandwidth usage. According to statoshi.info,
on average block
message consumes 252 KB/s of outbound traffic which is
~12x more than the second tx
message, 20 KB/s. The reason is that many nodes
are being added to the network but have a short lifetime. We can see this pattern on
bitnodes
To reduce bandwidth, storage and time to sync, we introduce the snapshot which is the set of UTXOs. Effectively, to validate/propose new transactions/blocks the node needs to know only UTXOs.
Snapshot generation is optional however is highly recommended as it helps the network. New nodes can quicker join the network and nodes that serve the snapshot can save the bandwidth as they will be asked for the UTXO set only instead of the whole blockchain.
Ideally, we would like to compute the snapshot for every block the node sees. However, computation
of the snapshot is expensive. It takes roughly 20 minutes to dump all the UTXO set from the chainstate DB
and compute the ECMH of it. To overcome this, we will generate the snapshot only for the checkpoint
that can be potentially finalized epoch with the step of 150
epochs. The reason of doing it every 150 epochs is
to generate the snapshot ~1/day. Current block rate is: 1 block/16 secs or 1/16*60*60*24 = 5400 blocks = 108 epochs
Once the node receives the block which has (height + 2) % (50 * 150) == 0
1, after processing the block (and making it the tip),
the node takes the levelDB snapshot (iterator of chainstate
) and tries to generate the actual snapshot in the separate
thread. Taking the levelDB iterator, it guarantees that we have the same view of chainstate
while we are keeping it.
If the snapshot was successfully generated, we compute its hash and store it in chainstate
DB for later retrieval.
The node SHOULD keep last 5
snapshots. Validators MUST always create snapshots.
Proposers can disable snapshot creation by starting the node with flag -createsnapshot=0
.
Snapshot is the collection of UTXOSubset. The schema of one UTXOSubset is the following:
UTXOSubset 2
field | type | bytes | description |
---|---|---|---|
tx_id | uint256 | 32 | TX hash |
height | uint32 | 4 | block height the TX was included |
tx_type | uint8_t | 1 | type of the transaction |
output count | VarInt | 1-9 | number of outputs this TX contains |
index | uint32 | 4 | output index |
output | CTxOut | the actual output |
CTxOut
field | type | bytes | description |
---|---|---|---|
amount | int64 | 8 | amount the output has |
script size | VarInt | 1-9 | size of the script data |
script data | vector<unsigned char> | script data |
To compute the snapshot hash we don't need to generate the snapshot itself. We use a rolling update Elliptic Curve Multiset Hash to compute the hash once the new block arrives by subtracting inputs and adding outputs. Snapshot hash is the concatenation of all UTXOs + stake modifier and chain_work.
The formula is the following: ECMH(UTXO03 4 + UTXO1 + ... + UTXOn + stake_modifier5 + chain_work6)
UTXO
field | type | bytes | description |
---|---|---|---|
out_point | COutPoint | 36 | identifier of the output |
height | uint32 | 4 | block height the TX was included |
tx_type | uint8_t | 1 | type of the transaction |
output | CTxOut | the actual output |
COutPoint
field | type | bytes | description |
---|---|---|---|
hash | uint256 | 32 | tx hash |
n | uint32 | 4 | index of the output |
Sample code of computing the snapshot hash:
secp256k1_context *ctx = secp256k1_context_create(SECP256K1_CONTEXT_NONE);
secp256k1_multiset multiset;
secp256k1_multiset_init(ctx, &multiset);
secp256k1_multiset_add(ctx, &multiset, utxo_0.data(), utxo_0.size());
secp256k1_multiset_add(ctx, &multiset, utxo_1.data(), utxo_1.size());
secp256k1_multiset_add(ctx, &multiset, utxo_2.data(), utxo_2.size());
secp256k1_multiset_add(ctx, &multiset, stake_modifier, 32);
secp256k1_multiset_add(ctx, &multiset, chain_work, 32);
uint256 hash;
secp256k1_multiset_finalize(ctx, in.data(), &multiset);
return hash;
Notice: there is no count byte in front of the message 7.
Sample code of updating the hash when a new block arrives:
secp256k1_multiset_remove(ctx, &multiset, input_0.data(), input_0.size());
secp256k1_multiset_remove(ctx, &multiset, input_1.data(), input_1.size());
secp256k1_multiset_add(ctx, &multiset, output_0.data(), output_0.size());
secp256k1_multiset_add(ctx, &multiset, output_1.data(), output_1.size());
To guarantee that the snapshot is a valid one and other nodes can trust it, we add the snapshot hash to the chain. Every proposer MUST include the snapshot hash inside the CoinBase transaction as part of the script of the first input 8. The schema of the input is the following:
CTxIn(
script_sig = CScript() << CScriptNum::serialize(height) << snapshot_hash;
)
Everyone who receives the block SHOULD validate that it has the correct snapshot hash. Snapshot hash points to the UTXOs of all blocks until the current one. To visualize it:
H - height
S - snapshot
H=0 (S=null)
*-----
blocks
* genesis block points to the empty snapshot hash
H=0 H=1 (S=0)
*----------*-------
snapshot 0 | blocks
* at hight 1 coinbase TX points to the snapshot that has only genesis UTXOs
H=0 H=1 (S=0) H=2 (S=1)
*----------*-----------*---
snapshot 1 | blocks
* at height 2 coinbase TX points to the snapshot that has UTXOs from H=0 and H=1
To start the node in ISD mode, it's required to set two parameters:
-prune=n
(n >= 1)-isd=1
If pruning mode is not enabled, the regular IBD will be used. This UIP proposes to make ISD as an extension to the pruning. Later we might revise it and make ISD enabled by default.
When a node starts in ISD mode, it has the following parameters set:
snapshot_chunk_timeout_sec=30
if the peer is not able to provide a valid chunk within this timeout this peer is marked as it doesn't have a snapshot.discovery_timeout_sec=120
time during which the node will discover available snapshots from peers. Peers joined after this timeout won't be asked for the snapshot.
These settings are important to not let the node remain stuck if it's surrounded by peers that can't provide the snapshot.
Once the node is started in ISD mode, it will perform the following:
- the node sends the discovery
getsnaphead
message to its peers to find the best snapshot - the node downloads headers
- the node starts to download the snapshot from peers that have the best (highest) snapshot
- when the last
snapshot
chunk is downloaded, the node verifies the hash of the whole snapshot - the node requests the block which is the parent block of the snapshot
- if the parent block has the correct snapshot hash, the node applies the snapshot and leaves ISD
If any error happens at any step, the node discards the snapshot (if it downloaded one) and switches to the 2nd best one. If the node runs out of peers that can serve the snapshot, it will switch to the IBD.
There are four new P2P messages:
getsnaphead
9 to request the snapshot headersnaphead
to reply with the snapshot headergetsnapshot
to request the snapshot chunksnapshot
to reply with the snapshot chunk
getsnaphead
This command doesn't have parameters.
snaphead
field | type | bytes | description |
---|---|---|---|
snapshot_hash | uint256 | 32 | snapshot hash |
block_hash | uint256 | 32 | at which block the snapshot was created |
stake_modifier | uint256 | 32 | stake modifier of this block |
chain_work | uint256 | 32 | chain work including this block |
total_utxo_subsets | uint64 | 8 | total UTXO subsets in snapshot |
getsnapshot
field | type | bytes | description |
---|---|---|---|
snapshot_hash | uint256 | 32 | which snapshot to request |
utxo_subset_index | uint64 | 8 | index of the first UTXO subset in the snapshot |
utxo_subset_count | uint16 | 2 | number of UTXO subsets to return |
snapshot
field | type | bytes | description |
---|---|---|---|
snapshot_hash | uint256 | 32 | snapshot hash |
utxo_subset_index | uint64 | 8 | index of the first UTXO subset in the snapshot |
utxo_subsets_count | VarInt | 1-9 | number of UTXO subsets in the message |
utxo_subsets | []UTXOSubset | actual UTXO subsets and their outputs |
Once the snapshot
message has the condition total_utxo_subsets == utxo_subset_index + utxo_subsets_count
it is considered the last chunk and no more getsnapshot
messages SHOULD be requested.
- Why (height + 2) % (50 * 150) == 0? Height counter starts from 0 so
(height + 1) % 50 == 0
represents the last checkpoint in the epoch.(height + 2) % 50 == 0
is the block before the last one. We don't generate the snapshot for the last finalized block because we need to also download the full parent block which contains the snapshot hash to validate that the snapshot we have is a correct one, and this parent block MUST be from the finalized epoch. - Why introduce UTXOSubset? The reason for introducing
UTXOSubset
instead of usingCoin
class is to reduce the storage as we don't need to serializeoutPoint
,height
andtx_type
for every output. - Why use UTXO instead of UTXOSubset? The reason we introduce
UTXO
instead of usingUTXOSubset
for computing the hash is thatUTXO
represents one output. It means that we can add/subtract outputs even without looking into the disk. - Why use UTXO instead of Coin? The reason for not using the
Coin
class is that it doesn't use the canonical P2P serialization as all other P2P messages do. - Why stake_modifier is used in snapshot hash calculation?
We include
stake_modifier
in the snapshot because this field is used to compute the next kernel hash. To guarantee that snapshot carries the right value, we must include it in the snapshot hash too. - Why chain_work is used in snapshot hash calculation?
We include
chain_work
into the snapshot because this field is used in the fork-choice rule to decide which fork to follow. To guarantee that snapshot carries the right value, we must include it in the snapshot hash too. - Why count byte is not prepended to UTXO set before computing the hash?
Count byte(s) is often prepended to the list that during unserialization it is used to determine
how big
std::vector
to initialize and how many records to read. For the case of the snapshot, the list of UTXOs can reach couple of gigabytes, so you'll never download it in one go instead, we download it in chunks. In this case, this extra byte is redundant as it won't be used. - Why snapshot hash is in the input? The reason to use the input instead of the output is to reduce the size of the transaction as creating an extra output has an overhead.
- Why call it getsnaphead? The command name can be maximum 12 characters long;
full version
getsnapshotheader
doesn't fit, so we shortened it.
- snapshot hash becomes part of the consensus rule. Everyone needs to perform extra work to compute the hash
- block size increases by 32 bytes as we add snapshot hash to the script of coinbase input
- every node that generates the snapshot, has extra work (~20 min to produce the snapshot on the current bitcoin UTXOs)
- extra disk space usage for full nodes, as they keep the whole chain + 5 snapshots.
Implemented in Unit-e
- 2019-04-16 Add
chain_work
and replaceis_coin_base
withtx_type
- 2019-04-16 Changed status to proposed.
- 2019-01-09 Extract meta fields from
snapshot
message tosnaphead
- 2018-12-12 Moved to UIP repository as UIP-11
- 2018-11-06 Accepted as ADR-11
This document and all its auxiliary files are dual-licensed under CC0 and MIT.