
Snapshots

Charitha Bandi edited this page Apr 24, 2024 · 4 revisions

Kwil 0.8 adds support for logical snapshots of the KwilDB. Snapshots enable the following functionality:

  • Accelerate the catch-up process of a new node joining a network using statesync. This drastically shortens the time needed to join a network, but the node's block history is truncated.

  • Network migrations: start a network from the previous state when upgrading the Kwild version.

Snapshots

Snapshots can be enabled on a Kwil node by adding the following configuration to config.toml:

[app.snapshots]

# Enables snapshots
enabled = true

# Path to the snapshots directory
snapshot_dir = "snapshots"

# Snapshot interval: the database is snapshotted at heights that are multiples of this value.
snapshot_heights = 10

# Maximum number of snapshots to store on disk
max_snapshots = 3

Snapshot Store

Generated snapshots are stored on disk in the directory specified by snapshot_dir. The on-disk layout is strict and is shown below.

SnapshotsDir:
    snapshot-<height1>:
        snapshot-format-0
            header.json
            chunks:
                chunk-0
                chunk-1
                ...
                chunk-n

    snapshot-<height2>:
        snapshot-format-0
            header.json
            chunks:
                chunk-0
                chunk-1
                ...
                chunk-n

The snapshot store keeps at most max_snapshots snapshots. When the limit is reached, the oldest snapshots are deleted to make room for new ones.
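The interplay of snapshot_heights and max_snapshots can be illustrated with a small simulation. This is a minimal sketch, not Kwild's implementation: should_snapshot and SnapshotStore are hypothetical names, and the sketch assumes that a snapshot is due whenever the block height is a multiple of snapshot_heights and that "oldest" means lowest height.

```python
from dataclasses import dataclass, field
from typing import List


def should_snapshot(height: int, snapshot_heights: int) -> bool:
    """Assumption: a snapshot is due at every positive multiple of snapshot_heights."""
    return height > 0 and height % snapshot_heights == 0


@dataclass
class SnapshotStore:
    """Hypothetical in-memory stand-in for the on-disk snapshot store."""
    max_snapshots: int
    heights: List[int] = field(default_factory=list)

    def add(self, height: int) -> List[int]:
        """Record a snapshot and return the heights pruned to respect the limit."""
        self.heights.append(height)
        self.heights.sort()
        pruned = []
        while len(self.heights) > self.max_snapshots:
            pruned.append(self.heights.pop(0))  # oldest = lowest height (assumption)
        return pruned


# Values from the sample configuration: snapshot_heights = 10, max_snapshots = 3.
store = SnapshotStore(max_snapshots=3)
for h in range(1, 51):
    if should_snapshot(h, snapshot_heights=10):
        store.add(h)

print(store.heights)  # [30, 40, 50]
```

With the sample configuration, a node at height 50 would under these assumptions hold the snapshots for heights 30, 40 and 50, and the ones for heights 10 and 20 would already have been deleted.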

Kwild currently supports only one snapshot format: plain-text SQL logical dumps with gzip compression (format version 0). Other formats may be supported in the future.

Deterministic Snapshots

To make snapshots deterministic at a given block height across all nodes, the following precautions are taken:

  • Snapshots are taken at the end of the block commit.

  • A PostgreSQL snapshot ID is created by calling pg_export_snapshot() inside a transaction with serializable isolation level. The transaction is kept open until the snapshot has been generated.

  • pg_dump is run with the snapshot ID generated above.

  • The logical dump is sanitized by removing whitespace and unneeded SET and SELECT statements, and by ordering the COPY blocks deterministically.

This ensures that the snapshots taken at a given block height are consistent and deterministic across all the nodes.
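The sanitization step can be sketched as follows. This is an illustration under stated assumptions rather than Kwild's actual code: sanitize_dump is a hypothetical helper, "ordering the COPY blocks" is interpreted here as sorting the data rows inside each COPY block, and every top-level SET or SELECT statement is treated as unneeded. The real rules may differ.

```python
import hashlib


def sanitize_dump(dump: str) -> str:
    """Normalize a plain-text SQL dump so that equal states yield equal bytes.

    Assumptions (not taken from Kwild's source): blank lines and top-level
    SET/SELECT statements are dropped, and the rows inside each
    COPY ... FROM stdin block are sorted.
    """
    out = []
    rows = None  # None while outside a COPY block
    for line in dump.splitlines():
        if rows is not None:
            if line == "\\.":  # end-of-data marker written by pg_dump
                out.extend(sorted(rows))
                out.append(line)
                rows = None
            else:
                rows.append(line)  # data rows are kept byte-for-byte
            continue
        stripped = line.strip()
        if not stripped or stripped.startswith(("SET ", "SELECT ")):
            continue
        out.append(stripped)
        if stripped.startswith("COPY ") and stripped.endswith("FROM stdin;"):
            rows = []
    if rows is not None:  # unterminated block: keep its rows anyway
        out.extend(sorted(rows))
    return "\n".join(out) + "\n"


# Two dumps of the same state that differ only in noise and row order.
node_a = (
    "SET client_encoding = 'UTF8';\n\n"
    "COPY users (id, name) FROM stdin;\n1\talice\n2\tbob\n\\.\n"
)
node_b = (
    "COPY users (id, name) FROM stdin;\n2\tbob\n1\talice\n\\.\n\n"
    "SELECT pg_catalog.set_config('search_path', '', false);\n"
)

hash_a = hashlib.sha256(sanitize_dump(node_a).encode()).hexdigest()
hash_b = hashlib.sha256(sanitize_dump(node_b).encode()).hexdigest()
print(hash_a == hash_b)  # True
```

Once the dumps are normalized, a plain hash comparison is enough to check that two nodes produced the same snapshot.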

StateSync

New nodes joining the network can now sync rapidly using the StateSync process, but the block history will be truncated to the block at the snapshot height.

A new node that wants to join using StateSync rather than BlockSync can do so with the following configuration.

#######################################################
###         State Sync Configuration Options        ###
#######################################################
[chain.statesync]
# State sync rapidly bootstraps a new node by discovering, fetching, and restoring a state machine
# snapshot from peers instead of fetching and replaying historical blocks. Requires some peers in
# the network to take and serve state machine snapshots. State sync is not attempted if the node
# has any local state (LastBlockHeight > 0). The node will have a truncated block history,
# starting from the height of the snapshot.
enable = true

# Trusted snapshot providers (comma-separated chain RPC servers) are the source-of-truth for the snapshot integrity.
# Snapshots are accepted for statesync only after they are verified against these trusted snapshot providers.
# These are also used for light client verification of the synced state machine and
# retrieval of state data for node bootstrapping.
# Light client verification needs a trusted height and corresponding block hash obtained from a
# trusted source, and a period during which validators can be trusted.
rpc_servers = "http://localhost:26657,http://localhost:26658"
trust_height = 12
trust_hash = "782BA2706B335DAECCE2289118E4BB09A1F11C260EA315DBB911F1DB4BBA6B2B"
trust_period = "36000s"

# Time to spend discovering snapshots before initiating a restore.
discovery_time = "15s"

# The timeout duration before re-requesting a chunk, possibly from a different
# peer (default: 1 minute).
chunk_request_timeout = "10s"

Trusted snapshot providers can be any full nodes you trust that have snapshots enabled and can provide block state information upon request. At least two trusted snapshot providers must be supplied for statesync to work.
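A quick sanity check of these values before starting the node can catch the most common mistakes. The helper below is a hypothetical sketch and not part of Kwild. It assumes that rpc_servers is a comma-separated string as in the sample above and that the trust hash is a 32-byte block hash written as 64 hexadecimal characters.

```python
import re
from typing import List


def check_statesync(rpc_servers: str, trust_height: int, trust_hash: str) -> List[str]:
    """Return a list of problems found in the statesync values (empty if none)."""
    problems = []
    servers = [s.strip() for s in rpc_servers.split(",") if s.strip()]
    if len(servers) < 2:
        problems.append("rpc_servers must list at least two trusted providers")
    if trust_height <= 0:
        problems.append("trust_height must be a positive block height")
    if not re.fullmatch(r"[0-9A-Fa-f]{64}", trust_hash):
        problems.append("trust_hash should be 64 hexadecimal characters")
    return problems


# Values from the sample configuration above.
issues = check_statesync(
    rpc_servers="http://localhost:26657,http://localhost:26658",
    trust_height=12,
    trust_hash="782BA2706B335DAECCE2289118E4BB09A1F11C260EA315DBB911F1DB4BBA6B2B",
)
print(issues)  # []
```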

Trusted Hash & Height

The trust hash and trust height can be obtained from a publicly exposed RPC endpoint or a block explorer that you trust.

If one of the trusted RPC servers is at localhost:26657, you can retrieve the block height and hash by querying its status endpoint, either in a browser or with curl.

Terminal

curl -s http://localhost:26657/status | jq "{height: .result.sync_info.latest_block_height, hash: .result.sync_info.latest_block_hash}"
{
  "height": "1180",
  "hash": "D42624533A74D380CD0ECC0735E355751A6560FFD34023F10B082DB19A81DEE5"
}
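The same lookup can be scripted. The sketch below assumes only the response shape implied by the jq filter above (result.sync_info.latest_block_height and latest_block_hash). The names trust_params and fetch_trust_params are hypothetical, and the example runs offline against the sample values.

```python
import json
from urllib.request import urlopen


def trust_params(status: dict) -> dict:
    """Extract the latest height and hash from a decoded /status response."""
    sync = status["result"]["sync_info"]
    return {
        "trust_height": int(sync["latest_block_height"]),
        "trust_hash": sync["latest_block_hash"],
    }


def fetch_trust_params(rpc_url: str, timeout: float = 5.0) -> dict:
    """Query <rpc_url>/status and return values for the statesync configuration."""
    with urlopen(f"{rpc_url.rstrip('/')}/status", timeout=timeout) as resp:
        return trust_params(json.load(resp))


# Offline example using the values from the curl output above.
sample = {
    "result": {
        "sync_info": {
            "latest_block_height": "1180",
            "latest_block_hash": "D42624533A74D380CD0ECC0735E355751A6560FFD34023F10B082DB19A81DEE5",
        }
    }
}
print(trust_params(sample))
```

Against a live node, fetch_trust_params("http://localhost:26657") would perform the HTTP request itself.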

Browser

http://localhost:26657/status


Node Join Through StateSync

  • The node first verifies the trust_hash at the trust_height by querying the trusted rpc_servers.

  • The node starts the snapshot discovery process by broadcasting snapshot requests to all its peers.

  • Peers respond to these discovery requests with the metadata of the snapshots they hold.

  • The node selects one of the discovered snapshots and validates its integrity with the trusted snapshot providers. It can proceed only with a snapshot that the providers consider valid.

  • Once the snapshot metadata is verified, the node starts requesting the snapshot chunks from all its peers.

  • Once all the chunks are received, the node restores the KwilDB state by streaming the snapshot chunks to psql.
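The selection and restore steps above can be sketched as follows. This is a simplified illustration, not Kwild's implementation. SnapshotMeta and its fields, choose_snapshot and assemble are hypothetical names. Preferring the highest snapshot is an assumption, as is treating the chunks as consecutive pieces of a single gzip stream. The trusted-provider check is reduced to a callback.

```python
import gzip
from typing import Callable, Dict, Iterable, NamedTuple, Optional


class SnapshotMeta(NamedTuple):
    """Hypothetical metadata advertised by a peer; field names are illustrative."""
    height: int
    format: int
    chunk_count: int
    digest: str


def choose_snapshot(
    offers: Iterable[SnapshotMeta],
    accepted_by_providers: Callable[[SnapshotMeta], bool],
) -> Optional[SnapshotMeta]:
    """Return the highest format-0 snapshot that the trusted providers accept."""
    unique = dict.fromkeys(offers)  # drop duplicate offers, keep first-seen order
    for meta in sorted(unique, key=lambda m: m.height, reverse=True):
        if meta.format == 0 and accepted_by_providers(meta):
            return meta
    return None


def assemble(meta: SnapshotMeta, chunks: Dict[int, bytes]) -> bytes:
    """Join chunks in index order and gunzip them (assumes one chunked gzip stream)."""
    missing = [i for i in range(meta.chunk_count) if i not in chunks]
    if missing:
        raise ValueError(f"missing chunks: {missing}")
    return gzip.decompress(b"".join(chunks[i] for i in range(meta.chunk_count)))


# Example: peers offer two snapshots; only the one at height 20 is vouched for.
dump = b"COPY users (id, name) FROM stdin;\n1\talice\n\\.\n"
blob = gzip.compress(dump)
parts = {0: blob[:10], 1: blob[10:20], 2: blob[20:]}
good = SnapshotMeta(height=20, format=0, chunk_count=3, digest="aa")
forged = SnapshotMeta(height=30, format=0, chunk_count=1, digest="bb")

picked = choose_snapshot([forged, good, good], lambda m: m == good)
restored_sql = assemble(picked, parts)  # in Kwild this stream would be fed to psql
print(picked.height, restored_sql == dump)  # 20 True
```

The forged offer at height 30 is skipped because the callback standing in for the trusted providers rejects it, which mirrors the rule that only provider-validated snapshots may be restored.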