
[OPS] FEATURE REQUEST: Pruning system - sign and forget #3637

Open
albttx opened this issue Jan 29, 2025 · 5 comments
@albttx
Member

albttx commented Jan 29, 2025

Description

Currently, it is not possible to prune a node; all nodes store all blocks since block 1.

Once mainnet arrives, there will be more transactions, and storing all state since block 1 might create storage issues on the nodes.

There are multiple advantages to node pruning:

  • reduced start-up time
  • smaller node snapshots, which make it easier and faster to sync / recover / migrate a node
  • less disk space used
  • improved node performance

Another feature I would love to see implemented, which was already discussed once during a meeting with @jaekwon (a long time ago), is the "sign and forget" system.

Tendermint's main bottleneck is disk usage: large chains must use NVMe drives for efficiency, and disks without pruning grow really fast!

I believe a validator node doesn't need to store more than one block; only the latest state should be required.

I believe this feature would improve performance by a lot, because we could almost get rid of disk usage and do everything in memory. If it's not done in code, it could be done by putting the data/ dir on tmpfs, which is an in-memory file system.
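As a rough sketch of the tmpfs idea (the path and size below are hypothetical; adapt them to your node's layout, and note that tmpfs contents are lost on reboot, so this only makes sense combined with a "sign and forget" model or external snapshots):

```shell
# Mount an in-memory filesystem over the node's data directory
# (hypothetical path; stop the node before doing this).
sudo mount -t tmpfs -o size=32G,mode=0755 tmpfs /path/to/node/data

# Or make the mount persistent across reboots via an /etc/fstab entry:
# tmpfs  /path/to/node/data  tmpfs  size=32G,mode=0755  0  0
```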

FYI: on Injective, with a ~1s block time and the amount of txs, there is no possibility to enable pruning because it causes a lot of missed block signatures, and a node grows to a couple hundred GB in a couple of days...
Resyncing from a pruned snapshot is required every 1-2 weeks.

Of course, we might not have the same amount of txs from day 1, but if oracles start writing to gno.land on every block, we could hit node storage issues faster than we expect.

cc: @moul @zivkovicmilos @gnolang/devops wdyt ?

ps: this issue will be mentioned in the gnops.io article I'm writing about node snapshots.

@n2p5
Contributor

n2p5 commented Jan 29, 2025

Thanks for writing this up @albttx.

We should also make sure to outline the disadvantages and tradeoffs of node pruning.
Also, do we have documentation on how the current pruning process works? Is this a "stop the world" operation, or can it be done concurrently on live nodes? Again, what are the tradeoffs, and what block height thresholds do we want to shoot for?

FYI: on Injective, with a ~1s block time and the amount of txs, there is no possibility to enable pruning because it causes a lot of missed block signatures, and a node grows to a couple hundred GB in a couple of days...

What are the constraints here, and what causes the misses?

I believe this feature would improve performance by a lot, because we could almost get rid of disk usage and do everything in memory. If it's not done in code, it could be done by putting the data/ dir on tmpfs, which is an in-memory file system.

It would be cool to set up a small experiment to quantify what "A LOT" looks like. I also wonder if we could do some sort of WAL pattern that goes RAMdisk > NVMe > block store, with some kind of graceful degradation. Sorting out what we can handle in churn and recovery modes would be really interesting.

For instance, in the Kubernetes world, I could see working with local PVs where we mount a RAMdisk and NVMe as part of the configuration, with replication rules. Again, all of this has tradeoffs, so it would be important to formulate small experiments as well as perform degraded-state testing (cascading failures in assumptions, etc.).

I love working on these types of problems and it could lead to some really nice generalization if we approach it correctly.

@albttx
Member Author

albttx commented Jan 29, 2025

Good to add to this thread: the cosmos-sdk pruning configuration

```toml
# default: the last 362880 states are kept, pruning at 10 block intervals
# nothing: all historic states will be saved, nothing will be deleted (i.e. archiving node)
# everything: 2 latest states will be kept; pruning at 10 block intervals.
# custom: allow pruning options to be manually specified through 'pruning-keep-recent', and 'pruning-interval'
pruning = "default"

# These are applied if and only if the pruning strategy is custom.
pruning-keep-recent = "0"
pruning-interval = "0"
```
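For comparison, a hypothetical custom setup keeping only a small recent window of states (the values are illustrative, not a recommendation; roughly the last hour at 1s blocks):

```toml
pruning = "custom"
pruning-keep-recent = "3600"
pruning-interval = "10"
```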

@albttx
Member Author

albttx commented Jan 29, 2025

We should also make sure to outline the disadvantages and tradeoffs of node pruning.

The only tradeoff is that if nobody runs a full node, it becomes impossible to recover the state of a previous block.

It's quite a complex system to run. FYI: a Cosmos Hub full node (i.e. archive node) is over 13 TB of data, and only one company is providing something: https://quicksync.io/cosmos

Beyond that, there are always explorers that store block information in standard databases, where it should be possible to verify blocks with signatures.

What are the constraints here? and what is the cause of the misses?

The low block time (i.e. reduced timeout_commit) + the amount of txs per block.

gno.land isn't out of trouble: as you can see in the Slack channel #gno-infra-alerts, we had a lot of issues on test5 when gnoswap was doing a lot of txs; validators were missing up to 500 blocks in a row...

Imagine multiple projects like gnoswap running at the same time! The network was probably one of the reasons, and milos's PR #2852 might help a lot.

Test6 will run the latest version with @zivkovicmilos's fix. Let's see how the network reacts under high load.

@n2p5
Contributor

n2p5 commented Jan 29, 2025

The only tradeoff is that if nobody runs a full node, it becomes impossible to recover the state of a previous block.

"only" 😂 .

This sounds like an interesting area for research, in that it would be useful to have a "hot", "warm", and "cold" storage pattern that keeps a progressively longer block history, with the ability to completely reconstruct the full block history from cold (and cheap) storage for all to use.
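The tiering idea above can be sketched as a minimal Go toy (all type and function names are hypothetical; a cold map stands in for disk or object storage, and a real implementation would need persistence, batching, and concurrency control):

```go
package main

import "fmt"

// Block is a stand-in for a full block (hypothetical, for illustration).
type Block struct {
	Height int
	Data   string
}

// TieredStore keeps the most recent blocks in a "hot" in-memory tier and
// demotes older ones to a "cold" tier once the hot tier exceeds its limit.
type TieredStore struct {
	hotLimit int
	hot      map[int]Block
	cold     map[int]Block
	oldest   int // lowest height still in the hot tier
}

func NewTieredStore(hotLimit int) *TieredStore {
	return &TieredStore{
		hotLimit: hotLimit,
		hot:      make(map[int]Block),
		cold:     make(map[int]Block),
		oldest:   1,
	}
}

// Put stores a block in the hot tier, evicting the oldest hot block
// to cold storage whenever the hot tier is over its limit.
func (s *TieredStore) Put(b Block) {
	s.hot[b.Height] = b
	for len(s.hot) > s.hotLimit {
		s.cold[s.oldest] = s.hot[s.oldest]
		delete(s.hot, s.oldest)
		s.oldest++
	}
}

// Get checks the hot tier first, then falls back to cold storage.
func (s *TieredStore) Get(height int) (Block, string) {
	if b, ok := s.hot[height]; ok {
		return b, "hot"
	}
	if b, ok := s.cold[height]; ok {
		return b, "cold"
	}
	return Block{}, "missing"
}

func main() {
	s := NewTieredStore(3)
	for h := 1; h <= 5; h++ {
		s.Put(Block{Height: h, Data: fmt.Sprintf("block-%d", h)})
	}
	_, tier := s.Get(5)
	fmt.Println(tier) // recent block served from the hot tier: "hot"
	_, tier = s.Get(1)
	fmt.Println(tier) // old block demoted to the cold tier: "cold"
}
```

A "warm" middle tier (e.g. NVMe) would slot in the same way, with eviction cascading hot → warm → cold.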

@n0izn0iz
Contributor

It is a "stop the world" operation in cosmos-sdk 0.47 (not sure about the latest versions).

Status: Triage

No branches or pull requests

3 participants