[OPS] FEATURE REQUEST: Pruning system - sign and forget #3637
Comments
Thanks for writing this up @albttx. We should also make sure to outline the disadvantages and tradeoffs of node pruning.
What are the constraints here? And what is the cause of the missed signatures?
It would be cool to set up a small experiment to quantify what A LOT looks like. I also wonder if we could do some sort of WAL pattern where data goes RAM disk > NVMe > block store with some sort of graceful-degradation path. Figuring out what we can handle in churn and recovery modes would be really interesting. For instance, in the Kubernetes world, I could see working with local PVs where we mount a RAM disk and NVMe as part of the configuration, with replication rules. Again, all of this stuff has tradeoffs, so it would be important to formulate small experiments as well as perform degraded-state testing (cascading failures in assumptions, etc.). I love working on these types of problems, and it could lead to some really nice generalization if we approach it correctly.
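A minimal sketch of what that tiered layout could look like in Kubernetes, assuming a hypothetical StatefulSet, image name, mount paths, and storage class (none of these come from the gno.land repo): the RAM tier uses a memory-backed emptyDir and the NVMe tier a local-storage PVC.

```yaml
# Hypothetical layout: memory-backed emptyDir for the hot/WAL tier,
# local NVMe PVC for the block store. All names and paths are illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gnoland-node
spec:
  serviceName: gnoland-node
  replicas: 1
  selector:
    matchLabels:
      app: gnoland-node
  template:
    metadata:
      labels:
        app: gnoland-node
    spec:
      containers:
        - name: gnoland
          image: gnoland:latest              # assumed image name
          volumeMounts:
            - name: ramdisk
              mountPath: /data/hot           # assumed path for the RAM-backed tier
            - name: nvme
              mountPath: /data/blockstore    # assumed path for the NVMe-backed tier
      volumes:
        - name: ramdisk
          emptyDir:
            medium: Memory                   # tmpfs-backed; contents are lost on pod restart
            sizeLimit: 16Gi
  volumeClaimTemplates:
    - metadata:
        name: nvme
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-nvme         # assumed LocalPV storage class
        resources:
          requests:
            storage: 500Gi
```

Kubernetes only provides the storage tiers here; the graceful-degrade step (spilling from RAM to NVMe to a remote block store) would still have to live in the node itself.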
Good to add in this thread: the cosmos-sdk configuration for pruning.

```toml
# default: the last 362880 states are kept, pruning at 10 block intervals
# nothing: all historic states will be saved, nothing will be deleted (i.e. archiving node)
# everything: 2 latest states will be kept; pruning at 10 block intervals.
# custom: allow pruning options to be manually specified through 'pruning-keep-recent', and 'pruning-interval'
pruning = "default"

# These are applied if and only if the pruning strategy is custom.
pruning-keep-recent = "0"
pruning-interval = "0"
```
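For reference, this is what a "custom" policy could look like under that scheme; the keep-recent and interval values below are illustrative, not recommendations.

```toml
# app.toml — illustrative custom pruning policy
pruning = "custom"

# Keep only the 100 most recent states and prune every 10 blocks
# (values chosen for illustration only).
pruning-keep-recent = "100"
pruning-interval = "10"
```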
The only tradeoff is that if nobody runs a full node, it becomes impossible to recover the state of a previous block. It's quite a complex system to run; FYI, a Cosmos Hub full node (i.e. archive node) is over 13 TB of data, and only one company provides something for this: https://quicksync.io/cosmos. Beyond that, there are always explorers that store block information in a standard database, where it should be possible to verify blocks against their signatures.
gno.land isn't out of trouble either, as you can see in Slack; the low block time of the network was probably one of the reasons, and Milos' PR #2852 might help a lot. Imagine with multiple projects like gnoswap running at the same time! Test6 will run the latest version with @zivkovicmilos's fix. Let's see how the network reacts under high load.
"only" 😂 . This sounds like an interesting area for research, in that it would be useful to have a "hot", "warm" and "cold" storage pattern that keeps a progressively longer block history, with the ability to completely reconstruct the complete block history from cold (and cheap) storage for all to use. |
It is a "stop the world" operation in cosmos-sdk 0.47 (not sure about latest versions) |
Description
Currently, it is not possible to prune a node: all nodes store every block since block 1.
Once mainnet arrives, there will be more transactions, and storing all state since block 1 might create storage issues on the nodes.
There are multiple advantages to node pruning:
Another feature I would love to see implemented, which was already discussed once during a meeting with @jaekwon (a long time ago), is the "sign and forget" system.
Tendermint's main bottleneck is disk usage: large chains must use NVMe for efficiency, and disks without pruning grow really fast!
I believe a validator node doesn't need to store more than one block; only the latest state should be required.
This feature, I believe, would improve performance BY A LOT, because we could almost get rid of disk usage and do everything in memory. If it's not done in code, it could be done by putting the `data/` dir on a tmpfs, which is an in-memory file system.
FYI: on Injective, with a ~1s block time and its amount of txs, there is no possibility to enable pruning because it causes a lot of missed block signatures, and the node grows to a couple hundred GB in a couple of days... A re-sync from a pruned snapshot is required every 1-2 weeks.
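A minimal sketch of the tmpfs approach, assuming a hypothetical data directory path and size; note that a RAM-backed mount is wiped on reboot, so the node would need to be able to re-sync or restore its state.

```sh
# Mount the node's data/ dir on a RAM-backed tmpfs (path and size are illustrative).
sudo mount -t tmpfs -o size=64G,noatime tmpfs /path/to/gnoland-data/data

# Or make the mount persistent across reboots via /etc/fstab:
# tmpfs  /path/to/gnoland-data/data  tmpfs  rw,size=64G,noatime  0  0
```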
Of course, we might not have the same amount of txs from day 1, but if we start to have oracles writing to gno.land on every block, we could hit node storage issues faster than we expect.
cc: @moul @zivkovicmilos @gnolang/devops wdyt ?
PS: this issue will be mentioned in the gnops.io article I'm writing about snapshot nodes.