Merge block databases #2829

Merged 7 commits, Jul 24, 2020

Contributor

@wezrule wezrule commented Jun 26, 2020

send/change/open/receive & state block databases have been removed. They are merged into a blocks database, and are serialized as follows:
block_type -> block -> sideband

The block_type is a new addition that allows the block and sideband to be deserialized correctly. The block_details in the sideband could have been used instead, but that would require shrinking the number of bits for the epoch and a lot of other interface changes, which didn't seem worth it. An extra byte per block is used for the block_type instead. We already follow this approach when serializing blocks in unchecked, for instance, so it allowed code re-use there too.
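The layout described above can be illustrated with a minimal Python sketch. The enum numbering, the per-type block sizes, and the helper names `serialize_entry` and `deserialize_entry` are assumptions made for this example, not the node's actual C++ code. The point is only that the leading byte tells the reader how many bytes of block follow, with the remainder treated as sideband.

```python
from enum import IntEnum


class BlockType(IntEnum):
    # Hypothetical numbering, chosen for this example only.
    SEND = 2
    RECEIVE = 3
    OPEN = 4
    CHANGE = 5
    STATE = 6


# Assumed serialized block sizes in bytes (illustrative, not authoritative).
BLOCK_SIZE = {
    BlockType.SEND: 152,
    BlockType.RECEIVE: 136,
    BlockType.OPEN: 168,
    BlockType.CHANGE: 136,
    BlockType.STATE: 216,
}


def serialize_entry(block_type, block, sideband):
    """Lay out a database value as block_type -> block -> sideband."""
    if len(block) != BLOCK_SIZE[block_type]:
        raise ValueError("block length does not match its type")
    return bytes([block_type]) + block + sideband


def deserialize_entry(value):
    """Use the leading type byte to split a value back into its parts."""
    block_type = BlockType(value[0])
    end = 1 + BLOCK_SIZE[block_type]
    return block_type, value[1:end], value[end:]
```

Without the type byte, a reader of the merged database would have no way to tell where a block ends and its sideband begins, since the block length differs per type.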

Removes the block_count_type RPC, as it's pretty useless now: no distinction between block types is possible in LMDB without extra IO to store the counts. It would be possible with RocksDB, as we are storing the count there anyway, but I think having a consistent RPC interface is more important.

The upgrade path is interesting because we do not have a way to merge 2 databases with different value types. For this I first sorted the smallish databases (legacy open/receive/change) in memory, then created temporary databases for those and for the send/state blocks, which added the new value type (the extra block_type). The smallest databases were then merged before the larger ones to reduce iteration as much as possible.
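A rough sketch of the merge order, with plain Python dicts and lists standing in for the databases. The function names and the in-memory representation are assumptions for illustration; the real upgrade works on LMDB tables and temporary databases rather than lists.

```python
import heapq


def to_new_format(table, type_byte):
    """Prefix every value with its block type and return rows sorted by key."""
    return sorted((key, bytes([type_byte]) + value) for key, value in table.items())


def merge_sorted(left, right):
    """Merge two key-sorted row lists in a single pass."""
    return list(heapq.merge(left, right))


def upgrade(tables):
    """tables maps a (hypothetical) type byte to a {hash: old_value} dict."""
    converted = [to_new_format(rows, type_byte) for type_byte, rows in tables.items()]
    # Merge the smallest tables first so the largest one is iterated only once.
    converted.sort(key=len)
    merged = []
    for rows in converted:
        merged = merge_sorted(merged, rows)
    return merged
```

Because the accumulated result is walked again on every merge, adding the largest table last keeps the repeated passes confined to the small tables.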

The benefit is that RocksDB no longer needs to worry about stale memtables. Checking whether a block exists sometimes took up to 5 database reads; now only 1 is required. It has also reduced complexity in various areas.
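The difference in read count can be shown with a toy model. `CountingTable`, `exists_split` and `exists_merged` are hypothetical names, and a dict subclass stands in for a database table; this is a sketch of the idea, not the store interface.

```python
class CountingTable(dict):
    """A dict standing in for a database table that counts lookups."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def has(self, key):
        self.reads += 1
        return key in self


def exists_split(tables, block_hash):
    """Old layout: probe each per-type table until one contains the hash."""
    return any(table.has(block_hash) for table in tables)


def exists_merged(blocks, block_hash):
    """New layout: a single lookup in the merged blocks table."""
    return blocks.has(block_hash)
```

In the split layout a hash that lives in the last table probed, or in none of them, costs one read per table; in the merged layout the cost is always one read.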

The database upgrade was tested on a few systems, logging info shown below:

Windows 10 - SSD, Ryzen 3700X, 64GB RAM (20 minutes)

[2020-Jun-26 14:46:35.843756]: Preparing v18 to v19 database upgrade...
[2020-Jun-26 14:47:26.973934]: Write legacy open/receive/change to new format
[2020-Jun-26 14:47:37.987258]: Write legacy send to new format
[2020-Jun-26 14:48:43.137459]: Merge legacy open/receive/change with legacy send blocks
[2020-Jun-26 14:48:59.067979]: Write state blocks to new format
[2020-Jun-26 15:00:04.990695]: Merging all legacy blocks with state blocks
[2020-Jun-26 15:07:22.412141]: Finished upgrading all blocks to new blocks database

Windows 10 - SSD (NVME), Ryzen 3700X, 64GB RAM (11 minutes)

[2020-Jun-27 08:43:20.698023]: Preparing v18 to v19 database upgrade...
[2020-Jun-27 08:43:38.285549]: Write legacy open/receive/change to new format
[2020-Jun-27 08:43:48.103760]: Write legacy send to new format
[2020-Jun-27 08:44:19.196946]: Merge legacy open/receive/change with legacy send blocks
[2020-Jun-27 08:44:32.918036]: Write state blocks to new format
[2020-Jun-27 08:50:06.377159]: Merging all legacy blocks with state blocks
[2020-Jun-27 08:54:47.716775]: Finished upgrading all blocks to new blocks database

Ubuntu - SSD, Ryzen 2600, 16GB RAM (19 mins)

[2020-Jun-26 14:49:07.484607]: Preparing v18 to v19 database upgrade...
[2020-Jun-26 14:49:30.491715]: Write legacy open/receive/change to new format
[2020-Jun-26 14:49:37.030250]: Write legacy send to new format
[2020-Jun-26 14:50:16.205321]: Merge legacy open/receive/change with legacy send blocks
[2020-Jun-26 14:50:24.769599]: Write state blocks to new format
[2020-Jun-26 15:03:05.067088]: Merging all legacy blocks with state blocks
[2020-Jun-26 15:08:08.497628]: Finished upgrading all blocks to new blocks database
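
Per-phase durations can be pulled out of logs like the ones above with a short script. This is a minimal sketch that assumes each log line marks the start of a phase, so a phase lasts until the next line; the function names are made up for the example, and the month abbreviation parsing assumes an English locale.

```python
from datetime import datetime

LOG_FORMAT = "[%Y-%b-%d %H:%M:%S.%f]"

# The Ubuntu log from this comment, copied verbatim.
UBUNTU_LOG = """\
[2020-Jun-26 14:49:07.484607]: Preparing v18 to v19 database upgrade...
[2020-Jun-26 14:49:30.491715]: Write legacy open/receive/change to new format
[2020-Jun-26 14:49:37.030250]: Write legacy send to new format
[2020-Jun-26 14:50:16.205321]: Merge legacy open/receive/change with legacy send blocks
[2020-Jun-26 14:50:24.769599]: Write state blocks to new format
[2020-Jun-26 15:03:05.067088]: Merging all legacy blocks with state blocks
[2020-Jun-26 15:08:08.497628]: Finished upgrading all blocks to new blocks database
"""


def parse_line(line):
    """Split a log line into its timestamp and message."""
    stamp, _, message = line.partition(": ")
    return datetime.strptime(stamp, LOG_FORMAT), message


def phase_durations(lines):
    """Return (message, seconds until the next log line) for each phase."""
    parsed = [parse_line(line) for line in lines if line.strip()]
    return [
        (message, (next_time - time).total_seconds())
        for (time, message), (next_time, _) in zip(parsed, parsed[1:])
    ]


if __name__ == "__main__":
    for message, seconds in phase_durations(UBUNTU_LOG.splitlines()):
        print(f"{seconds:8.1f}s  {message}")
```

Under that assumption, the gap after "Write state blocks to new format" is by far the largest in this log, at a little under 13 of the roughly 19 minutes.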

The starting ledger was 36GB. After the upgrade it becomes 61GB unvacuumed and 22GB vacuumed. Currently it is set up to vacuum automatically after the upgrade; however, this might be difficult for some users with storage constraints. Perhaps we should make this step optional, or also vacuum the pre-upgrade ledger first?
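
One way to make the vacuum step conditional is a free-space check before it runs. The following is only a sketch: it assumes the vacuum writes a compacted copy on the same volume as the existing ledger file, and `has_room_for_vacuum` and its `safety_margin` parameter are hypothetical names, not anything in the node.

```python
import shutil


def has_room_for_vacuum(ledger_dir, compacted_size_bytes, safety_margin=1.1):
    """Return True if the volume can hold the compacted copy next to the ledger."""
    free = shutil.disk_usage(ledger_dir).free
    return free >= compacted_size_bytes * safety_margin


if __name__ == "__main__":
    # 22GB is the vacuumed size reported above; the path is a placeholder.
    print(has_room_for_vacuum(".", 22 * 1024**3))
```

The compacted size is not known before the vacuum runs, so a real check would have to estimate it, for example from the size of the pre-upgrade ledger.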

Also did some benchmarking of LMDB/RocksDB performance & ledger size when using fixed and variable sized values:

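| Database | Operation | Fixed sized value (s) | Variable sized value (s) |
|----------|-----------|-----------------------|--------------------------|
| LMDB     | get       | 11.2                  | 10.01                    |
| LMDB     | put       | 46                    | 45                       |
| RocksDB  | get       | 3.1                   | 3.1                      |
| RocksDB  | put       | 37                    | 37                       |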

Didn't see any real difference between fixed and variable sized values with LMDB or RocksDB, nor a difference in ledger size. All gets were recorded after a computer restart to prevent OS caching from affecting the results.

@wezrule added the documentation, performance and database labels Jun 26, 2020
@wezrule added this to the V22.0 milestone Jun 26, 2020
@wezrule self-assigned this Jun 26, 2020
@wezrule added the database structure, rpc and semantic labels Jun 28, 2020
@wezrule added the blocker label Jul 6, 2020
@guilhermelawless
Contributor

Had a bootstrap in progress, stopped and upgraded the database with this PR (~21M block count, 30M unchecked) on Ryzen 3600, 16GB RAM and NVME SSD:

[2020-Jul-09 09:13:52.792313]: Preparing v18 to v19 database upgrade...
[2020-Jul-09 09:15:23.890906]: Write legacy open/receive/change to new format
[2020-Jul-09 09:15:46.475696]: Write legacy send to new format
[2020-Jul-09 09:18:15.845385]: Merge legacy open/receive/change with legacy send blocks
[2020-Jul-09 09:19:22.825827]: Write state blocks to new format
[2020-Jul-09 09:25:42.851488]: Merging all legacy blocks with state blocks
[2020-Jul-09 09:28:50.550178]: Finished upgrading all blocks to new blocks database
[2020-Jul-09 09:29:04.930051]: Preparing vacuum...
[2020-Jul-09 09:51:58.636846]: Vacuum succeeded.

Though painful, the benefits are significant.

I agree with vacuuming pre-upgrade, possibly even a rebuild, as we've seen that it speeds up upgrades considerably and reduces the maximum size reached during the upgrade.


@zhyatt Should this have the removal label instead of semantic?

@zhyatt added the removal label and removed the semantic label Jul 9, 2020
@zhyatt
Collaborator

zhyatt commented Jul 9, 2020

@guilhermelawless Yes, just swapped out the labels.

@wezrule
Contributor Author

wezrule commented Jul 13, 2020

Windows 10 - SSD (NVME), Ryzen 3700X, 64GB RAM (11 minutes)

[2020-Jul-13 21:17:52.125735]: Preparing vacuum...
[2020-Jul-13 21:35:59.116457]: Vacuum succeeded.
[2020-Jul-13 21:35:59.118457]: Preparing v18 to v19 database upgrade...
[2020-Jul-13 21:36:07.470989]: Write legacy open/receive/change to new format
[2020-Jul-13 21:36:31.223525]: Write legacy send to new format
[2020-Jul-13 21:37:06.718630]: Merge legacy open/receive/change with legacy send blocks
[2020-Jul-13 21:37:52.722318]: Write state blocks to new format
[2020-Jul-13 21:43:13.007033]: Merging all legacy blocks with state blocks
[2020-Jul-13 21:48:06.407661]: Finished upgrading all blocks to new blocks database
[2020-Jul-13 21:48:10.481677]: Preparing vacuum...
[2020-Jul-13 22:07:26.969579]: Vacuum succeeded.

I did some experimenting with a rebuild vacuum before the upgrade and afterwards. After the first rebuild data.ldb reached 63GB at peak, then vacuumed down to 18GB. After the upgrade and rebuild the data.ldb file reached 89GB, and then needed an additional 18GB for the copy w/ compaction, so 117GB in total at peak. The v18-19 upgrade itself took a couple of minutes longer as well, and the pre-upgrade rebuild/vacuum would need a further 18 minutes, so it doesn't seem worth it.

Contributor

@guilhermelawless guilhermelawless left a comment

LGTM pending doc updates.

@SergiySW
Contributor

SergiySW commented Jul 24, 2020

Ubuntu 20.04 - NVMe SSD Samsung 970 EVO Plus 512GB, Ryzen 3900X, 64GB RAM (5 mins)

[2020-Jul-24 18:53:15.493891]: Preparing v18 to v19 database upgrade...
[2020-Jul-24 18:53:23.304675]: Write legacy open/receive/change to new format
[2020-Jul-24 18:53:25.715227]: Write legacy send to new format
[2020-Jul-24 18:53:35.892972]: Merge legacy open/receive/change with legacy send blocks
[2020-Jul-24 18:53:41.872384]: Write state blocks to new format
[2020-Jul-24 18:54:56.944585]: Merging all legacy blocks with state blocks
[2020-Jul-24 18:58:21.824351]: Finished upgrading all blocks to new blocks database
[2020-Jul-24 18:58:26.377069]: Preparing vacuum...
[2020-Jul-24 18:59:12.450875]: Vacuum succeeded.

@wezrule wezrule merged commit df5c0b4 into nanocurrency:develop Jul 24, 2020
@wezrule wezrule deleted the merge_block_databases branch July 24, 2020 18:30
@SergiySW
Contributor

Ubuntu 20.04 - NVMe Optane, Ryzen 3900X, 64GB RAM (2 mins)

[2020-Jul-28 08:17:59.413575]: Preparing v18 to v19 database upgrade...
[2020-Jul-28 08:18:01.887120]: Write legacy open/receive/change to new format
[2020-Jul-28 08:18:04.311892]: Write legacy send to new format
[2020-Jul-28 08:18:08.515955]: Merge legacy open/receive/change with legacy send blocks
[2020-Jul-28 08:18:14.605210]: Write state blocks to new format
[2020-Jul-28 08:19:05.256456]: Merging all legacy blocks with state blocks
[2020-Jul-28 08:20:08.191290]: Finished upgrading all blocks to new blocks database
[2020-Jul-28 08:20:10.844470]: Preparing vacuum...
[2020-Jul-28 08:20:25.077202]: Vacuum succeeded.

Labels
blocker: Some future items cannot be completed until this is merged.
database structure: If the database changes it needs updating in the nanodb repository.
database: Relates to lmdb or rocksdb.
documentation: This item indicates the need for or supplies updated or expanded documentation.
performance: Performance/resource utilization improvement.
removal: Indicates functionality is being removed.
rpc: Changes related to Remote Procedure Calls.