This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

PoH on SLP2 is going very slowly #8445

Closed
mvines opened this issue Feb 25, 2020 · 8 comments · Fixed by #8468

Comments


mvines commented Feb 25, 2020

PoH should be running at ~2.5 slots a second, but it seems to be running at more like ~0.25 slots a second.
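For context (an assumption, not stated in the issue): Solana's default target slot time is 400 ms, which is where the expected ~2.5 slots/s figure comes from, so ~0.25 slots/s is roughly a 10x slowdown. A quick sanity check of those numbers:

```python
# Sanity-check the slot rates quoted above.
# TARGET_MS_PER_SLOT = 400 is an assumption based on Solana's default
# slot time; it is not stated in the issue itself.
TARGET_MS_PER_SLOT = 400

expected_rate = 1000 / TARGET_MS_PER_SLOT   # 2.5 slots per second
observed_rate = 0.25                        # rate reported in the issue
slowdown = expected_rate / observed_rate    # 10x slower than expected

print(f"expected {expected_rate} slots/s, observed {observed_rate} slots/s "
      f"({slowdown:.0f}x slower)")
```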

@mvines mvines added this to the Tofino v0.23.7 milestone Feb 25, 2020

mvines commented Feb 25, 2020

I moved the bootstrap validator to a colo machine; not sure that helped though.


mvines commented Feb 25, 2020

The issue reproduces when a test SLP cluster is launched with https://github.com/solana-labs/cluster


mvines commented Feb 25, 2020

cc: #8450


mvines commented Feb 25, 2020

Regression range is v0.23.2 - v0.23.6. Something in this window has caused PoH to slow down significantly: v0.23.2...v0.23.6


garious commented Feb 25, 2020

@pgarg66, I recall you tweaking the PoH thread affinity. Any chance that's related?


mvines commented Feb 26, 2020

Update: I can make v0.23.6 PoH as fast as v0.23.2 with some genesis config changes.

Using the v0.23.6 release binaries:

  1. Slow PoH can be reproduced by creating a genesis config with --slots-per-epoch 432000 and no warm-up epochs.
  2. Normal PoH can be reproduced by creating a genesis config with --slots-per-epoch 8192 and no warm-up epochs.

So we have some O(slots-per-epoch) code running in the PoH hot path.
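To illustrate what O(slots-per-epoch) work in the hot path implies (a hypothetical model, not the actual validator code): if each PoH tick performs work proportional to the epoch length, a 432000-slot epoch does roughly 52x the per-tick work of an 8192-slot epoch, which matches the observed order-of-magnitude slowdown:

```python
# Hypothetical model of O(slots-per-epoch) work in the PoH tick path.
# This illustrates the complexity class only; the real validator code
# and the exact per-slot operation are not shown in the issue.
def work_per_tick(slots_per_epoch: int) -> int:
    """Count simulated operations for one tick."""
    ops = 0
    for _ in range(slots_per_epoch):  # O(slots_per_epoch) scan per tick
        ops += 1
    return ops

bad = work_per_tick(432_000)   # genesis with --slots-per-epoch 432000
good = work_per_tick(8_192)    # genesis with --slots-per-epoch 8192
print(bad // good)             # → 52 (about 52x more work per tick)
```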


mvines commented Feb 26, 2020

v0.23.2 behaves the same as v0.23.6, so this is not a regression. The bug was triggered by me disabling warm-up epochs, which makes slow PoH visible right from epoch 0 instead of 1-2 weeks in, when the cluster finally reaches the normal epoch length.


mvines commented Feb 26, 2020

STR on master:

  1. Apply this patch. Note that the issue reproduces with sleepy PoH too!
diff --git a/multinode-demo/setup.sh b/multinode-demo/setup.sh
index ebb8ac8d8..fe2de2ce8 100755
--- a/multinode-demo/setup.sh
+++ b/multinode-demo/setup.sh
@@ -27,7 +27,8 @@ $solana_keygen new --no-passphrase -so "$SOLANA_CONFIG_DIR"/bootstrap-validator/
 $solana_keygen new --no-passphrase -so "$SOLANA_CONFIG_DIR"/bootstrap-validator/storage-keypair.json
 
 args=("$@")
-default_arg --enable-warmup-epochs
+default_arg --slots-per-epoch 432000 # Bad
+#default_arg --slots-per-epoch 8192  # Good
 default_arg --bootstrap-validator-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/identity-keypair.json
 default_arg --bootstrap-vote-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/vote-keypair.json
 default_arg --bootstrap-stake-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/stake-keypair.json
@@ -35,6 +36,6 @@ default_arg --bootstrap-storage-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/
 default_arg --ledger "$SOLANA_CONFIG_DIR"/bootstrap-validator
 default_arg --faucet-pubkey "$SOLANA_CONFIG_DIR"/faucet-keypair.json
 default_arg --faucet-lamports 500000000000000000
-default_arg --hashes-per-tick auto
+default_arg --hashes-per-tick sleep
 default_arg --operating-mode development
 $solana_genesis "${args[@]}"
  2. Run ./multinode-demo/setup.sh && ./multinode-demo/bootstrap-validator.sh

You can easily see from standard output that slots are passing by very slowly. Another way to view the problem, after the bootstrap validator starts up, is to run cargo run --bin solana -- live-slots
