Telemetry refactor #4026

pwojcikdev · 2022-12-21T22:45:14Z

This PR significantly simplifies the telemetry class. The previous implementation was quite complex, with deeply nested callbacks, which led to some subtle and hard-to-debug bugs. Hopefully this implementation will be less susceptible to such behavior.

One major behavior change is that instead of only replying to telemetry requests, we now broadcast our own telemetry periodically. This can be taken further in subsequent node versions: removing the need for active telemetry requests will remove additional complexity from message handlers, which currently have to do additional checks just to handle telemetry replies.
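To make the broadcast behavior concrete, here is a minimal sketch of the timing decision only. It is not the code from this PR: `periodic_broadcaster`, `tick` and the callback are hypothetical names, and the interval is whatever the caller passes in.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical sketch, not the nano::telemetry class from this PR.
// Decides when the local telemetry should be pushed to all peers.
class periodic_broadcaster
{
public:
	using clock = std::chrono::steady_clock;

	periodic_broadcaster (std::chrono::milliseconds interval_a, std::function<void ()> broadcast_a) :
		interval{ interval_a },
		broadcast{ std::move (broadcast_a) }
	{
		assert (interval.count () > 0);
		assert (broadcast);
	}

	// Meant to be called from a background loop.
	// Broadcasts on the first call and then whenever the interval has elapsed.
	// Returns true if a broadcast was sent on this call.
	bool tick (clock::time_point now)
	{
		if (last && now - *last < interval)
		{
			return false;
		}
		last = now;
		broadcast ();
		return true;
	}

private:
	std::chrono::milliseconds interval;
	std::function<void ()> broadcast;
	std::optional<clock::time_point> last;
};
```

Passing the current time into `tick` keeps the sketch deterministic to test; a real node would more likely drive this from its own thread or timer.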

For reviewing, it's probably easiest to view the resulting code instead of comparing the diff, as the majority of the class has changed.

dsiganos · 2023-01-27T06:27:32Z

This is not easy to review mainly because the previous design was messy and it is hard to know what changes with the new design. The new design seems simple and beautiful. It is hard to say if any problems will arise due to the change in behaviour but the changes seem compatible.

I wonder why we broadcast telemetries since we regularly request them anyway.

The RPC apparently used to wait for a telemetry to arrive, whereas now we return immediately.
The previous behaviour seems bizarre and unnecessarily complicated but I wonder if any services depend on that behaviour. Probably not.
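For illustration, returning immediately can be pictured as a plain lookup in whatever telemetry has already been received. This is a sketch under assumptions, not the PR's implementation: `cached_telemetry`, `telemetry_cache`, and the use of a string as the endpoint key are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified type; the real node has its own endpoint and telemetry structures.
struct cached_telemetry
{
	std::uint64_t block_count{ 0 };
};

class telemetry_cache
{
public:
	// Stores the most recent telemetry received from a peer (via reply or broadcast).
	void put (std::string const & endpoint, cached_telemetry const & data)
	{
		assert (!endpoint.empty ());
		entries[endpoint] = data;
	}

	// Non-blocking lookup: returns whatever is cached right now, or std::nullopt.
	std::optional<cached_telemetry> get (std::string const & endpoint) const
	{
		auto it = entries.find (endpoint);
		if (it == entries.end ())
		{
			return std::nullopt;
		}
		return it->second;
	}

private:
	std::map<std::string, cached_telemetry> entries;
};
```

A caller that previously relied on the RPC blocking until a reply arrived would, in this model, have to handle the empty result itself, for example by retrying later.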

The frequency of telemetry requests and broadcasts seems excessive.

dsiganos · 2023-01-27T05:53:27Z

nano/core_test/telemetry.cpp

-	ASSERT_TIMELY (10s, 1 == node_server->stats.count (nano::stat::type::message, nano::stat::detail::telemetry_ack, nano::stat::dir::in));
-}
-
-namespace nano


Why is this test removed?

pwojcikdev · 2023-01-30

It looks like I forgot to do that check. Ideally the genesis blocks should be checked during the handshake, preventing connections to mismatched peers altogether, but for now doing it in telemetry seems like a must-have workaround.
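A rough sketch of what such a check could look like when a telemetry message arrives. The names `received_telemetry`, `telemetry_verdict` and `verify_genesis` are hypothetical and only illustrate the idea; they are not taken from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified types for illustration only.
using block_hash = std::array<std::uint8_t, 32>;

struct received_telemetry
{
	block_hash genesis_block{};
};

enum class telemetry_verdict
{
	accept,
	reject_and_disconnect
};

// Compares the genesis hash reported by a peer with the local one.
// A mismatch means the peer is on a different network, so its telemetry
// is dropped and the caller is expected to disconnect from the peer.
telemetry_verdict verify_genesis (received_telemetry const & received, block_hash const & local_genesis)
{
	// Assumption of this sketch: an all-zero hash is never a valid local genesis.
	assert (local_genesis != block_hash{});
	if (received.genesis_block != local_genesis)
	{
		return telemetry_verdict::reject_and_disconnect;
	}
	return telemetry_verdict::accept;
}
```

Doing the same comparison during the handshake, as suggested above, would reject such peers before any telemetry is exchanged.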

pwojcikdev · 2023-02-01T15:24:56Z

> I wonder why we broadcast telemetries since we regularly request them anyway.

You can ask the same question in reverse: why request telemetries if they are broadcast regularly anyway? Handling telemetry requests introduces some complications (caching local telemetry, rate limiting) that are not present if we simply broadcast telemetry to all peers periodically. For now we need to do both because it's a transition period, but in the long term moving to a broadcast-only mode will simplify things.

dsiganos · 2023-02-01T16:47:43Z

OK, broadcast-only makes sense.

pwojcikdev added 10 commits December 21, 2022 15:29

Simplify telemetry

4508e7f

Fix tests

e710a34

Cleanup config

8a53976

Cleanup local telemetry

6915664

Remove unused flag

5e170a6

Fix slow tests

7eff848

Fix rpc tests

c0aafc6

Cleanup nano::test::compare_telemetry

dac57d8

Add more testcases

a52949c

Add ongoing telemetry broadcasts

2650cc0

pwojcikdev force-pushed the telemetry-refactor branch from 0ae5ba8 to 2650cc0 December 21, 2022 22:55

pwojcikdev and others added 4 commits December 22, 2022 02:51

Cleanup

25a3ea5

Merge branch 'develop' into telemetry-refactor

7f20cc6

# Conflicts: # nano/lib/stats.cpp # nano/lib/stats.hpp # nano/node/node.cpp # nano/node/telemetry.cpp # nano/slow_test/node.cpp

Merge branch 'develop' into telemetry-refactor

2e08804

Merge branch 'develop' into telemetry-refactor

f5226a5

# Conflicts: # nano/node/node.cpp

pwojcikdev requested review from dsiganos and clemahieu January 25, 2023 16:13

pwojcikdev added 2 commits January 25, 2023 17:19

Fixes

381f474

Merge branch 'develop' into telemetry-refactor

19b4a86

# Conflicts: # nano/lib/stats.cpp # nano/lib/stats.hpp # nano/node/telemetry.cpp # nano/node/telemetry.hpp

dsiganos reviewed Jan 27, 2023

pwojcikdev added 6 commits January 30, 2023 23:10

Merge branch 'develop' into telemetry-refactor

16963e8

# Conflicts: # nano/core_test/network.cpp

Do not immediately remove telemetry from disconnected peers

26e93fe

Increase telemetry broadcast & request intervals

5cf76bd

Update docs

0d3165b

Refactor peer_exclusion a bit

5b6b54f

Filter and disconnect from peers with mismatched genesis

8a24a49

pwojcikdev force-pushed the telemetry-refactor branch from 2468f30 to 8a24a49 January 31, 2023 20:54

Merge branch 'develop' into telemetry-refactor

e085b20

Merge branch 'develop' into telemetry-refactor

a479e70

clemahieu approved these changes Feb 2, 2023

pwojcikdev merged commit baabcca into nanocurrency:develop Feb 2, 2023

thsfs added the enhancement and unit test labels Mar 2, 2023
