Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Telemetry refactor #4026

Merged
merged 24 commits into from
Feb 2, 2023
Merged

Conversation

pwojcikdev
Copy link
Contributor

This PR significantly simplifies the telemetry class. The previous implementation was quite complex, with deep nested callbacks, which led to some subtle and hard to debug bugs. Hopefully this implementation will be less susceptible to such behavior.

One major behavior change is that instead of only replying to telemetry requests, we now broadcast our own telemetry periodically. This can be taken further in subsequent node versions, removing the need for active telemetry requests will remove additional complexity from message handlers that now have to do additional checks just to handle telemetry replies.

For reviewing it's probably easiest to view the end result code instead of comparing diff, as majority of the class has changed.

pwojcikdev and others added 4 commits December 22, 2022 02:51
# Conflicts:
#	nano/lib/stats.cpp
#	nano/lib/stats.hpp
#	nano/node/node.cpp
#	nano/node/telemetry.cpp
#	nano/slow_test/node.cpp
# Conflicts:
#	nano/node/node.cpp
# Conflicts:
#	nano/lib/stats.cpp
#	nano/lib/stats.hpp
#	nano/node/telemetry.cpp
#	nano/node/telemetry.hpp
@dsiganos
Copy link
Contributor

This is not easy to review mainly because the previous design was messy and it is hard to know what changes with the new design. The new design seems simple and beautiful. It is hard to say if any problems will arise due to the change in behaviour but the changes seem compatible.

I wonder why we broadcast telemetries since we regularly request them anyway.

The RPC used to apparently wait for a telemetry to arrive whereas now we return immediately.
The previous behaviour seems bizarre and unnecessarily complicated but I wonder if any services depend on that behaviour. Probably not.

The frequency of telemetry requests and broadcasts seem excessive.

ASSERT_TIMELY (10s, 1 == node_server->stats.count (nano::stat::type::message, nano::stat::detail::telemetry_ack, nano::stat::dir::in));
}

namespace nano
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this test removed?

Copy link
Contributor Author

@pwojcikdev pwojcikdev Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like I forgot to do that check. Ideally the genesis blocks should be checked during handshake and prevent connection to mismatched peers altogether, but for now doing it in telemetry seems like a must have workaround.

nano/core_test/telemetry.cpp Show resolved Hide resolved
nano/lib/config.hpp Outdated Show resolved Hide resolved
nano/lib/config.hpp Outdated Show resolved Hide resolved
nano/node/telemetry.hpp Show resolved Hide resolved
nano/node/telemetry.cpp Outdated Show resolved Hide resolved
@pwojcikdev
Copy link
Contributor Author

pwojcikdev commented Feb 1, 2023

I wonder why we broadcast telemetries since we regularly request them anyway.

You can ask the same question in reverse, why request telemetries if they are broadcasted regularly anyway? Handling telemetry requests introduces some complications (caching local telemetry, rate limiting) that are not present if we simply periodically broadcast telemetry to all peers. For now we need to do both because it's a transition period, but long term moving to broadcast only mode will simplify things.

@dsiganos
Copy link
Contributor

dsiganos commented Feb 1, 2023

OK, broadcast-only makes sense.

@pwojcikdev pwojcikdev merged commit baabcca into nanocurrency:develop Feb 2, 2023
@thsfs thsfs added enhancement unit test Related to a new, changed or fixed unit test labels Mar 2, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement unit test Related to a new, changed or fixed unit test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants