Skip to content

discover: adds mutex and lock/unlock logic to prevent race condition #31575

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

r4f4ss
Copy link

@r4f4ss r4f4ss commented Apr 6, 2025

Fix #31460

@r4f4ss r4f4ss requested review from fjl and zsfelfoldi as code owners April 6, 2025 02:04
@fjl
Copy link
Contributor

fjl commented Apr 7, 2025

The reported race is valid, but the fix is not good. The root of the issue is that handleAddNode is called from loadSeedNodes. This function runs in two contexts:

  • in Table constructor
  • during refresh, on a separate goroutine

In the latter case, it would be more appropriate to call addNode, which will correctly dispatch the node addition request to the Table.loop goroutine. All other access to the tableRevalidation data structure happens from this goroutine, and it was always intended to be used without a lock.

@r4f4ss
Copy link
Author

r4f4ss commented Apr 7, 2025

The reported race is valid, but the fix is not good. The root of the issue is that handleAddNode is called from loadSeedNodes. This function runs in two contexts:

  • in Table constructor
  • during refresh, on a separate goroutine

In the latter case, it would be more appropriate to call addNode, which will correctly dispatch the node addition request to the Table.loop goroutine. All other access to the tableRevalidation data structure happens from this goroutine, and it was always intended to be used without a lock.

Thanks for the clarification.

It seems that Table.doRefresh calls Table.loadSeedNodes which uses a lock to call Table.handleAddNode. The cause seems to be the call of tableRevalidation.run inside Table.loop, which doesn't use any lock.

I think using the Table lock to call revalidationList.schedule inside tableRevalidation.run solves it. Note that the Table lock is already used elsewhere in tableRevalidation.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data race in p2p/discover/table_real.go
2 participants