Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[fix][broker] Fix the bug that elected leader thinks it's a follower #23138

Merged
merged 2 commits into from
Aug 8, 2024

Conversation

heesung-sn
Copy link
Contributor

Fixes #23125

Motivation

As reported in the issue, #23125, we have seen a case where the elected leader indefinitely thinks it's a follower. As a result, this issue can block load shedding(run by the leader) and cause broker load imbalance.

Additionally, I think the broker leadership should be checked more aggressively upon zk re-connection as well because znode deletion watch events could be missed while reconnected[1]

Watches are maintained locally at the ZooKeeper server to which the client is connected. This allows watches to be lightweight to set, maintain, and dispatch. When a client connects to a new server, the watch will be triggered for any session events. Watches will not be received while disconnected from a server. When a client reconnects, any previously registered watches will be reregistered and triggered if needed. In general this all occurs transparently. There is one case where a watch may be missed: a watch for the existence of a znode not yet created will be missed if the znode is created and deleted while disconnected.

Refs:
[1] https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#ch_zkWatches

Modifications

  • Fix a bug to return Leading LeaderElectionState when the leader is the same as the proposed and from the same session.
  • elect() more aggressively upon zk reconnection events.
  • Set LeaderElectionState to NoLeader when closing LeaderElection.
  • Add more logs to better debug leader election

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: heesung-sn#77

@heesung-sn heesung-sn requested review from merlimat and lhotari August 7, 2024 17:49
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 7, 2024
@heesung-sn heesung-sn self-assigned this Aug 7, 2024
@heesung-sn heesung-sn changed the title [fix][broker] Fix the error that elected leader thinks it's a follower [fix][broker] Fix the bug that elected leader thinks it's a follower Aug 7, 2024
@lhotari lhotari added this to the 3.4.0 milestone Aug 7, 2024
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and good work @heesung-sn! LGTM

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.56%. Comparing base (bbc6224) to head (f797af3).
Report is 501 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23138      +/-   ##
============================================
+ Coverage     73.57%   74.56%   +0.99%     
- Complexity    32624    33576     +952     
============================================
  Files          1877     1919      +42     
  Lines        139502   144174    +4672     
  Branches      15299    15758     +459     
============================================
+ Hits         102638   107504    +4866     
+ Misses        28908    28436     -472     
- Partials       7956     8234     +278     
Flag Coverage Δ
inttests 27.75% <54.54%> (+3.17%) ⬆️
systests 24.75% <54.54%> (+0.42%) ⬆️
unittests 73.92% <100.00%> (+1.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...metadata/coordination/impl/LeaderElectionImpl.java 81.39% <100.00%> (+2.54%) ⬆️

... and 492 files with indirect coverage changes

@lhotari lhotari requested a review from poorbarcode August 8, 2024 08:18
@lhotari lhotari merged commit 3560ddb into apache:master Aug 8, 2024
51 checks passed
lhotari pushed a commit that referenced this pull request Aug 8, 2024
lhotari pushed a commit that referenced this pull request Aug 8, 2024
lhotari pushed a commit that referenced this pull request Aug 8, 2024
lhotari pushed a commit that referenced this pull request Aug 8, 2024
lhotari pushed a commit that referenced this pull request Aug 8, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 9, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 12, 2024
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Aug 28, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 16, 2025
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 16, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] The Elected Leader Broker Thinks It's a Follower
3 participants