Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

A61 update: fix ring_hash proactive connection attempt logic #475

Merged
merged 3 commits into from
Feb 14, 2025

Conversation

markdroth
Copy link
Member

Fix logic for ring_hash proactive connection attempts to trigger even if the state update was not triggered by an endpoint entering TRANSIENT_FAILURE state. This addresses two edge cases spotted by @arjan-bal:

  1. There are four endpoints in the following states: TF, TF, READY, and IDLE. If the READY endpoint fails, it transitions to IDLE, resulting in the new states: TF, TF, IDLE, IDLE.
  2. There are four endpoints in the following states: TF, TF, CONNECTING, and IDLE. If the CONNECTING endpoint is removed, the new states become: TF, TF, IDLE.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think talking so much about priority makes things overly complicated. CONNECTING has always implied "the LB policy is connecting." And since we adopted sticky-TF everywhere, TF also implies "the LB policy is connecting (with backoff)". But I understand why priority is being discussed; it just is unfortunate it is being discussed so much.

@markdroth markdroth merged commit 188d3d7 into grpc:master Feb 14, 2025
1 check passed
@markdroth markdroth deleted the A61_ring_hash_update branch February 14, 2025 22:49
copybara-service bot pushed a commit to grpc/grpc that referenced this pull request Feb 18, 2025
See grpc/proposal#475 for details.

While I was at it, I greatly simplified the logic, avoiding an unnecessary duplicate loop over all endpoints that we didn't need to be doing.

I removed an unnecessary end2end test that was broken by this change.  That test actually should have broken when I first changed ring_hash to delegate to pick_first back in #34244, and I'm not sure why it didn't.  But it definitely broke here when I changed the logic to not choose endpoints in any specific order when triggering connection attempts, and since the test was really intended to test a behavior that we no longer guarantee, I removed it.

In its place, I added a unit test that shows the new behavior in a much clearer way.  (Originally, we didn't have a good way to write unit tests for LB policies, so all of our tests were written as e2e tests.  Now that we do have a good framework for unit tests, we've been slowly finding e2e tests that really should be unit tests instead, and moving them over when we find them.)

Closes #38741

COPYBARA_INTEGRATE_REVIEW=#38741 from markdroth:ring_hash_tf_and_connecting_fix 6c4c773
PiperOrigin-RevId: 728255272
paulosjca pushed a commit to paulosjca/grpc that referenced this pull request Feb 20, 2025
See grpc/proposal#475 for details.

While I was at it, I greatly simplified the logic, avoiding an unnecessary duplicate loop over all endpoints that we didn't need to be doing.

I removed an unnecessary end2end test that was broken by this change.  That test actually should have broken when I first changed ring_hash to delegate to pick_first back in grpc#34244, and I'm not sure why it didn't.  But it definitely broke here when I changed the logic to not choose endpoints in any specific order when triggering connection attempts, and since the test was really intended to test a behavior that we no longer guarantee, I removed it.

In its place, I added a unit test that shows the new behavior in a much clearer way.  (Originally, we didn't have a good way to write unit tests for LB policies, so all of our tests were written as e2e tests.  Now that we do have a good framework for unit tests, we've been slowly finding e2e tests that really should be unit tests instead, and moving them over when we find them.)

Closes grpc#38741

COPYBARA_INTEGRATE_REVIEW=grpc#38741 from markdroth:ring_hash_tf_and_connecting_fix 6c4c773
PiperOrigin-RevId: 728255272
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants