-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
A61 update: fix ring_hash proactive connection attempt logic #475
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think talking so much about priority makes things overly complicated. CONNECTING has always implied "the LB policy is connecting." And since we adopted sticky-TF everywhere, TF also implies "the LB policy is connecting (with backoff)". But I understand why priority is being discussed; it just is unfortunate it is being discussed so much.
See grpc/proposal#475 for details. While I was at it, I greatly simplified the logic, avoiding an unnecessary duplicate loop over all endpoints that we didn't need to be doing. I removed an unnecessary end2end test that was broken by this change. That test actually should have broken when I first changed ring_hash to delegate to pick_first back in #34244, and I'm not sure why it didn't. But it definitely broke here when I changed the logic to not choose endpoints in any specific order when triggering connection attempts, and since the test was really intended to test a behavior that we no longer guarantee, I removed it. In its place, I added a unit test that shows the new behavior in a much clearer way. (Originally, we didn't have a good way to write unit tests for LB policies, so all of our tests were written as e2e tests. Now that we do have a good framework for unit tests, we've been slowly finding e2e tests that really should be unit tests instead, and moving them over when we find them.) Closes #38741 COPYBARA_INTEGRATE_REVIEW=#38741 from markdroth:ring_hash_tf_and_connecting_fix 6c4c773 PiperOrigin-RevId: 728255272
See grpc/proposal#475 for details. While I was at it, I greatly simplified the logic, avoiding an unnecessary duplicate loop over all endpoints that we didn't need to be doing. I removed an unnecessary end2end test that was broken by this change. That test actually should have broken when I first changed ring_hash to delegate to pick_first back in grpc#34244, and I'm not sure why it didn't. But it definitely broke here when I changed the logic to not choose endpoints in any specific order when triggering connection attempts, and since the test was really intended to test a behavior that we no longer guarantee, I removed it. In its place, I added a unit test that shows the new behavior in a much clearer way. (Originally, we didn't have a good way to write unit tests for LB policies, so all of our tests were written as e2e tests. Now that we do have a good framework for unit tests, we've been slowly finding e2e tests that really should be unit tests instead, and moving them over when we find them.) Closes grpc#38741 COPYBARA_INTEGRATE_REVIEW=grpc#38741 from markdroth:ring_hash_tf_and_connecting_fix 6c4c773 PiperOrigin-RevId: 728255272
Fix logic for ring_hash proactive connection attempts to trigger even if the state update was not triggered by an endpoint entering TRANSIENT_FAILURE state. This addresses two edge cases spotted by @arjan-bal: