Delayed dns resolve after upgrade to v1.60.0 #7186
Comments
I can see how #6668 could be causing the problem. I understand the problem, and the fix required as well. Is there a way to easily reproduce the problem though (without any complicated k8s setup)? That would be helpful to verify the fix. Thanks.
Thanks for the response @easwars! Unfortunately I've not been able to reproduce this locally; all debugging has been done in our cloud-based dev environment. I've done some additional testing and found that the problem still remains on master. However, I've been able to get this to work by checking out master and reverting the change from #6668. It seems like the issue is that the current implementation assigns the wait duration instead of starting the timer the way the previous implementation did (grpc-go/internal/resolver/dns/dns_resolver.go, lines 212 to 233 in e22436a).
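Roughly, the difference boils down to the following. This is a minimal standalone sketch of the two patterns, not the actual dns_resolver.go code; the resolver's 30 second minimum interval is shrunk to 2 seconds so the demo runs quickly, and the names eagerWatcher/lazyWatcher are made up for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// The real DNS resolver uses a 30 second minimum resolution interval; it is
// shrunk here so the demo finishes quickly.
const minResolutionInterval = 2 * time.Second

// eagerWatcher mimics the old pattern: the rate-limit timer starts right after
// the lookup, so by the time a re-resolution request arrives the interval has
// usually already elapsed and the next lookup happens almost immediately.
func eagerWatcher(resolveNow <-chan struct{}, lookup func()) {
	lookup()
	timer := time.NewTimer(minResolutionInterval) // countdown starts now
	<-resolveNow                                  // e.g. triggered by a connection refused error
	<-timer.C                                     // only wait out whatever is left of the interval
	lookup()
}

// lazyWatcher mimics the new pattern: only a duration is assigned, and the
// timer is created after the request arrives, so the client always waits the
// full interval before re-resolving.
func lazyWatcher(resolveNow <-chan struct{}, lookup func()) {
	lookup()
	waitTime := minResolutionInterval
	<-resolveNow           // re-resolution request
	<-time.After(waitTime) // countdown only starts here
	lookup()
}

func main() {
	demo := func(name string, watcher func(<-chan struct{}, func())) {
		start := time.Now()
		var lookups []time.Duration
		lookup := func() { lookups = append(lookups, time.Since(start).Round(100*time.Millisecond)) }

		// Simulate a connection error triggering a re-resolution request 3s
		// after the initial lookup, i.e. after the minimum interval has passed.
		resolveNow := make(chan struct{}, 1)
		time.AfterFunc(3*time.Second, func() { resolveNow <- struct{}{} })

		watcher(resolveNow, lookup)
		fmt.Printf("%s lookups at %v\n", name, lookups)
	}
	demo("eager (old):", eagerWatcher) // second lookup at ~3s: re-resolves as soon as the request arrives
	demo("lazy  (new):", lazyWatcher)  // second lookup at ~5s: waits the full interval again
}
```

With the real 30 second interval, the second pattern makes every re-resolution request during a deploy wait the full 30s before the client sees the new address, which matches the delay in the logs below.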
To try to prove my case I've added the following debug logs for master and my test branch to highlight the behavior:
As you can see in the logs for master, we wait 30s after receiving the error before re-resolving.

Logs master:
Logs test branch:
@sebastianjonasson: You are correct in your assessment about when the timer is created in the DNS resolver's watcher goroutine. I was also going to fix it the same way you have attempted to fix it. But I wanted to have a test for it, which is why I asked you if you had an easy reproduction. Since you don't have an easy repro, I will work on a unit test for this scenario. Thank you for reporting this issue and all the information you have provided along the way.
Great! Thanks for looking into this!
What version of gRPC are you using?
1.60.0
What version of Go are you using (go version)?
1.22
What operating system (Linux, Windows, …) and version?
Ubuntu 22.04
What did you do?
Upgraded from v1.59.0 to v1.60.0 for a gRPC client service.
Clients now take 30 seconds to re-resolve upstream services on a connection error (server shutting down).
What did you expect to see?
During a deploy of the upstream gRPC server I expect to see no requests failing due to delayed DNS resolution in the gRPC client. The new IP should be resolved instantly after the first connection refused (transport) error.

Example (from v1.59.0):
What did you see instead?
Tested with v1.60.0 and v1.63.2. During a deploy of the upstream gRPC server there is a 30 second delay between the first failing request and the new IP being resolved (see logs below).
I've tested this with both single and multiple instance setups for both the gRPC client and server. We've seen this error for services using a gRPC client.
By testing each commit in v1.60.0 I found that the patch from PR #6668 was the first commit that caused this behavior.

Log output during a deploy of the upstream gRPC server (v1.60.0):

Common client setup for reference:
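For illustration only, a client setup of the kind described in this report might look roughly like the sketch below; the target address, service config, and health-check probing loop are assumptions for demonstration, not the actual configuration used here:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// A dns:/// target makes the client use the DNS name resolver, the
	// component whose re-resolution timing is at issue in this report.
	conn, err := grpc.Dial(
		"dns:///upstream-service.example.internal:50051", // hypothetical upstream address
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Probe the upstream once per second. During a rolling restart of the
	// server, the gap between the first failing call and the first succeeding
	// call shows how long the client took to re-resolve and reconnect.
	client := healthpb.NewHealthClient(conn)
	for {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		_, err := client.Check(ctx, &healthpb.HealthCheckRequest{})
		cancel()
		if err != nil {
			log.Printf("health check failed: %v", err)
		} else {
			log.Print("health check ok")
		}
		time.Sleep(time.Second)
	}
}
```

With a probe like this, v1.59.0 shows only a brief failure window during a deploy (the client re-resolves on the first connection error), while v1.60.0 shows failures persisting for roughly 30 seconds.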