Flaky test package: test/xds #6914
Comments
Another one for TestServerSideXDS_WithValidAndInvalidSecurityConfiguration: https://github.com/grpc/grpc-go/actions/runs/7480796959/job/20361025267?pr=6916
@zasweq I investigated this, and the problem seems to be that the xDS management server gets stuck while writing to this buffered channel.
In the logs of failing runs for This seems to be a problem with the test and not the implementation. Adding a 50 ms sleep after starting both servers did get rid of the flakiness in
Ah nice, thank you for figuring this out!
You mentioned this fixed the test, but not the flakes in the full package. This was my flaky test in this PR, so thanks for fixing it: https://github.com/grpc/grpc-go/actions/runs/10050840269/job/27779434995?pr=7434 :)
Closing this issue, as the flaking tests do not seem to share a single root cause. There are individual issues open for tests that have flaked recently.
Alongside #6913 and #6912, I have run the test/xds suite on master since I added tests to it for my xDS Server fix #6889. I have encountered numerous flakes on g3, particularly those outlined in custom lb tests for distribution #6601. However, I have seen almost every client and server side xDS test flake with a context timeout for an RPC expected to proceed. Each has different logs/events preceding its timeout, but every test seems susceptible to timeouts. The flakes are generally rare, but because the suite contains so many tests, you can trigger one by running the full test suite enough times. My initial inkling is that some synchronization is needed, or that something gets stuck in the management server/testing xDS Client flow. This also manifests in rare flakes for my xDS Server fix, where I expect something like an err that represents Accept and Close, and I get a context timeout instead.