Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

xds/xdsclient: Fix flaky test TestLRSClient #7559

Merged
merged 1 commit into from
Aug 26, 2024

Conversation

arjan-bal
Copy link
Contributor

@arjan-bal arjan-bal commented Aug 24, 2024

Fixes: #7241

Background

The test is sometimes getting a second load report before the client closes the stream. The test is setting the load report interval to 50 ms here:

Resp: &v3lrspb.LoadStatsResponse{
SendAllClusters: true,
LoadReportingInterval: &durationpb.Duration{Nanos: 50000000},
},

In the final assertion of the test, the test is checking that the very next request present in the fake server's request channel is the client cancellation:

if u, err := fs2.LRSRequestChan.Receive(ctx); err != nil || status.Code(u.(*fakeserver.Request).Err) != codes.Canceled {
t.Errorf("unexpected LRS request: %v, %v, want error canceled", u, err)
}

When the the test takes more than 50 ms b/w sending the first LRS response with the load report interval and this final assertion, another load report can be sent by the client here:

func (t *Transport) sendLoads(ctx context.Context, stream lrsStream, clusterNames []string, interval time.Duration) {
tick := time.NewTicker(interval)
defer tick.Stop()
for {
select {
case <-tick.C:
case <-ctx.Done():
return
}
if err := t.sendLoadStatsRequest(stream, t.lrsStore.Stats(clusterNames)); err != nil {
t.logger.Warningf("Writing to LRS stream failed: %v", err)
return
}
}
}

Repro

To make the test fail with a similar error message (loadreport_test.go:139: unexpected LRS request: &{ <nil>}, <nil>, want error canceled) add a time.Sleep(100 * time.Millisecond) just after receiving the first LRS request on the server here:

u, err := fs2.LRSRequestChan.Receive(ctx)
if err != nil {
t.Fatalf("unexpected LRS request: %v, %v, want error canceled", u, err)
}

Fix

This PR keeps reading from fs2.LRSRequestChan.Receive until we get an error due to context expiration or see the expected client context cancellation response.

RELEASE NOTES: None

@arjan-bal arjan-bal added this to the 1.67 Release milestone Aug 24, 2024
Copy link

codecov bot commented Aug 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.83%. Comparing base (b45fc41) to head (7c0505d).
Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7559      +/-   ##
==========================================
- Coverage   82.07%   81.83%   -0.25%     
==========================================
  Files         360      361       +1     
  Lines       27533    27816     +283     
==========================================
+ Hits        22599    22764     +165     
- Misses       3759     3855      +96     
- Partials     1175     1197      +22     

see 47 files with indirect coverage changes

@arvindbr8 arvindbr8 assigned arjan-bal and unassigned easwars Aug 26, 2024
@arjan-bal arjan-bal merged commit 6d97688 into grpc:master Aug 26, 2024
14 checks passed
@arjan-bal arjan-bal deleted the grpc-go-flaky-test branch August 26, 2024 16:47
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: 6/100K: Test/LRSClient
3 participants