Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Use RateBurst in RateLimitInterceptorProvider (#4933)
<!-- Describe what has changed in this PR --> **What changed?** This PR changes the behavior of `RateLimitInterceptorProvider`. Instead of providing rate limiters to `NewRequestToRateLimiter`, we provide `RateBurst` functions. <!-- Tell your future self why have you made these changes --> **Why?** Because this was causing a bug: _Impact:_ These dynamic configs are never updated after the server starts: - `frontend.rps` - `frontend.globalRPS` - `frontend.rps.namespaceReplicationInducingAPIs` Additionally, frontend.globalRPS is rendered completely unusable because it calculates the number of members to be 0 on startup, so it defaults to frontend.rps to avoid a division-by-zero when calculating the effective per-isntance limit: _Root cause:_ We construct a quota calculator [here](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L305) . Then, we convert that to a RateFn and make 3 different DynamicRateLimiterImpl objects from it. We then pass all of those objects to NewRequestToRateLimiter. However, that function actually just accepts RateBurst objects, not RateLimiters. This is important because, since they’re just RateBurst objects, only their Rate and Burst methods may be used. However, those are also the only two methods in DynamicRateLimiterImpl which don’t call maybeRefresh. As a result, none of [these rate limiters](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L314-L317) get updated dynamically. This bug was originally introduced in this PR [Add per namespace burst limit control by wxing1292 · Pull Request #2067 · temporalio/temporal](#2067), specifically this commit: [Add per namespace burst limit control (#2067) · af181b4](af181b4) . After that, it was repeated by two more configs which were added below it using the same strategy. So, for 2 years (since Oct 26, 2021), frontend.rps has not been updating dynamically. _Evidence:_ Here’s a commit which demonstrates the problem: [Demonstrate that maybeRefresh is never called for these dynamic rate … · 9578140](9578140) Solution: There are a couple of ways to fix it, but the most obvious is to simply not wrap the rate burst functions in a dynamic rate limiter to begin with because it’s actually doing the exact opposite. It’s taking what would be dynamically-updated rate limits and making them constant. <!-- How have you verified this change? Tested locally? Added a unit test? Checked in staging env? --> **How did you test it?** Testing this relies on time. To avoid a large refactor propagating clock.TimeSource, I wrote a test with sleeps in it. However, I don't want to commit that because it also requires us changing the refresh interval of the dynamic rate limiter. Instead, I'll just provide a link to the commit containing the changes here: 0e4b4c2 . This commit adds a test which verifies that, if we update the rps, the rate limiter itself also updates its rate. I also tested this without this change and verified that the test fails. <!-- Assuming the worst case, what can be broken when deploying this change to production? --> **Potential risks** <!-- Is this PR a hotfix candidate or require that a notification be sent to the broader community? (Yes/No) --> **Is hotfix candidate?**
- Loading branch information