From 1f01c72a32bea24cd35e3c0f0184d321df050f9c Mon Sep 17 00:00:00 2001 From: Michael Snowden Date: Wed, 4 Oct 2023 09:07:15 -0700 Subject: [PATCH] Use RateBurst in RateLimitInterceptorProvider (#4933) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit **What changed?** This PR changes the behavior of `RateLimitInterceptorProvider`. Instead of providing rate limiters to `NewRequestToRateLimiter`, we provide `RateBurst` functions. **Why?** Because this was causing a bug: _Impact:_ These dynamic configs are never updated after the server starts: - `frontend.rps` - `frontend.globalRPS` - `frontend.rps.namespaceReplicationInducingAPIs` Additionally, frontend.globalRPS is rendered completely unusable because it calculates the number of members to be 0 on startup, so it defaults to frontend.rps to avoid a division-by-zero when calculating the effective per-isntance limit: _Root cause:_ We construct a quota calculator [here](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L305) . Then, we convert that to a RateFn and make 3 different DynamicRateLimiterImpl objects from it. We then pass all of those objects to NewRequestToRateLimiter. However, that function actually just accepts RateBurst objects, not RateLimiters. This is important because, since they’re just RateBurst objects, only their Rate and Burst methods may be used. However, those are also the only two methods in DynamicRateLimiterImpl which don’t call maybeRefresh. As a result, none of [these rate limiters](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L314-L317) get updated dynamically. This bug was originally introduced in this PR [Add per namespace burst limit control by wxing1292 · Pull Request #2067 · temporalio/temporal](https://github.com/temporalio/temporal/pull/2067), specifically this commit: [Add per namespace burst limit control (#2067) · temporalio/temporal@af181b4](https://github.com/temporalio/temporal/commit/af181b40486bd596f7458049441825ce6c106552) . After that, it was repeated by two more configs which were added below it using the same strategy. So, for 2 years (since Oct 26, 2021), frontend.rps has not been updating dynamically. _Evidence:_ Here’s a commit which demonstrates the problem: [Demonstrate that maybeRefresh is never called for these dynamic rate … · temporalio/temporal@9578140](https://github.com/temporalio/temporal/commit/9578140c5f049869ac907d73920db289af6de587) Solution: There are a couple of ways to fix it, but the most obvious is to simply not wrap the rate burst functions in a dynamic rate limiter to begin with because it’s actually doing the exact opposite. It’s taking what would be dynamically-updated rate limits and making them constant. **How did you test it?** Testing this relies on time. To avoid a large refactor propagating clock.TimeSource, I wrote a test with sleeps in it. However, I don't want to commit that because it also requires us changing the refresh interval of the dynamic rate limiter. Instead, I'll just provide a link to the commit containing the changes here: https://github.com/temporalio/temporal/commit/0e4b4c2ac0075a43c433712e3d3c372fd6e283c8 . This commit adds a test which verifies that, if we update the rps, the rate limiter itself also updates its rate. I also tested this without this change and verified that the test fails. **Potential risks** **Is hotfix candidate?** --- service/frontend/fx.go | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/service/frontend/fx.go b/service/frontend/fx.go index d85fafb7a83..2125934d1dd 100644 --- a/service/frontend/fx.go +++ b/service/frontend/fx.go @@ -311,10 +311,10 @@ func RateLimitInterceptorProvider( return interceptor.NewRateLimitInterceptor( configs.NewRequestToRateLimiter( - quotas.NewDefaultIncomingRateLimiter(rateFn), - quotas.NewDefaultIncomingRateLimiter(rateFn), - quotas.NewDefaultIncomingRateLimiter(namespaceReplicationInducingRateFn), - quotas.NewDefaultIncomingRateLimiter(rateFn), + quotas.NewDefaultIncomingRateBurst(rateFn), + quotas.NewDefaultIncomingRateBurst(rateFn), + quotas.NewDefaultIncomingRateBurst(namespaceReplicationInducingRateFn), + quotas.NewDefaultIncomingRateBurst(rateFn), serviceConfig.OperatorRPSRatio, ), map[string]int{},