Skip to content

Commit

Permalink
Use RateBurst in RateLimitInterceptorProvider (#4933)
Browse files Browse the repository at this point in the history
<!-- Describe what has changed in this PR -->
**What changed?**
This PR changes the behavior of `RateLimitInterceptorProvider`. Instead
of providing rate limiters to `NewRequestToRateLimiter`, we provide
`RateBurst` functions.

<!-- Tell your future self why have you made these changes -->
**Why?**
Because this was causing a bug:

_Impact:_
These dynamic configs are never updated after the server starts:
- `frontend.rps`
- `frontend.globalRPS`
- `frontend.rps.namespaceReplicationInducingAPIs`

Additionally, frontend.globalRPS is rendered completely unusable because
it calculates the number of members to be 0 on startup, so it defaults
to frontend.rps to avoid a division-by-zero when calculating the
effective per-isntance limit:

_Root cause:_
We construct a quota calculator
[here](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L305)
. Then, we convert that to a RateFn and make 3 different
DynamicRateLimiterImpl objects from it. We then pass all of those
objects to NewRequestToRateLimiter. However, that function actually just
accepts RateBurst objects, not RateLimiters. This is important because,
since they’re just RateBurst objects, only their Rate and Burst methods
may be used. However, those are also the only two methods in
DynamicRateLimiterImpl which don’t call maybeRefresh. As a result, none
of [these rate
limiters](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L314-L317)
get updated dynamically.
This bug was originally introduced in this PR [Add per namespace burst
limit control by wxing1292 · Pull Request #2067 ·
temporalio/temporal](#2067),
specifically this commit: [Add per namespace burst limit control (#2067)
·
af181b4](af181b4)
. After that, it was repeated by two more configs which were added below
it using the same strategy. So, for 2 years (since Oct 26, 2021),
frontend.rps has not been updating dynamically.

_Evidence:_
Here’s a commit which demonstrates the problem: [Demonstrate that
maybeRefresh is never called for these dynamic rate … ·
9578140](9578140)
Solution:
There are a couple of ways to fix it, but the most obvious is to simply
not wrap the rate burst functions in a dynamic rate limiter to begin
with because it’s actually doing the exact opposite. It’s taking what
would be dynamically-updated rate limits and making them constant.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
Testing this relies on time. To avoid a large refactor propagating
clock.TimeSource, I wrote a test with sleeps in it. However, I don't
want to commit that because it also requires us changing the refresh
interval of the dynamic rate limiter. Instead, I'll just provide a link
to the commit containing the changes here:
0e4b4c2
. This commit adds a test which verifies that, if we update the rps, the
rate limiter itself also updates its rate. I also tested this without
this change and verified that the test fails.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**


<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
  • Loading branch information
MichaelSnowden authored and rodrigozhou committed Oct 6, 2023
1 parent 9abaafd commit 1f01c72
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions service/frontend/fx.go
Original file line number Diff line number Diff line change
Expand Up @@ -311,10 +311,10 @@ func RateLimitInterceptorProvider(

return interceptor.NewRateLimitInterceptor(
configs.NewRequestToRateLimiter(
quotas.NewDefaultIncomingRateLimiter(rateFn),
quotas.NewDefaultIncomingRateLimiter(rateFn),
quotas.NewDefaultIncomingRateLimiter(namespaceReplicationInducingRateFn),
quotas.NewDefaultIncomingRateLimiter(rateFn),
quotas.NewDefaultIncomingRateBurst(rateFn),
quotas.NewDefaultIncomingRateBurst(rateFn),
quotas.NewDefaultIncomingRateBurst(namespaceReplicationInducingRateFn),
quotas.NewDefaultIncomingRateBurst(rateFn),
serviceConfig.OperatorRPSRatio,
),
map[string]int{},
Expand Down

0 comments on commit 1f01c72

Please # to comment.