Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add per namespace burst limit control #2067

Merged
merged 6 commits into from
Oct 27, 2021
Merged

Add per namespace burst limit control #2067

merged 6 commits into from
Oct 27, 2021

Conversation

wxing1292
Copy link
Contributor

What changed?

  • Add per namespace burst limit control
  • Default per namespace burst limit to be 4800, 2x of rate limit
  • Update logic accordingly

Why?
More configs

How did you test it?
Existing tests

Potential risks
N/A

Is hotfix candidate?
No

* Add per namespace burst limit control
* Default per namespace burst limit to be 4800, 2x of rate limit
* Update logic accordingly
@yiminc yiminc enabled auto-merge (squash) October 18, 2021 23:21
@yiminc yiminc disabled auto-merge October 18, 2021 23:45
common/quotas/dynamic.go Outdated Show resolved Hide resolved
@@ -32,66 +32,116 @@ type (
// RateFn returns a float64 as the RPS
RateFn func() float64

// BurstFn returns a int as the RPS
// BurstFn returns an int as the burst / bucket size
BurstFn func() int

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove these 2 fn, since we are using interface to define the methods.

common/quotas/dynamic.go Show resolved Hide resolved
Load() int
Store(burst int)
BurstFn() BurstFn
DynamicRateBurst interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to MutableRateLimiter?

common/quotas/dynamic.go Outdated Show resolved Hide resolved
common/quotas/dynamic.go Outdated Show resolved Hide resolved
// for incoming traffic
func NewDefaultIncomingDynamicRateLimiter(
func NewDefaultIncomingRateLimiter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove Default from name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below

// for outgoing traffic
func NewDefaultOutgoingDynamicRateLimiter(
func NewDefaultOutgoingRateLimiter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove Default from name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default indicate the ratio, either butst = 1 * rate * 1 second or butst = 2 * rate * 1 second

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name let me think that this creates a default implementation of RateLimiter for outgoing calls. The name should be NewOutgoingRateLimiterWithDefaultRatio. I'm not suggesting that you have to change this name in this PR, just state what confuses me as a reader.

service/frontend/configs/quotas.go Outdated Show resolved Hide resolved
service/frontend/configs/quotas.go Outdated Show resolved Hide resolved
Copy link
Member

@yiminc yiminc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the DynamicRateBurst should be MutableRateBurst. But it is your call.


return &DynamicBurstImpl{
func NewDynamicRateBurst(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamic in the name really give impression that this is based on dynamic config, but it is just like mutable that can be set by code. There is another NewDynamicRateLimiter() which as the name suggested is based on dynamic config. Wouldn't MutableRateBurst more appropriate here?

// for outgoing traffic
func NewDefaultOutgoingDynamicRateLimiter(
func NewDefaultOutgoingRateLimiter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name let me think that this creates a default implementation of RateLimiter for outgoing calls. The name should be NewOutgoingRateLimiterWithDefaultRatio. I'm not suggesting that you have to change this name in this PR, just state what confuses me as a reader.

@wxing1292 wxing1292 enabled auto-merge (squash) October 27, 2021 01:12
@wxing1292 wxing1292 merged commit af181b4 into temporalio:master Oct 27, 2021
@wxing1292 wxing1292 deleted the burst-control branch October 27, 2021 01:37
MichaelSnowden added a commit that referenced this pull request Oct 4, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
This PR changes the behavior of `RateLimitInterceptorProvider`. Instead
of providing rate limiters to `NewRequestToRateLimiter`, we provide
`RateBurst` functions.

<!-- Tell your future self why have you made these changes -->
**Why?**
Because this was causing a bug:

_Impact:_
These dynamic configs are never updated after the server starts:
- `frontend.rps`
- `frontend.globalRPS`
- `frontend.rps.namespaceReplicationInducingAPIs`

Additionally, frontend.globalRPS is rendered completely unusable because
it calculates the number of members to be 0 on startup, so it defaults
to frontend.rps to avoid a division-by-zero when calculating the
effective per-isntance limit:

_Root cause:_
We construct a quota calculator
[here](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L305)
. Then, we convert that to a RateFn and make 3 different
DynamicRateLimiterImpl objects from it. We then pass all of those
objects to NewRequestToRateLimiter. However, that function actually just
accepts RateBurst objects, not RateLimiters. This is important because,
since they’re just RateBurst objects, only their Rate and Burst methods
may be used. However, those are also the only two methods in
DynamicRateLimiterImpl which don’t call maybeRefresh. As a result, none
of [these rate
limiters](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L314-L317)
get updated dynamically.
This bug was originally introduced in this PR [Add per namespace burst
limit control by wxing1292 · Pull Request #2067 ·
temporalio/temporal](#2067),
specifically this commit: [Add per namespace burst limit control (#2067)
·
af181b4](af181b4)
. After that, it was repeated by two more configs which were added below
it using the same strategy. So, for 2 years (since Oct 26, 2021),
frontend.rps has not been updating dynamically.

_Evidence:_
Here’s a commit which demonstrates the problem: [Demonstrate that
maybeRefresh is never called for these dynamic rate … ·
9578140](9578140)
Solution:
There are a couple of ways to fix it, but the most obvious is to simply
not wrap the rate burst functions in a dynamic rate limiter to begin
with because it’s actually doing the exact opposite. It’s taking what
would be dynamically-updated rate limits and making them constant.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
Testing this relies on time. To avoid a large refactor propagating
clock.TimeSource, I wrote a test with sleeps in it. However, I don't
want to commit that because it also requires us changing the refresh
interval of the dynamic rate limiter. Instead, I'll just provide a link
to the commit containing the changes here:
0e4b4c2
. This commit adds a test which verifies that, if we update the rps, the
rate limiter itself also updates its rate. I also tested this without
this change and verified that the test fails.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**


<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
rodrigozhou pushed a commit that referenced this pull request Oct 6, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
This PR changes the behavior of `RateLimitInterceptorProvider`. Instead
of providing rate limiters to `NewRequestToRateLimiter`, we provide
`RateBurst` functions.

<!-- Tell your future self why have you made these changes -->
**Why?**
Because this was causing a bug:

_Impact:_
These dynamic configs are never updated after the server starts:
- `frontend.rps`
- `frontend.globalRPS`
- `frontend.rps.namespaceReplicationInducingAPIs`

Additionally, frontend.globalRPS is rendered completely unusable because
it calculates the number of members to be 0 on startup, so it defaults
to frontend.rps to avoid a division-by-zero when calculating the
effective per-isntance limit:

_Root cause:_
We construct a quota calculator
[here](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L305)
. Then, we convert that to a RateFn and make 3 different
DynamicRateLimiterImpl objects from it. We then pass all of those
objects to NewRequestToRateLimiter. However, that function actually just
accepts RateBurst objects, not RateLimiters. This is important because,
since they’re just RateBurst objects, only their Rate and Burst methods
may be used. However, those are also the only two methods in
DynamicRateLimiterImpl which don’t call maybeRefresh. As a result, none
of [these rate
limiters](https://github.com/temporalio/temporal/blob/c7847968b309eb71edbe1da6f6cfa4d1c0b21341/service/frontend/fx.go#L314-L317)
get updated dynamically.
This bug was originally introduced in this PR [Add per namespace burst
limit control by wxing1292 · Pull Request #2067 ·
temporalio/temporal](#2067),
specifically this commit: [Add per namespace burst limit control (#2067)
·
af181b4](af181b4)
. After that, it was repeated by two more configs which were added below
it using the same strategy. So, for 2 years (since Oct 26, 2021),
frontend.rps has not been updating dynamically.

_Evidence:_
Here’s a commit which demonstrates the problem: [Demonstrate that
maybeRefresh is never called for these dynamic rate … ·
9578140](9578140)
Solution:
There are a couple of ways to fix it, but the most obvious is to simply
not wrap the rate burst functions in a dynamic rate limiter to begin
with because it’s actually doing the exact opposite. It’s taking what
would be dynamically-updated rate limits and making them constant.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
Testing this relies on time. To avoid a large refactor propagating
clock.TimeSource, I wrote a test with sleeps in it. However, I don't
want to commit that because it also requires us changing the refresh
interval of the dynamic rate limiter. Instead, I'll just provide a link
to the commit containing the changes here:
0e4b4c2
. This commit adds a test which verifies that, if we update the rps, the
rate limiter itself also updates its rate. I also tested this without
this change and verified that the test fails.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**


<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants