Improve consumer metric cleanup when a channel goes down #9356

Merged (1 commit) on Sep 9, 2023

@SimonUnge
Member

SimonUnge commented Sep 8, 2023

Proposed Changes

This update improves metric cleanup of consumer data when a channel goes down by:

  • No longer emitting consumer_deleted for each consumer of a channel
  • Deleting metric data from ETS tables in bulk, using pattern matching

See #9320 for background.
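
To make the two cleanup styles concrete, here is a minimal sketch in Python rather than the project's Erlang. It is a toy model, not the code in this PR: the table is a plain dict, and the key layout (queue, channel, consumer tag) and both function names are assumptions made for illustration. It contrasts deleting rows one consumer at a time with a single pass that removes every row belonging to a channel, which is loosely what a pattern delete with a partially bound key does.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical key layout for illustration: (queue, channel, consumer_tag).
Key = Tuple[str, str, str]


def delete_per_consumer(table: Dict[Key, dict], keys: Iterable[Key]) -> int:
    """Delete known keys one at a time: one lookup per consumer."""
    removed = 0
    for key in keys:
        if table.pop(key, None) is not None:
            removed += 1
    return removed


def delete_by_channel(table: Dict[Key, dict], channel: str) -> int:
    """Delete every row whose key mentions `channel`.

    This visits every key in the table, so its cost grows with the
    table size rather than with the number of consumers on the channel.
    """
    doomed = [key for key in table if key[1] == channel]
    for key in doomed:
        del table[key]
    return len(doomed)


table = {
    ("q1", "ch1", "ctag-1"): {"prefetch": 10},
    ("q1", "ch1", "ctag-2"): {"prefetch": 10},
    ("q2", "ch2", "ctag-1"): {"prefetch": 5},
}
removed = delete_by_channel(table, "ch1")
```

In this model the bulk delete needs no list of consumers, only the channel, but it pays for a pass over the whole table each time it runs.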

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality) (not really)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

@SimonUnge SimonUnge marked this pull request as ready for review September 8, 2023 22:37
@michaelklishin
Member

With this PR, when I kill an app process that has 10K consumers on one channel, I observe a short spike of CPU usage from ≈ 0.3% to ≈ 5% which then instantly goes back to ≈ 0.3%.

@michaelklishin
Member

michaelklishin commented Sep 9, 2023

Ah, the original test was with over 100K consumers on a channel. I re-tested with 200K consumers: CPU usage drops to 14% within a few seconds and to virtually zero within the next 40 to 60 seconds, so this is a definite improvement over the behavior described in #9320.

michaelklishin added a commit that referenced this pull request Sep 9, 2023
Improve consumer metric cleanup when a channel goes down (backport #9356)
@michaelklishin
Member

This improves sample cleanup for a few channels with an extremely large number of consumers but regresses when there's a large number of channels with one consumer each, which is the more common case.

Passing consumer info as context when a channel is closed would help avoid the scan introduced in this PR entirely.

@michaelklishin
Member

Or, alternatively, the two approaches can be combined:

  • Do a scan when the number of consumers is > 1000 (or another constant)
  • Include consumer details when there are just a few
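
A rough sketch of that combined strategy, again as a Python toy model rather than RabbitMQ code. The function name `cleanup_channel`, the key layout (queue, channel, consumer tag), and the idea that the closing channel hands over its consumer keys are all assumptions for illustration; the value 1000 is simply the constant suggested above, not a tuned number.

```python
from typing import Dict, List, Tuple

# Hypothetical key layout for illustration: (queue, channel, consumer_tag).
Key = Tuple[str, str, str]

SCAN_THRESHOLD = 1000  # constant floated in the discussion, not a tuned value


def cleanup_channel(
    table: Dict[Key, dict],
    channel: str,
    consumer_keys: List[Key],
    threshold: int = SCAN_THRESHOLD,
) -> str:
    """Pick a cleanup strategy based on how many consumers the channel had.

    Returns the name of the strategy used so callers can observe the choice.
    """
    if len(consumer_keys) > threshold:
        # Many consumers: one pass over the whole table, matching on channel.
        for key in [k for k in table if k[1] == channel]:
            del table[key]
        return "scan"
    # Few consumers: delete exactly the keys passed in as context, no scan.
    for key in consumer_keys:
        table.pop(key, None)
    return "per_key"


ch1_keys = [("q1", "ch1", f"ctag-{i}") for i in range(3)]
ch2_keys = [("q2", "ch2", "ctag-0")]
table = {key: {} for key in ch1_keys + ch2_keys}

# A low threshold forces the scan path for the 3-consumer channel.
first = cleanup_channel(table, "ch1", ch1_keys, threshold=2)
# The default threshold keeps the single-consumer channel on the per-key path.
second = cleanup_channel(table, "ch2", ch2_keys)
```

The per-key path depends on the consumer details being available when the channel closes, which is the "pass consumer info as context" idea from the earlier comment.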

@SimonUnge
Member Author

To clarify: it's ets:match_delete that is the culprit, then, and it should be avoided when a channel has few consumers. Do we also need to revert the removal of emit_consumer_delete in rabbit_amqqueue_process, i.e., add back the event notification?

michaelklishin added a commit that referenced this pull request Feb 2, 2024
michaelklishin added a commit that referenced this pull request Feb 2, 2024
mergify bot pushed a commit that referenced this pull request Feb 3, 2024
(cherry picked from commit dad9379)
michaelklishin added a commit that referenced this pull request Feb 29, 2024
acogoluegnes added a commit that referenced this pull request Jan 17, 2025
Not when the channel or the connection is closed.

References #13085, #9356
acogoluegnes added a commit that referenced this pull request Jan 17, 2025
Not when the channel or the connection is closed.

References #13085, #9356
mergify bot pushed a commit that referenced this pull request Jan 17, 2025
Not when the channel or the connection is closed.

References #13085, #9356

(cherry picked from commit 69d0382)

# Conflicts:
#	deps/rabbitmq_stream/src/rabbit_stream_reader.erl