High volume of drained events -- xtrim can't keep up #69

Open
chrisvlopez opened this issue Mar 18, 2024 · 4 comments

@chrisvlopez commented Mar 18, 2024

Hi,

We've experienced an issue where our Redis instance was backed up with a very large number of drained events (on the order of millions).

The streams.events.maxLen setting was not changed, and I confirmed it was defaulting to 10K on Redis itself. From testing locally, I believe the root cause is the Redis XTRIM command used to trim the events stream -- specifically, that there is a default limit on the number of entries trimmed per call:

> When LIMIT and count aren't specified, the default value of 100 * the number of entries in a macro node will be implicitly used as the count.

Meaning, we were likely generating drained events faster than the trim commands could keep up. A direct fix could involve:

> Specifying the value 0 as count disables the limiting mechanism entirely.

which at least functionally works from my local testing (ignoring perf implications).
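
For reference, here is a rough sketch of the two trim variants as I tested them locally (this assumes ioredis and a hypothetical events key bull:myqueue:events; BullMQ runs the trim from its own scripts, so this only illustrates the command difference, not the actual fix):

```ts
import Redis from 'ioredis';

async function trimEvents() {
  const redis = new Redis();

  // Approximate trim with Redis's implicit LIMIT: each call evicts at most
  // roughly 100 * (entries per macro node), so a multi-million-entry backlog
  // shrinks very slowly.
  await redis.call('XTRIM', 'bull:myqueue:events', 'MAXLEN', '~', 10000);

  // LIMIT 0 disables that cap, so the stream is trimmed all the way down to
  // ~10000 entries in a single (longer-running) call.
  await redis.call('XTRIM', 'bull:myqueue:events', 'MAXLEN', '~', 10000, 'LIMIT', 0);

  await redis.quit();
}

trimEvents().catch(console.error);
```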

But there is an underlying question of whether we should be generating so many drained events in the first place. We have ~100 workers, each of which I assume is emitting its own drained events and causing the build-up. Is there some setting we should be tuning to reduce the amount of event creation? Is increasing drainDelay the only option I have?
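
In case it helps frame the question, this is roughly what I'd try if drainDelay is indeed the knob to turn (sketched against the plain bullmq Worker API; I'm assuming the Pro worker accepts the same option):

```ts
import { Worker } from 'bullmq';

// drainDelay is the number of seconds the worker blocks waiting for new jobs
// once the queue is empty; a larger value means fewer empty fetches and,
// presumably, fewer drained events being written to the events stream.
const worker = new Worker(
  'myqueue',
  async job => {
    // ...process the job
  },
  {
    connection: { host: 'localhost', port: 6379 },
    drainDelay: 30, // seconds; the default is 5
  },
);
```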

Thanks!

@manast (Contributor) commented Mar 19, 2024

On the latest versions at least, the drained event is only emitted when a job is completed and there are no other jobs to fetch in the queue (delayed jobs are not taken into consideration, though). So a scenario where a lot of drained events are generated would be one where jobs are coming in much more slowly than the workers process them, such that every time a job is processed the next job has not been added to the queue yet.
The logic for generating the drained event was different in older versions, so it would be interesting to know which version you are experiencing this issue on.
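
If it helps to see this concretely, here is a minimal sketch (queue name and connection are placeholders) that just logs every drained event as QueueEvents reads it from the stream, so you can correlate the volume with how often your workers find the queue empty:

```ts
import { QueueEvents } from 'bullmq';

// Subscribes to the queue's events stream and logs each drained event,
// i.e. each time a worker completed a job and found no further waiting jobs.
const queueEvents = new QueueEvents('myqueue', {
  connection: { host: 'localhost', port: 6379 },
});

queueEvents.on('drained', id => {
  console.log('drained event', id);
});
```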

@chrisvlopez (Author) commented Mar 19, 2024

> so it would be interesting to know in which version you are experiencing this issue.

We're on 6.4.0.

I did see a similar comment re: changing the drain event logic. Let me bump to 6.6.1+ to see if that helps.

Confirmed: bumping to bullmq-pro @ 6.11.0 resolved it. I no longer get drained events in Redis while idling.

@chrisvlopez (Author) commented

One other symptom I forgot to mention --

We also noticed an elevated number of GET commands to Redis. Would those be related to the drained events and similarly be fixed by the version bump?

@manast (Contributor) commented Mar 19, 2024

@chrisvlopez I have never heard of an elevated number of GET commands before; I would need a bit more context to be able to give an assessment.
