Getting schedules is very slow on Redis instances with many keys #78
First of all, thanks for your kind words! Yes, it looks like the current Redis source might be quite inefficient for this use case. I have a few ideas on how to improve it. One option is to create a custom schedule source using Postgres or any other relational database, which would let you efficiently query all rows for all pending tasks. Regarding Redis, I think we can implement the following logic in an upcoming release: all schedules will be stored as arrays, with a dedicated array for cron jobs. I don't really know yet how to handle tasks scheduled for the past, but I'll figure it out. Since it's a critical issue, I'll try implementing it this week. Thanks again for your feedback! Also, if you like taskiq, you can help this project by sponsoring it.
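For illustration, here is a minimal sketch of what that array-based storage could look like with `redis.asyncio`. The key names and payload format are assumptions, not the library's actual schema:

```python
# A minimal sketch of the array-based idea described above, not
# taskiq-redis's actual implementation. Key names and the serialized
# payload are assumptions. Keeping all schedules in a couple of Redis
# lists makes reads O(1) round trips instead of a full keyspace scan.
from redis.asyncio import Redis

CRON_LIST = "schedules:cron"    # hypothetical key for cron-style schedules
TIMED_LIST = "schedules:timed"  # hypothetical key for one-off schedules


async def add_schedule(redis: Redis, schedule_json: str, is_cron: bool) -> None:
    # RPUSH appends the serialized schedule to the relevant list.
    await redis.rpush(CRON_LIST if is_cron else TIMED_LIST, schedule_json)


async def get_all_schedules(redis: Redis) -> list[bytes]:
    # LRANGE 0 -1 fetches an entire list in a single round trip,
    # versus tens of thousands of SCAN pages with per-key storage.
    cron = await redis.lrange(CRON_LIST, 0, -1)
    timed = await redis.lrange(TIMED_LIST, 0, -1)
    return [*cron, *timed]
```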
Thanks for your quick response and prioritizing this!
This makes a lot of sense to me and is definitely better for performance than my suggestion. My only flag here is that this will be a breaking change: if someone upgrades, the new code won't be able to read existing stored schedules. Not a problem for us since we just eliminated our usage of it for now, but I want to make sure that's on your radar (it probably already was).
Sure, happy to contribute financially rather than with code 😁 Hello Patient is now a sponsor through the petsinc GitHub org.
I have an idea for how we can make the upgrade graceful for everybody. I'll try to get it out in an upcoming PR.
Please check out https://github.com/taskiq-python/taskiq-redis/releases/tag/1.0.4. I created a new schedule source. Most importantly, it has a function to migrate from one scheduler to another (more details in the updated README). Since this function turned out to be quite generic, we might include it in the core `taskiq` package.
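As a rough illustration of what such a generic migration helper could look like (names here are placeholders; the real API is documented in the updated README):

```python
# Placeholder sketch of a generic schedule migration helper; the actual
# function shipped in taskiq-redis 1.0.4 is described in its README.
# Assumes both sources implement taskiq's get_schedules/add_schedule.
async def migrate_schedules(old_source, new_source) -> None:
    # Read every schedule from the old source and re-add it to the new one.
    for schedule in await old_source.get_schedules():
        await new_source.add_schedule(schedule)
```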
Also, I'm currently gathering a list of companies that use taskiq. I would really appreciate it if you could share a few words about your experience with TaskIQ and grant me permission to include Hello Patient in our list of users. Additionally, if you have any logos or a preferred way of showcasing your company on our website, please let me know; I'd be happy to represent you the way you want.
Wow, thank you for the quick fix; introducing a second class and offering a migration function is a really great idea. We can probably deprecate the old class as well? I'm not sure there is any advantage to using it. Yes, I'd be happy to have Hello Patient listed as a TaskIQ user. Let me follow up with our designer to get the best logo file for you and see if she has opinions about how we should be represented. As for our experience: I had previously used Celery to manage distributed task queues in Python, but we are building a 100% async backend for Hello Patient and Celery does not have good support for async. TaskIQ was seamless to integrate and is even easier to use than Celery. It has better typing support and a simpler API, in addition to being fully async for both dispatch and execution. It's been performing very well in production, handling tens of thousands of tasks a day with no issues.
Yes, true. I will do it in an upcoming release. It seems the new source works much better than the previous one. Another company has integrated the new source and reported no errors so far.
Great, glad to hear it! I got some feedback from our designer; I'll send it over to the email on your profile and also connect you to her in case you need to follow up on anything.
First off, Taskiq is a great library! Thanks so much for bringing async to distributed task queues in Python. We're running this in production and it's doing well generally.
Overview
`RedisScheduleSource.get_schedules` becomes very slow with a large number of keys (2-3 minutes with real-world usage over the network for a full scan of 600k keys).

Context
We have minutely scheduled tasks (via the `LabelScheduleSource`) and were seeing them fired every 2-3 minutes. We had the `RedisScheduleSource` configured as a second source. I identified that the `RedisScheduleSource` was the cause of the issue due to its use of the `scan_iter` command and its default page size of 10 (search for the default count on that page), therefore requiring 60k requests to Redis for 600k keys. Our monitoring confirms that assessment; this is one day of usage.

We ended up just removing the `RedisScheduleSource` and the issue went away; we had misconfigured it and thought it was necessary for the decorator crons to work. Now I understand it is for dynamic scheduling, which we're not using yet.

Proposed solution
Add configurability for the number of keys in each scan page. This should allow fewer network hops and therefore much better performance for high numbers of small keys. I'm happy to put up a PR for this if you're accepting contributions!
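As a sketch of the idea against `redis.asyncio` (the key prefix and function name are assumptions for illustration, with `count` exposed as a parameter):

```python
# Sketch of the proposed change: expose the SCAN page size so each Redis
# round trip returns many keys instead of the server default of 10.
from redis.asyncio import Redis


async def get_schedule_keys(
    redis: Redis, prefix: str, count: int = 1000
) -> list[bytes]:
    # With the default page size of 10, scanning 600k keys takes ~60k
    # round trips; with count=1000 the same scan takes ~600.
    return [key async for key in redis.scan_iter(match=f"{prefix}:*", count=count)]
```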
Initial testing
Here are the results from a quick test of 0 keys versus 500k keys with 10-character values on a Mac Mini M4 Pro with Redis running in Docker. Results would of course be more extreme over an external network, but it's already a very substantial difference.

Existing logic
Passing `count=1000` to `scan_iter`
Scripts used for reference
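The original scripts were not captured in this thread; the sketch below reconstructs the kind of benchmark described above under assumed details (seed 500k keys with 10-character values, then time a full `scan_iter` pass at the default page size versus `count=1000`):

```python
# Hedged reconstruction of the benchmark described above; the issue's
# original scripts may differ. Requires a local Redis and redis-py >= 5.
import asyncio
import time

from redis.asyncio import Redis


async def main(n_keys: int = 500_000) -> None:
    redis = Redis()  # assumes Redis on localhost:6379

    # Seed keys with 10-character values, in pipelined batches.
    pipe = redis.pipeline()
    for i in range(n_keys):
        pipe.set(f"schedule:{i}", "x" * 10)
        if i % 10_000 == 0:
            await pipe.execute()
    await pipe.execute()

    # Time a full scan at the server-default page size (10) vs count=1000.
    for count in (None, 1000):
        start = time.perf_counter()
        keys = [k async for k in redis.scan_iter(match="schedule:*", count=count)]
        elapsed = time.perf_counter() - start
        print(f"count={count}: {len(keys)} keys in {elapsed:.2f}s")

    await redis.aclose()


if __name__ == "__main__":
    asyncio.run(main())
```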