
Getting schedules is very slow on Redis instances with many keys #78


Open
nickderobertis opened this issue Mar 30, 2025 · 8 comments

@nickderobertis

First off, Taskiq is a great library! Thanks so much for bringing async to distributed task queues in Python. We're running this in production and it's doing well generally.

Overview

RedisScheduleSource.get_schedules becomes very slow with a large number of keys (2-3 minutes with real-world usage over the network for a full scan of 600k keys).

Context

We have minutely scheduled tasks (via the LabelScheduleSource) and were seeing them fire every 2-3 minutes. We had the RedisScheduleSource configured as a second source, and I identified it as the cause of the issue: it uses the scan_iter command with its default page size of 10 (see the default COUNT in the Redis SCAN documentation), so a full scan of our 600k keys requires 60k round trips to Redis. Our monitoring confirms that assessment; the chart below covers one day of usage.

[Monitoring chart: one day of Redis request volume]

We ended up just removing the RedisScheduleSource and the issue went away. We had misconfigured it, thinking it was necessary for the decorator crons to work; now I understand it is for dynamic scheduling, which we're not using yet.

Proposed solution

Add configurability for the number of keys in each scan page. This should allow fewer network hops and therefore much better performance for high numbers of small keys. I'm happy to put up a PR for this if you're accepting contributions!
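The round-trip count is easy to estimate: SCAN's COUNT option is a hint for roughly how many keys each reply carries, so a full scan costs about total_keys / count network hops. A back-of-the-envelope sketch (plain Python, no Redis required; the helper name is made up for illustration):

```python
import math


def scan_round_trips(total_keys: int, count: int) -> int:
    """Approximate number of SCAN round trips for a full keyspace scan.

    COUNT is only a hint, so the real number varies slightly; this models
    the ideal case where each reply returns exactly `count` keys.
    """
    return math.ceil(total_keys / count)


print(scan_round_trips(600_000, 10))    # default page size -> 60000 round trips
print(scan_round_trips(600_000, 1000))  # larger page size  -> 600 round trips
```

With per-request network latency dominating, cutting round trips by 100x is where the speedup in the benchmark below comes from.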

Initial testing

Here are the results from a quick test of 0 keys versus 500k keys with 10-character values, on a Mac Mini M4 Pro with Redis running in Docker. Results would of course be more extreme over an external network, but the difference is already substantial.

Existing logic

nick@Mac-mini taskiq-redis % poetry run python scheduler.py
Time taken: 0.01 seconds
nick@Mac-mini taskiq-redis % poetry run python insert_keys.py 
nick@Mac-mini taskiq-redis % poetry run python scheduler.py  
Time taken: 13.65 seconds

Passing count=1000 to scan_iter

nick@Mac-mini taskiq-redis % poetry run python scheduler.py
Time taken: 0.01 seconds
nick@Mac-mini taskiq-redis % poetry run python insert_keys.py
nick@Mac-mini taskiq-redis % poetry run python scheduler.py
Time taken: 0.23 seconds

Scripts used for reference

# scheduler.py
import asyncio
import time

from taskiq import ScheduledTask
from taskiq_redis import RedisScheduleSource


redis_url = "redis://localhost:6379/0"


async def main():
    start_time = time.perf_counter()
    source = RedisScheduleSource(redis_url)
    schedule = ScheduledTask(
        task_name="test_task",
        labels={},
        args=[],
        kwargs={},
        cron="* * * * *",
    )
    await source.add_schedule(schedule)
    await source.get_schedules()
    await source.shutdown()
    end_time = time.perf_counter()
    print(f"Time taken: {end_time - start_time:.2f} seconds")


if __name__ == "__main__":
    asyncio.run(main())

# insert_keys.py
import redis
import random
import string
from typing import Final

TOTAL_KEYS: Final[int] = 500_000
BATCH_SIZE: Final[int] = 1000  # Command flush interval


def generate_random_value(length: int = 10) -> str:
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))


def main() -> None:
    r: redis.Redis = redis.Redis(host="localhost", port=6379, db=0)

    # Use a pipeline to batch commands for better performance.
    pipe = r.pipeline()

    for i in range(1, TOTAL_KEYS + 1):
        key: str = f"key{i}"
        value: str = generate_random_value()
        pipe.set(key, value)

        # Execute the batch when reaching the batch_size.
        if i % BATCH_SIZE == 0:
            pipe.execute()

    # Execute any remaining commands.
    pipe.execute()


if __name__ == "__main__":
    main()
@s3rius
Member

s3rius commented Mar 31, 2025

First of all, thanks for your kind words!

Yes, it looks like the current Redis source might be quite inefficient for this use case. I have a few ideas on how to improve it.

One option is to create a custom schedule source using postgres or any other relational database. This would allow you to efficiently query all rows for all pending tasks.

Regarding Redis, I think we can implement the following logic in an upcoming release:

All schedules will be stored as arrays.

There will be a dedicated array for cron jobs like schedule:cron_jobs and a separate array for each minute to handle timed tasks. Each timed task array will have a key formatted like schedule:time_2025-03-31T02:27:00, allowing tasks for the current minute to be queried in O(1) complexity. Scheduled jobs for a given minute will be added to the corresponding array and removed once completed.

I don't really know yet how to handle tasks scheduled for the past, but I'll figure it out. Since it's a critical issue, I'll try implementing it this week.
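As a sketch of the per-minute bucketing described above (the key format is taken from the example in this comment; the helper name is hypothetical):

```python
from datetime import datetime, timezone


def minute_bucket_key(moment: datetime) -> str:
    """Build the per-minute list key, e.g. 'schedule:time_2025-03-31T02:27:00'.

    Truncating to the minute means every task scheduled within that minute
    lands in the same Redis list, which can be fetched with a single command.
    """
    truncated = moment.replace(second=0, microsecond=0)
    return f"schedule:time_{truncated.strftime('%Y-%m-%dT%H:%M:%S')}"


print(minute_bucket_key(datetime(2025, 3, 31, 2, 27, 42, tzinfo=timezone.utc)))
# schedule:time_2025-03-31T02:27:00
```

The scheduler would then read the current minute's list in one round trip (e.g. LRANGE) instead of scanning the whole keyspace, and delete the key once its tasks are dispatched.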

Thanks again for your feedback!

Also, if you like taskiq, you can help this project by sponsoring it.

@nickderobertis
Author

Thanks for your quick response and prioritizing this!

Regarding Redis, I think we can implement the following logic in an upcoming release:

All schedules will be stored as arrays.

This makes a lot of sense to me and will definitely perform better than my suggestion. My only flag is that this will be a breaking change: if someone upgrades, the new code won't be able to read existing stored schedules.

Not a problem for us since we just eliminated our usage of it for now, but just want to make sure that's on your radar (it probably already was).

Also, if you like taskiq, you can help this project by sponsoring it.

Sure, happy to contribute financially rather than with code 😁 Hello Patient is now a sponsor through the petsinc github org.

@s3rius
Member

s3rius commented Mar 31, 2025

I have an idea of how we can make the upgrade graceful for everybody. I'll try to get it into an upcoming PR.

@s3rius
Member

s3rius commented Apr 1, 2025

Please check out https://github.com/taskiq-python/taskiq-redis/releases/tag/1.0.4. I created a ListScheduleSource. It currently doesn't support clustered or sentinel modes for Redis, but we can add that later.

Most importantly, it has a function to migrate from one schedule source to another (more details in the updated README). Since this function turned out to be quite generic, we might include it in the ScheduleSource interface.
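For readers curious what such a generic migration looks like: it only needs the two methods every schedule source exposes, get_schedules and add_schedule. A hypothetical sketch with in-memory stand-ins (the real helper and its name are documented in the taskiq-redis 1.0.4 README; FakeSource here is purely illustrative):

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeSource:
    """In-memory stand-in for a schedule source, for illustration only."""

    schedules: list = field(default_factory=list)

    async def get_schedules(self):
        return list(self.schedules)

    async def add_schedule(self, schedule):
        self.schedules.append(schedule)


async def migrate(old, new) -> int:
    """Copy every schedule from `old` to `new`; return how many were moved."""
    schedules = await old.get_schedules()
    for schedule in schedules:
        await new.add_schedule(schedule)
    return len(schedules)


async def main():
    old = FakeSource(schedules=["task-a", "task-b"])
    new = FakeSource()
    moved = await migrate(old, new)
    print(moved)  # 2


asyncio.run(main())
```

Because the helper depends only on the shared interface, the same logic works for any pair of sources, which is why promoting it to the ScheduleSource interface seems natural.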

@s3rius
Member

s3rius commented Apr 1, 2025

Also, I'm currently gathering a list of companies that use taskiq. I would really appreciate it if you could share a few words about your experience with TaskIQ and grant me permission to include Hello Patient in our list of users.

Additionally, if you have any logos or a preferred way of showcasing your company on our website, please let me know — I’d be happy to represent you the way you want.

@nickderobertis
Author

Wow thank you for the quick fix, really great idea to introduce a second class and offer a migration function. We can probably deprecate the old class as well? I'm not sure if there is any advantage to using it.

Yes, I'd be happy to have Hello Patient listed as a TaskIQ user. Let me follow up with our designer to get the best logo file for you and see if she has opinions about how we should be represented.

As for our experience, I had previously used Celery to manage distributed task queues in Python but we are building a 100% async backend for Hello Patient and Celery does not have good support for async. TaskIQ was seamless to integrate, and is even easier to use than Celery. It has better typing support and a simpler API in addition to being fully async for both dispatch and execution. It's been performing very well in production handling tens of thousands of tasks a day with no issues.

@s3rius
Member

s3rius commented Apr 3, 2025

We can probably deprecate the old class as well.

Yes, true. I will do it in an upcoming release. It seems the new source works much better than the previous one. Another company has integrated the new source and reported no errors so far.

@nickderobertis
Author

Great, glad to hear it! I got some feedback from our designer; I'll send it over to the email on your profile and also connect you with her in case you need to follow up on anything.
