
Getting schedules is very slow on Redis instances with many keys #78


Open
nickderobertis opened this issue Mar 30, 2025 · 8 comments

@nickderobertis

First off, Taskiq is a great library! Thanks so much for bringing async to distributed task queues in Python. We're running this in production and it's doing well generally.

Overview

RedisScheduleSource.get_schedules becomes very slow with a large number of keys (2-3 minutes with real-world usage over the network for a full scan of 600k keys).

Context

We have minutely scheduled tasks (via the LabelScheduleSource) and were seeing them fire every 2-3 minutes. We had the RedisScheduleSource configured as a second source, and I identified it as the cause of the issue: it uses the scan_iter command with its default page size of 10 (see the default COUNT in the Redis SCAN documentation), so a full scan of our 600k keys requires 60k round trips to Redis. Our monitoring confirms that assessment; the chart below covers one day of usage.

[Monitoring chart: one day of Redis request volume]

We ended up just removing the RedisScheduleSource and the issue went away. We had misconfigured it, thinking it was necessary for the decorator crons to work; now I understand it is for dynamic scheduling, which we're not using yet.

Proposed solution

Add configurability for the number of keys in each scan page. This should allow fewer network hops and therefore much better performance for high numbers of small keys. I'm happy to put up a PR for this if you're accepting contributions!
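The round-trip count is easy to estimate: SCAN's COUNT option is a hint for roughly how many keys each reply carries, so a full scan costs about total_keys / count network hops. A back-of-the-envelope sketch (plain Python, no Redis required; the helper name is made up for illustration):

```python
import math


def scan_round_trips(total_keys: int, count: int) -> int:
    """Approximate number of SCAN round trips for a full keyspace scan.

    COUNT is only a hint, so the real number varies slightly; this models
    the ideal case where each reply returns exactly `count` keys.
    """
    return math.ceil(total_keys / count)


print(scan_round_trips(600_000, 10))    # default page size -> 60000 round trips
print(scan_round_trips(600_000, 1000))  # larger page size  -> 600 round trips
```

With per-request network latency dominating, cutting round trips by 100x is where the speedup in the benchmark below comes from.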

Initial testing

Here are the results from a quick test of 0 keys versus 500k keys with 10-character values, on a Mac Mini M4 Pro with Redis running in Docker. Results would of course be more extreme over an external network, but the difference is already substantial.

Existing logic

nick@Mac-mini taskiq-redis % poetry run python scheduler.py
Time taken: 0.01 seconds
nick@Mac-mini taskiq-redis % poetry run python insert_keys.py 
nick@Mac-mini taskiq-redis % poetry run python scheduler.py  
Time taken: 13.65 seconds

Passing count=1000 to scan_iter

nick@Mac-mini taskiq-redis % poetry run python scheduler.py
Time taken: 0.01 seconds
nick@Mac-mini taskiq-redis % poetry run python insert_keys.py
nick@Mac-mini taskiq-redis % poetry run python scheduler.py
Time taken: 0.23 seconds

Scripts used for reference

# scheduler.py
import asyncio
import time

from taskiq import ScheduledTask
from taskiq_redis import RedisScheduleSource


redis_url = "redis://localhost:6379/0"


async def main():
    start_time = time.perf_counter()
    source = RedisScheduleSource(redis_url)
    schedule = ScheduledTask(
        task_name="test_task",
        labels={},
        args=[],
        kwargs={},
        cron="* * * * *",
    )
    await source.add_schedule(schedule)
    await source.get_schedules()
    await source.shutdown()
    end_time = time.perf_counter()
    print(f"Time taken: {end_time - start_time:.2f} seconds")


if __name__ == "__main__":
    asyncio.run(main())

# insert_keys.py
import redis
import random
import string
from typing import Final

TOTAL_KEYS: Final[int] = 500_000
BATCH_SIZE: Final[int] = 1000  # Command flush interval


def generate_random_value(length: int = 10) -> str:
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))


def main() -> None:
    r: redis.Redis = redis.Redis(host="localhost", port=6379, db=0)

    # Use a pipeline to batch commands for better performance.
    pipe = r.pipeline()

    for i in range(1, TOTAL_KEYS + 1):
        key: str = f"key{i}"
        value: str = generate_random_value()
        pipe.set(key, value)

        # Execute the batch when reaching the batch_size.
        if i % BATCH_SIZE == 0:
            pipe.execute()

    # Execute any remaining commands.
    pipe.execute()


if __name__ == "__main__":
    main()
@s3rius
Member

s3rius commented Mar 31, 2025

First of all, thanks for your kind words!

Yes, it looks like the current Redis source might be quite inefficient for this use case. I have a few ideas on how to improve it.

One option is to create a custom schedule source using postgres or any other relational database. This would allow you to efficiently query all rows for all pending tasks.

Regarding Redis, I think we can implement the following logic in an upcoming release:

All schedules will be stored as arrays.

There will be a dedicated array for cron jobs like schedule:cron_jobs and a separate array for each minute to handle timed tasks. Each timed task array will have a key formatted like schedule:time_2025-03-31T02:27:00, allowing tasks for the current minute to be queried in O(1) complexity. Scheduled jobs for a given minute will be added to the corresponding array and removed once completed.

I don't really know yet how to handle tasks scheduled for the past, but I'll figure it out. Since it's a critical issue, I'll try implementing it this week.
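As a sketch of the per-minute bucketing described above (the key format is taken from the example in this comment; the helper name is hypothetical):

```python
from datetime import datetime, timezone


def minute_bucket_key(moment: datetime) -> str:
    """Build the per-minute list key, e.g. 'schedule:time_2025-03-31T02:27:00'.

    Truncating to the minute means every task scheduled within that minute
    lands in the same Redis list, which can be fetched with a single command.
    """
    truncated = moment.replace(second=0, microsecond=0)
    return f"schedule:time_{truncated.strftime('%Y-%m-%dT%H:%M:%S')}"


print(minute_bucket_key(datetime(2025, 3, 31, 2, 27, 42, tzinfo=timezone.utc)))
# schedule:time_2025-03-31T02:27:00
```

The scheduler would then read the current minute's list in one round trip (e.g. LRANGE) instead of scanning the whole keyspace, and delete the key once its tasks are dispatched.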

Thanks again for your feedback!

Also, if you like taskiq, you can help this project by sponsoring it.

@nickderobertis
Author

Thanks for your quick response and prioritizing this!

Regarding Redis, I think we can implement the following logic in an upcoming release:

All schedules will be stored as arrays.

This makes a lot of sense to me and will definitely perform better than my suggestion. My only flag is that this will be a breaking change: if someone upgrades, the new code won't be able to read existing stored schedules.

Not a problem for us since we just eliminated our usage of it for now, but just want to make sure that's on your radar (it probably already was).

Also, if you like taskiq, you can help this project by sponsoring it.

Sure, happy to contribute financially rather than with code 😁 Hello Patient is now a sponsor through the petsinc github org.

@s3rius
Member

s3rius commented Mar 31, 2025

I have an idea of how we can make the upgrade graceful for everybody. I'll try to get it into an upcoming PR.

@s3rius
Member

s3rius commented Apr 1, 2025

Please check out https://github.com/taskiq-python/taskiq-redis/releases/tag/1.0.4. I created a ListScheduleSource. It currently doesn't support clustered or sentinel modes for Redis, but we can add that later.

Most importantly, it has a function to migrate from one schedule source to another (more details in the updated README). Since this function turned out to be quite generic, we might include it in the ScheduleSource interface.
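For readers curious what such a generic migration looks like: it only needs the two methods every schedule source exposes, get_schedules and add_schedule. A hypothetical sketch with in-memory stand-ins (the real helper and its name are documented in the taskiq-redis 1.0.4 README; FakeSource here is purely illustrative):

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeSource:
    """In-memory stand-in for a schedule source, for illustration only."""

    schedules: list = field(default_factory=list)

    async def get_schedules(self):
        return list(self.schedules)

    async def add_schedule(self, schedule):
        self.schedules.append(schedule)


async def migrate(old, new) -> int:
    """Copy every schedule from `old` to `new`; return how many were moved."""
    schedules = await old.get_schedules()
    for schedule in schedules:
        await new.add_schedule(schedule)
    return len(schedules)


async def main():
    old = FakeSource(schedules=["task-a", "task-b"])
    new = FakeSource()
    moved = await migrate(old, new)
    print(moved)  # 2


asyncio.run(main())
```

Because the helper depends only on the shared interface, the same logic works for any pair of sources, which is why promoting it to the ScheduleSource interface seems natural.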

@s3rius
Member

s3rius commented Apr 1, 2025

Also, I'm currently gathering a list of companies that use taskiq. I would really appreciate it if you could share a few words about your experience with TaskIQ and grant me permission to include Hello Patient in our list of users.

Additionally, if you have any logos or a preferred way of showcasing your company on our website, please let me know — I’d be happy to represent you the way you want.

@nickderobertis
Author

Wow thank you for the quick fix, really great idea to introduce a second class and offer a migration function. We can probably deprecate the old class as well? I'm not sure if there is any advantage to using it.

Yes, I'd be happy to have Hello Patient listed as a TaskIQ user. Let me follow up with our designer to get the best logo file for you and see if she has opinions about how we should be represented.

As for our experience, I had previously used Celery to manage distributed task queues in Python but we are building a 100% async backend for Hello Patient and Celery does not have good support for async. TaskIQ was seamless to integrate, and is even easier to use than Celery. It has better typing support and a simpler API in addition to being fully async for both dispatch and execution. It's been performing very well in production handling tens of thousands of tasks a day with no issues.

@s3rius
Member

s3rius commented Apr 3, 2025

We can probably deprecate the old class as well.

Yes, true. I will do it in an upcoming release. It seems the new source works much better than the previous one. Another company has integrated the new source and reported no errors so far.

@nickderobertis
Author

Great, glad to hear it! I got some feedback from our designer; I'll send it over to the email on your profile and also connect you with her in case you need to follow up on anything.
