
Overall revamp of connection pool, retries and timeouts #3008

Open
achimnol opened this issue Oct 16, 2023 · 6 comments

Comments

@achimnol

achimnol commented Oct 16, 2023

I'm experiencing many issues (already reported here by other people) related to the (blocking) connection pool in the asyncio version.

For now, I have had to roll back and pin my dependency to version 4.5.5. (lablup/backend.ai#1620)

Since there are already many individual reports, here I would instead like to make some high-level design suggestions:

@achimnol achimnol changed the title Overall revamp of connection pool and timeouts Overall revamp of connection pool, retries and timeouts Oct 16, 2023
@achimnol
Author

I'm not sure how to fix these problems without breaking existing users. Many of the problems are intertwined, and a better design and abstraction for connections and connection pools seems to be the way to go.

I understand that the list may look intimidating to the maintainers. I'm not urging anyone to fix everything ASAP, but I hope this list can serve as a guide for approaching the issues from a more holistic view.

@martinitus

martinitus commented Dec 20, 2023

For a new user it is really quite hard to understand what's going on with the ConnectionPool, Redis, and Connection objects. For example, I am trying to figure out whether the retry functionality works with Azure AD authentication in the case that my password (token) expires while I am holding a connection from the pool.
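
For what it's worth, here is a minimal sketch of how I currently wire retries and refreshing credentials together, as I understand the redis-py 5.x asyncio API. fetch_azure_ad_token() is a hypothetical placeholder for your token source, and whether an already-pooled connection picks up the fresh token before failing once is exactly the open question above.

    # Sketch only: retry configuration plus a credential provider that
    # fetches a fresh token on every (re)authentication.
    from redis.asyncio import Redis
    from redis.asyncio.retry import Retry
    from redis.backoff import ExponentialBackoff
    from redis.credentials import CredentialProvider
    from redis.exceptions import ConnectionError, TimeoutError

    def fetch_azure_ad_token() -> str:
        raise NotImplementedError  # hypothetical, e.g. via the azure-identity package

    class AzureADCredentialProvider(CredentialProvider):
        def get_credentials(self):
            # Invoked when a connection authenticates, so a reconnect
            # should log in with a fresh token rather than the expired one.
            return ("my-redis-user", fetch_azure_ad_token())

    redis = Redis(
        host="example.redis.cache.windows.net",  # placeholder host
        port=6380,
        ssl=True,
        credential_provider=AzureADCredentialProvider(),
        retry=Retry(ExponentialBackoff(), retries=3),
        retry_on_error=[ConnectionError, TimeoutError],
    )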

A couple more suggestions:

  • Use with and async with to make sure that connections taken from the pool are returned.
    E.g.

        async with pool.take() as redis:
            await redis.set("foo", "bar")

  • Directly hand the user of the pool a working client (Redis) object.
  • Discard and recreate connections instead of trying to fix (reconnect) them, following RAII (resource acquisition is initialization); see the sketch at the end of this comment.
    • A connection is connected when created, and only then added to the pool.
    • A failure indicates that the connection is broken; it is cleaned up and removed from the pool.
    • The pool will (already) automatically create new connections if required.

Just some thoughts...
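
To make the RAII idea concrete, here is a minimal sketch of such a pool. This is not redis-py's actual API: RAIIPool, take(), and the connect/close callables are hypothetical, and it only illustrates the "connected on creation, discarded on failure" invariant.

    import asyncio
    import contextlib
    from typing import AsyncIterator, Awaitable, Callable, Generic, TypeVar

    T = TypeVar("T")

    class RAIIPool(Generic[T]):
        """Connections enter the pool only once fully connected; any failure
        discards the connection instead of trying to repair it in place."""

        def __init__(self, connect: Callable[[], Awaitable[T]],
                     close: Callable[[T], Awaitable[None]], size: int) -> None:
            self._connect = connect              # must return a *connected* client, or raise
            self._close = close                  # disposes of a broken connection
            self._sem = asyncio.Semaphore(size)  # caps the number of live connections
            self._free: list[T] = []             # idle, known-good connections

        @contextlib.asynccontextmanager
        async def take(self) -> AsyncIterator[T]:
            async with self._sem:
                # Reuse an idle connection, or create a fresh, already-connected one.
                conn = self._free.pop() if self._free else await self._connect()
                try:
                    yield conn
                except BaseException:
                    await self._close(conn)      # broken: discard, never reconnect in place
                    raise
                else:
                    self._free.append(conn)      # healthy: return it to the pool

Used exactly as in the first bullet (async with pool.take() as redis: ...), such a pool can never hand out a connection that was not connected at creation time.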

@achimnol
Author

achimnol commented Apr 9, 2024

Looking forward to what happens in #3200! 👀

@SorianoMarmol

> For a new user it is really quite hard to understand what's going on with the ConnectionPool, Redis, and Connection objects. For example, I am trying to figure out whether the retry functionality works with Azure AD authentication in the case that my password (token) expires while I am holding a connection from the pool.

Do you have any clues or guidance on this? I am analyzing the same authentication flow, though in my case it involves not just Redis but also Celery, django-redis, python-redis-lock...

@mzealey

mzealey commented Nov 12, 2024

With Sentinel it becomes a whole other question as well, one which I cannot seem to find covered in the docs or examples, and even reading the source is a bit confusing. Fundamentally, what I want is a reliable object (or two, if doing read-only) that I can ask to "give me an active connection", but getting to that state is pretty hard...

Currently I create a sentinel, which in itself is a bit horrific:

      sentinel = aioredis.sentinel.Sentinel(  # type: ignore
          settings.redis_sentinel_hosts,
          **_standard_connection_kwargs(settings, **kwargs),
          sentinel_kwargs={
              **_standard_connection_kwargs(settings, **kwargs),
              "password": settings.redis_sentinel_password or settings.redis_password,
          },
      )

I can then use slave_for / master_for, but these only seem to be single connections rather than a pool that I can use elsewhere. The test suite has only one test for SentinelConnectionPool, which seems to just check that it tries a couple of hosts and then blows up. So at present I just use master_for / slave_for, but from what I can see with tcpdump on a live server, this creates a new TCP connection each time rather than pooling it (whereas the sentinel connection itself is persistent).
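
If it helps anyone, this is the reusable setup I have converged on. As I understand it (a sketch, unverified across versions; hosts, passwords, and the "mymaster" service name are placeholders), master_for / slave_for return Redis clients backed by a SentinelConnectionPool, so creating them once and reusing them application-wide is what should give pooling, though, as noted above, that is not what I observe on the wire:

    import asyncio
    from redis.asyncio.sentinel import Sentinel

    async def main() -> None:
        sentinel = Sentinel(
            [("sentinel-1", 26379), ("sentinel-2", 26379)],
            sentinel_kwargs={"password": "sentinel-password"},
            password="redis-password",
        )
        # Create these once and reuse them everywhere; each client is backed
        # by a SentinelConnectionPool that re-queries the sentinels on failover.
        master = sentinel.master_for("mymaster")
        replica = sentinel.slave_for("mymaster")  # read-only client

        await master.set("foo", "bar")
        print(await replica.get("foo"))

    asyncio.run(main())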

@mattjudge

Is it possible to get an update on where (or whether) the above issues sit on the roadmap? Is e.g. #3200 still intended to be developed?
