Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

A deadlock is encountered when invoking the 'subscribe' function. #2883

Open
suxb201 opened this issue Aug 9, 2023 · 5 comments
Open

A deadlock is encountered when invoking the 'subscribe' function. #2883

suxb201 opened this issue Aug 9, 2023 · 5 comments

Comments

@suxb201
Copy link
Contributor

suxb201 commented Aug 9, 2023

Version:
redis-py 4.5.5 used in a multi-threads program

Platform:
Python 3.9.7 on Linux 5.10 aarch64

Description:
At times, invoking the 'subscribe' function may result in a hang, and based on the stack analysis, it appears that a deadlock might be occurring.

...
Thread 0x0000ffffaca0f150 (most recent call first):
  File "/root/miniconda3/lib/python3.9/site-packages/redis/connection.py", line 1497 in release # try lock self._lock again
  File "/root/miniconda3/lib/python3.9/site-packages/redis/client.py", line 1426 in reset
  File "/root/miniconda3/lib/python3.9/site-packages/redis/client.py", line 1418 in __del__
  File "/root/miniconda3/lib/python3.9/site-packages/redis/connection.py", line 949 in __init__
  File "/root/miniconda3/lib/python3.9/site-packages/redis/connection.py", line 1492 in make_connection
  File "/root/miniconda3/lib/python3.9/site-packages/redis/connection.py", line 1452 in get_connection # with self._lock
  File "/root/miniconda3/lib/python3.9/site-packages/redis/client.py", line 1469 in execute_command
  File "/root/miniconda3/lib/python3.9/site-packages/redis/client.py", line 1634 in subscribe
...

Client is created with:

host="localhost",
port=self._port,
password="*****",
single_connection_client=True,  # might be helpful 
socket_timeout=30,
socket_connect_timeout=10,
ssl=self._is_tls,
ssl_ca_certs=f"xxxxxx/tls/ca.crt"

My program:

...
pubsub = c.pubsub()
pubsub.subscribe(channel) # hang
...
@jaroslawporada
Copy link

@andymccurdy
I believe this commit caused it: 32a5840

I have same deadlock problem with redis as celery broker

', '  File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 2114, in execute
    conn = self.connection_pool.get_connection("MULTI", self.shard_hint)
', '  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 1456, in get_connection
    connection = self.make_connection()
', '  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 1496, in make_connection
    return self.connection_class(**self.connection_kwargs)
', '  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 960, in __init__
    super().__init__(**kwargs)
', '  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 647, in __init__
    self.retry = Retry(NoBackoff(), 1)
', '  File "/usr/local/lib/python3.11/site-packages/redis/backoff.py", line 42, in __init__
    super().__init__(0)
', '  File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 1418, in __del__
    self.reset()
', '  File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 1426, in reset
    self.connection_pool.release(self.connection)
', '  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 1501, in release
    with self._lock:

Can we bring back RLock?

@jaroslawporada
Copy link

I think that during getting new connection, garbage collector delete PubSub causing this situation.

@lan17
Copy link

lan17 commented Jan 11, 2025

Running into same issue when using Redis as backend for celery.

@jaroslawporada
Copy link

def patch_redis_connectionpool_lock():
    from hashlib import sha256
    from inspect import getsource
    import threading

    from redis.connection import ConnectionPool

    OLD_METHOD_SHA = (  # pylint: disable=invalid-name
        b"some hash"
    )

    def reset(self):
        self._lock = threading.RLock()  # pylint: disable=protected-access
        self._created_connections = 0  # pylint: disable=protected-access
        self._available_connections = []  # pylint: disable=protected-access
        self._in_use_connections = set()  # pylint: disable=protected-access

        # this must be the last operation in this method. while reset() is
        # called when holding _fork_lock, other threads in this process
        # can call _checkpid() which compares self.pid and os.getpid() without
        # holding any lock (for performance reasons). keeping this assignment
        # as the last operation ensures that those other threads will also
        # notice a pid difference and block waiting for the first thread to
        # release _fork_lock. when each of these threads eventually acquire
        # _fork_lock, they will notice that another thread already called
        # reset() and they will immediately release _fork_lock and continue on.
        self.pid = os.getpid()

    assert OLD_METHOD_SHA == sha256(getsource(ConnectionPool.reset).encode()).digest()
    ConnectionPool.reset = reset

For now I patched this method to previous working implementation till someone merge my MR.

@lan17
Copy link

lan17 commented Jan 11, 2025

BTW, I still ran into deadlocking issues even with RLock, so I am not sure if its the culprit.

My setup is using Redis as Celery backend and having lots of coroutines awaiting the AsyncResult.ready and AsyncResult.get.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants