Detect Rayon deadlocks #55

michaelsproul · 2023-11-26T23:59:10Z

Rayon has this footgun for lock usage where it can deadlock if both of the following conditions are true:

Lock L is held while calling into Rayon (e.g. calling rayon::join while holding a mutex).
Lock L is accessed from within Rayon tasks (e.g. calling mutex.lock() within par_iter).

Deadlocks can occur in multiple ways: if the first thread that is holding the lock calls into Rayon and steals some of the jobs of type (2) which require the lock, then those jobs will not complete because of the circular dependency (they are waiting for the thread to release the lock, but the thread is waiting for them to complete because it is waiting for the Rayon call to return). There's a similar variant if the thread holding the lock is itself a Rayon thread, in which case the deadlock occurs because the thread tries to re-lock the lock it already holds.

How could lockbud help? It would be prone to false positives, but it could detect any calls to Rayon functions that occur while holding a Mutex/RwLock. I think this is similar to detecting double-locking.

Unfortunately it seems that a deadlock-avoiding mutex for Rayon is not likely to be implemented any time soon. The naive approach of yielding when locking fails does not work because there is no "task" that can be yielded to which will unblock execution – we need the current task to finish to return control to the caller (see rayon-rs/rayon#592 (comment)). Unlike in async code, there is no way to interrupt a Rayon task while it is running, because it is just an ordinary function. This suggests it also wouldn't be feasible to port tokio::sync::Mutex which uses wakers and other futures-specific features to work around this problem in the async context (see https://github.com/tokio-rs/tokio/blob/2400769b54fd31186b07bfbec277e846e8d64c16/tokio/src/sync/batch_semaphore.rs#L2-L17).

The text was updated successfully, but these errors were encountered:

michaelsproul mentioned this issue Nov 27, 2023

Decide on a policy for Rayon usage sigp/lighthouse#4952

Open

HarukaMa mentioned this issue Feb 7, 2024

[Bug] Deadlock with rayon usage AleoNet/snarkOS#3063

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect Rayon deadlocks #55

Detect Rayon deadlocks #55

michaelsproul commented Nov 26, 2023

Detect Rayon deadlocks #55

Detect Rayon deadlocks #55

Comments

michaelsproul commented Nov 26, 2023