Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Avoid deadlock in SyncedCollection #530

Merged
merged 1 commit into from
Mar 14, 2021
Merged

Conversation

joaander
Copy link
Member

Description

Add locks needed to avoid deadlocks in SyncedCollection

Motivation and Context

This MWE deadlocks because SycedCollections._locks[lock_id] gets overwritten while it is being used:

import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from signac.synced_collections.backends.collection_json import JSONAttrDict
def update_status(data):
    s, filename = data
    d = JSONAttrDict(filename=filename)
    d.update(dict(a=1, b=s-1))
    return False
if __name__ == "__main__":
    N = 1
    n_tasks = 20
    with tempfile.TemporaryDirectory() as d:
        for s in range(N):
            with open(f'{d}/{s}.json', 'w') as f:
                json.dump(dict(a=s, b=0), f)
        tasks = []
        for s in range(N):
            for t in range(n_tasks):
                tasks.append((s, f'{d}/{s}.json'))
        with ThreadPoolExecutor() as e:
            e.map(update_status, tasks)
        print(JSONAttrDict._locks)

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.

@joaander joaander requested review from a team as code owners March 12, 2021 19:32
@joaander joaander requested review from bdice and cbkerr and removed request for a team March 12, 2021 19:32
@codecov
Copy link

codecov bot commented Mar 12, 2021

Codecov Report

Merging #530 (21f11de) into master (786d75f) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #530   +/-   ##
=======================================
  Coverage   78.16%   78.17%           
=======================================
  Files          65       65           
  Lines        7036     7039    +3     
  Branches     1310     1311    +1     
=======================================
+ Hits         5500     5503    +3     
  Misses       1231     1231           
  Partials      305      305           
Impacted Files Coverage Δ
...synced_collections/data_types/synced_collection.py 92.14% <100.00%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 786d75f...21f11de. Read the comment docs.

@bdice bdice requested a review from vyasr March 12, 2021 19:34
Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks again for your help, @joaander. If you can think of a test that would fail (but not deadlock), that would be a good addition. I don't know how I would construct such a case unless there is a way to force a timeout.

@joaander
Copy link
Member Author

Looks good to me. Thanks again for your help, @joaander. If you can think of a test that would fail (but not deadlock), that would be a good addition. I don't know how I would construct such a case unless there is a way to force a timeout.

I tried to get failing behavior in my first attempt on the way to this MWE but wan't successful. I had tried to trigger conflicting writes to the file.

If inspection of private variables is allowed in the test, a test that would fail before this change would construct two JSONAttrDicts with the same file and assert that the identify of the lock JSONAttrDict._locks[filename] stays the same.

Copy link
Member

@cbkerr cbkerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With help from @bdice, I was able to reproduce the problem and fix even though I didn't experience a deadlock by printing threading.get_ident() and "done" inside the update_status function.

@vyasr vyasr merged commit 5709232 into master Mar 14, 2021
@vyasr vyasr deleted the fix-syncedcollection-deadlock branch March 14, 2021 15:22
@vyasr
Copy link
Contributor

vyasr commented Mar 14, 2021

In the interest of getting things merged I'll look into constructing a test later.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Incoherent job statepoint access in subprocess
5 participants