test_concurrent_futures.test_deadlock: test_crash_big_data() hangs randomly on Windows #107219

Closed
vstinner opened this issue Jul 25, 2023 · 24 comments
Labels
OS-windows tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

vstinner commented Jul 25, 2023

GHA Windows x86 job, test_crash_big_data() hangs on ProcessPoolExecutor.shutdown(): https://github.com/python/cpython/actions/runs/5651960914/job/15310873235?pr=107217

  • Main thread: ProcessPoolExecutor.shutdown()
  • Thread 2: Thread.join()
  • Thread 3: queue _feed() => connection send_bytes()
(...)
0:39:52 load avg: 0.06 running: test_concurrent_futures (19 min 2 sec)
0:40:22 load avg: 0.05 running: test_concurrent_futures (19 min 32 sec)
0:40:51 load avg: 0.03 [447/447/2] test_concurrent_futures crashed (Exit code 1)
Timeout (0:20:00)!
Thread 0x000007e8 (most recent call first):
  File "D:\a\cpython\cpython\Lib\multiprocessing\connection.py", line 282 in _send_bytes
  File "D:\a\cpython\cpython\Lib\multiprocessing\connection.py", line 199 in send_bytes
  File "D:\a\cpython\cpython\Lib\multiprocessing\queues.py", line 246 in _feed
  (...)
  File "D:\a\cpython\cpython\Lib\threading.py", line 1009 in _bootstrap

Thread 0x00001738 (most recent call first):
  File "D:\a\cpython\cpython\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\cpython\cpython\Lib\threading.py", line 1126 in join
  (...)
  File "D:\a\cpython\cpython\Lib\threading.py", line 1009 in _bootstrap

Thread 0x0000103c (most recent call first):
  File "D:\a\cpython\cpython\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\cpython\cpython\Lib\threading.py", line 1126 in join
  File "D:\a\cpython\cpython\Lib\concurrent\futures\process.py", line 836 in shutdown
  File "D:\a\cpython\cpython\Lib\concurrent\futures\_base.py", line 647 in __exit__
  File "D:\a\cpython\cpython\Lib\test\test_concurrent_futures.py", line 1386 in test_crash_big_data
  (...)
  File "D:\a\cpython\cpython\Lib\test\support\__init__.py", line 1241 in run_unittest
  File "D:\a\cpython\cpython\Lib\test\libregrtest\runtest.py", line 294 in _test_module
  (...)
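
For readers unfamiliar with the state of the thread stuck in `_send_bytes`, here is a minimal sketch of how a `send_bytes()` call can block when nothing drains the other end of the pipe. It is not the test's code: the payload size and timeouts are assumptions chosen to be well above typical OS pipe buffers, and it only shows the blocking half of the problem, not the hang in `shutdown()` itself.

```python
import multiprocessing
import threading

# Assumption: 16 MiB is far larger than the OS pipe buffer on common platforms.
PAYLOAD = b"x" * (16 * 1024 * 1024)

# One-way pipe: `reader` can only receive, `writer` can only send.
reader, writer = multiprocessing.Pipe(duplex=False)

# Send from a background thread, like the queue feeder thread does.
sender = threading.Thread(target=writer.send_bytes, args=(PAYLOAD,), daemon=True)
sender.start()

# Nobody reads yet, so the sender should still be blocked after the timeout.
sender.join(timeout=1.0)
blocked_before_read = sender.is_alive()

# Draining the pipe lets the blocked send complete.
received = reader.recv_bytes()
sender.join(timeout=10.0)
finished_after_read = not sender.is_alive()

reader.close()
writer.close()
```

If the reading side has died (as with a crashed worker), nothing plays the role of `reader.recv_bytes()` here, so any thread joining the sender waits indefinitely.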

@vstinner vstinner added the type-bug An unexpected behavior, bug, or error label Jul 25, 2023
@vstinner
Member Author

By the way, test.regrtest doesn't work with test_concurrent_futures: when test_concurrent_futures is re-run in verbose mode, no tests are run!

0:40:52 Re-running test_concurrent_futures in verbose mode

----------------------------------------------------------------------
Ran 0 tests in 0.001s

NO TESTS RAN

This second bug may hide the first bug (test_concurrent_futures hangs sometimes).

@vstinner
Member Author

Windows x64 job, also blocked on test_crash_big_data(): https://github.com/python/cpython/actions/runs/5652189009/job/15311447694

Similar thread state.

0:22:04 load avg: 5.60 [447/447/2] test_concurrent_futures crashed (Exit code 1)
Timeout (0:20:00)!
Thread 0x00001be0 (most recent call first):
  File "D:\a\cpython\cpython\Lib\multiprocessing\connection.py", line 282 in _send_bytes
  File "D:\a\cpython\cpython\Lib\multiprocessing\connection.py", line 199 in send_bytes
  File "D:\a\cpython\cpython\Lib\multiprocessing\queues.py", line 246 in _feed
  (...)
  File "D:\a\cpython\cpython\Lib\threading.py", line 1009 in _bootstrap

Thread 0x000002a0 (most recent call first):
  File "D:\a\cpython\cpython\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\cpython\cpython\Lib\threading.py", line 1126 in join
  (...)
  File "D:\a\cpython\cpython\Lib\threading.py", line 1009 in _bootstrap

Thread 0x00000c18 (most recent call first):
  File "D:\a\cpython\cpython\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\cpython\cpython\Lib\threading.py", line 1126 in join
  File "D:\a\cpython\cpython\Lib\concurrent\futures\process.py", line 836 in shutdown
  File "D:\a\cpython\cpython\Lib\concurrent\futures\_base.py", line 647 in __exit__
  File "D:\a\cpython\cpython\Lib\test\test_concurrent_futures.py", line 1386 in test_crash_big_data
  (...)

@vstinner
Member Author

vstinner commented Jul 25, 2023

Azure Pipelines: Windows PR Tests win32 hangs on test_interpreter_shutdown(): https://dev.azure.com/Python/cpython/_build/results?buildId=133201&view=logs&j=d554cd63-f8f4-5b2d-871b-33e4ea76e915&t=5a14d0eb-dbd4-5b80-f5d0-7909f950a1cc

  • Main thread: run_python_until_end() => subprocess.Popen.communicate()
  • Thread 2 (stdout?): subprocess _readerthread()
  • Thread 3 (stderr?): subprocess _readerthread()

In short, the main thread is waiting until the process completes and the process is killed after 20 minutes.

On the same CI, the win64 job ran test_concurrent_futures in 1 min 33 sec.

(...)
test_hang_gh94440 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest.test_hang_gh94440)
shutdown(wait=True) doesn't hang when a future was submitted and ... skipped 'Tested platform does not support the alarm signal'
test_hang_issue12364 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest.test_hang_issue12364) ... ok

Timeout (0:20:00)!
Thread 0x00001184 (most recent call first):
  File "D:\a\1\s\Lib\subprocess.py", line 1597 in _readerthread
  File "D:\a\1\s\Lib\threading.py", line 989 in run
  File "D:\a\1\s\Lib\threading.py", line 1052 in _bootstrap_inner
  File "D:\a\1\s\Lib\threading.py", line 1009 in _bootstrap

Thread 0x00001224 (most recent call first):
  File "D:\a\1\s\Lib\subprocess.py", line 1597 in _readerthread
  File "D:\a\1\s\Lib\threading.py", line 989 in run
  File "D:\a\1\s\Lib\threading.py", line 1052 in _bootstrap_inner
  File "D:\a\1\s\Lib\threading.py", line 1009 in _bootstrap

Thread 0x00001560 (most recent call first):
  File "D:\a\1\s\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\1\s\Lib\threading.py", line 1126 in join
  File "D:\a\1\s\Lib\subprocess.py", line 1626 in _communicate
  File "D:\a\1\s\Lib\subprocess.py", line 1209 in communicate
  File "D:\a\1\s\Lib\test\support\script_helper.py", line 139 in run_python_until_end
  File "D:\a\1\s\Lib\test\support\script_helper.py", line 149 in _assert_python
  File "D:\a\1\s\Lib\test\support\script_helper.py", line 166 in assert_python_ok
  File "D:\a\1\s\Lib\test\test_concurrent_futures.py", line 302 in test_interpreter_shutdown
  (...)
  File "<frozen runpy>", line 198 in _run_module_as_main
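
As an illustration of the "main thread waits until the process completes" situation described above, here is a minimal sketch of bounding `Popen.communicate()` with a timeout and killing the child when it expires. `run_with_deadline` is a hypothetical helper written for this example, not something from `test.support.script_helper`, and the one-second deadline is an arbitrary assumption.

```python
import subprocess
import sys


def run_with_deadline(args, timeout):
    """Hypothetical helper: run args, kill the child if it exceeds timeout.

    Returns (returncode, stdout, stderr, timed_out).
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        # The child did not finish in time: kill it, then collect its output
        # so the reader threads/pipes are cleaned up.
        proc.kill()
        out, err = proc.communicate()
        timed_out = True
    return proc.returncode, out, err, timed_out


# A child that stands in for a hung interpreter (sleeps far longer than the deadline).
child = [sys.executable, "-c", "import time; time.sleep(30)"]
returncode, out, err, timed_out = run_with_deadline(child, timeout=1.0)
```

This only limits how long the parent waits; it says nothing about why the child hangs at interpreter shutdown.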

@AlexWaygood
Member

I reported a similar issue in May 2022, but closed it as it seemed the issue stopped occurring in CI:

@AlexWaygood AlexWaygood added tests Tests in the Lib/test dir OS-windows labels Jul 26, 2023
@Eclips4
Member

Eclips4 commented Aug 24, 2023

./python -m test -v test_concurrent_futures -m test_crash_big_data --forever gives me this:

many lines..
0:00:05 [  9] test_concurrent_futures
test_crash_big_data (test.test_concurrent_futures.ProcessPoolForkExecutorDeadlockTest.test_crash_big_data) ... skipped 'require unix system'
test_crash_big_data (test.test_concurrent_futures.ProcessPoolForkserverExecutorDeadlockTest.test_crash_big_data) ... skipped 'require unix system'
test_crash_big_data (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data) ... Warning -- Uncaught thread exception: InvalidStateError
Exception in thread Thread-9:
Traceback (most recent call last):
  File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\threading.py", line 1059, in _bootstrap_inner
    self.run()
  File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\process.py", line 344, in run
    self.terminate_broken(cause)
  File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\process.py", line 492, in terminate_broken
    work_item.future.set_exception(bpe)
  File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\_base.py", line 559, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x2cf561cf260 state=cancelled>
0.55s Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 11880)>
Warning -- Dangling thread: <Thread(QueueFeederThread, started daemon 4112)>
ok
Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 11880)>
Warning -- Dangling thread: <Thread(QueueFeederThread, started daemon 4112)>

Sadly, it's hard to reproduce.

@vstinner
Member Author

Sadly, it's hard to reproduce.

You can stress the system to make the issue more likely. For example, open a second terminal and run:

python -m test -j2

You can use -j4 or more depending on the number of CPUs and how much you want your machine to be stressed :-)

@Eclips4
Member

Eclips4 commented Aug 25, 2023

Sadly, it's hard to reproduce.

You can stress the system to make the issue more likely. For example, open a second terminal and run:

python -m test -j2

You can use -j4 or more depending on the number of CPUs and how much you want your machine to be stressed :-)

Oh, that's right! With -j8 (in a separate terminal) I can reproduce the bug more easily.

@lazka
Contributor

lazka commented Aug 25, 2023

(I'm also seeing this hang in the mingw fork after updating from 3.11.4 to 3.11.5)

@vstinner
Member Author

Error on Linux, not sure if it's related.

aarch64 Fedora Stable LTO + PGO 3.x buildbot: https://buildbot.python.org/all/#/builders/524/builds/4310

Log (reformatted for readability):

FAIL: test_interpreter_shutdown (test.test_concurrent_futures.test_shutdown.ProcessPoolForkserverProcessPoolShutdownTest.test_interpreter_shutdown)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/test/test_concurrent_futures/test_shutdown.py", line 49, in test_interpreter_shutdown
    self.assertFalse(err)

AssertionError: b'Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/threading.py", line 1059, in _bootstrap_inner
    self.run()
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/concurrent/futures/process.py", line 339, in run
    self.add_call_item_to_queue()
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/concurrent/futures/process.py", line 394, in add_call_item_to_queue
    self.call_queue.put(_CallItem(work_id,
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/queues.py", line 94, in put
    self._start_thread()
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/queues.py", line 177, in _start_thread
    self._thread.start()
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/threading.py", line 978, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can\'t create new thread at interpreter shutdown
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/synchronize.py", line 115, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory
' is not false

@vstinner
Member Author

Similar error on another Linux machine.

ARM Raspbian 3.x: https://buildbot.python.org/all/#/builders/424/builds/4736

Logs (reformatted):

FAIL: test_interpreter_shutdown (test.test_concurrent_futures.test_shutdown.ProcessPoolForkserverProcessPoolShutdownTest.test_interpreter_shutdown)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_concurrent_futures/test_shutdown.py", line 49, in test_interpreter_shutdown
    self.assertFalse(err)
AssertionError: b'Exception in thread Thread-1:
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/threading.py", line 1059, in _bootstrap_inner
    self.run()
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/process.py", line 339, in run
    self.add_call_item_to_queue()
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/process.py", line 394, in add_call_item_to_queue
    self.call_queue.put(_CallItem(work_id,
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/queues.py", line 94, in put
    self._start_thread()
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/queues.py", line 177, in _start_thread
    self._thread.start()
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/threading.py", line 978, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can\'t create new thread at interpreter shutdown
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/synchronize.py", line 115, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory
' is not false

lazka added a commit to msys2-contrib/cpython-mingw that referenced this issue Aug 26, 2023
See python#107219
Once that is fixed this commit can be removed.
This is a commit and not just an addition to the skip list, since
we still run the skipped tests in CI and in this case everything would hang.
@vstinner
Member Author

See also issue #105829.

@cjw296
Contributor

cjw296 commented Aug 30, 2023

@vstinner - test_crash_big_data and test_interpreter_shutdown look like they might be separate problems.
Might be worth splitting into separate issues?

Also, #105829 appears to be a different issue entirely, where many wakeups being sent results in a deadlock.

@vstinner
Member Author

Would you mind creating a separate issue?

@cjw296
Contributor

cjw296 commented Aug 30, 2023

@vstinner - you created this issue, so it probably makes more sense for you to do so?

vstinner added a commit to vstinner/cpython that referenced this issue Sep 6, 2023
Fix a race condition in _ExecutorManagerThread.terminate_broken():
ignore the InvalidStateError on future.set_exception(). It can happen
if the future is cancelled before the caller.

Moreover, test_crash_big_data() now waits explicitly until the
executor completes.
@vstinner
Member Author

vstinner commented Sep 6, 2023

To reproduce the test_crash_big_data() hang, I use this command on Windows:

python -m test test_concurrent_futures.test_deadlock -v -m test.test_concurrent_futures.test_deadlock.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data --timeout=30

I wrote PR #108974 to fix one of the bugs, InvalidStateError in terminate_broken().
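
To illustrate the InvalidStateError race outside of the executor, here is a minimal sketch, assuming a plain `concurrent.futures.Future` that gets cancelled before something tries to store an exception on it. `set_exception_if_possible` is a hypothetical helper for this example; it is not the code from the PR, only the same "ignore InvalidStateError" idea.

```python
from concurrent.futures import Future, InvalidStateError


def set_exception_if_possible(future, exc):
    """Hypothetical helper: store exc on future unless it is already done.

    Returns True if the exception was stored, False if the future was
    already cancelled or finished.
    """
    try:
        future.set_exception(exc)
    except InvalidStateError:
        # The future was cancelled (or completed) first: nothing to report.
        return False
    return True


err = RuntimeError("worker process died")

# Case 1: the future is cancelled before the exception is set.
cancelled = Future()
cancelled.cancel()
stored_on_cancelled = set_exception_if_possible(cancelled, err)

# Case 2: the future is still pending, so the exception is stored normally.
pending = Future()
stored_on_pending = set_exception_if_possible(pending, err)
```

Without the `try`/`except`, the first case raises `InvalidStateError`, which matches the "Uncaught thread exception" warning in the log posted earlier in this thread.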

@vstinner
Member Author

vstinner commented Sep 6, 2023

To reproduce the test_crash_big_data() hang, I use this command on Windows:
python -m test test_concurrent_futures.test_deadlock -v -m test.test_concurrent_futures.test_deadlock.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data --timeout=30

By the way, if I interrupt this command with CTRL+C, sometimes... it hangs as well!

0:04:06 [381] test_concurrent_futures.test_deadlock
test_crash_big_data (test.test_concurrent_futures.test_deadlock.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data) ... 

^C

Traceback (most recent call last):
  File "<string>", line 1, in <module>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\victor\python\main\Lib\multiprocessing\__init__.py", line 16, in <module>
  File "C:\victor\python\main\Lib\multiprocessing\__init__.py", line 16, in <module>
    from . import context
    from . import context
  File "C:\victor\python\main\Lib\multiprocessing\context.py", line 6, in <module>
  File "C:\victor\python\main\Lib\multiprocessing\context.py", line 6, in <module>
    from . import reduction
    from . import reduction
  File "C:\victor\python\main\Lib\multiprocessing\reduction.py", line 16, in <module>
  File "C:\victor\python\main\Lib\multiprocessing\reduction.py", line 15, in <module>
    import pickle
    import socket
  File "C:\victor\python\main\Lib\pickle.py", line 34, in <module>
  File "C:\victor\python\main\Lib\socket.py", line 52, in <module>
    import re
    import _socket
KeyboardInterrupt
  File "C:\victor\python\main\Lib\re\__init__.py", line 125, in <module>
    from . import _compiler, _parser
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1000, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1133, in get_code

Timeout (0:00:30)!
Thread 0x000005bc (most recent call first):
  File "C:\victor\python\main\Lib\multiprocessing\connection.py", line 282 in _send_bytes
  File "C:\victor\python\main\Lib\multiprocessing\connection.py", line 199 in send_bytes
  File "C:\victor\python\main\Lib\multiprocessing\queues.py", line 246 in _feed
  File "C:\victor\python\main\Lib\threading.py", line 996 in run
  File "C:\victor\python\main\Lib\threading.py", line 1059 in _bootstrap_inner
  File "C:\victor\python\main\Lib\threading.py", line 1016 in _bootstrap

Thread 0x000012a4 (most recent call first):
  File "C:\victor\python\main\Lib\threading.py", line 1153 in _wait_for_tstate_lock
  File "C:\victor\python\main\Lib\threading.py", line 1133 in join
  File "C:\victor\python\main\Lib\multiprocessing\queues.py", line 199 in _finalize_join
  File "C:\victor\python\main\Lib\multiprocessing\util.py", line 224 in __call__
  File "C:\victor\python\main\Lib\multiprocessing\queues.py", line 151 in join_thread
  File "C:\victor\python\main\Lib\concurrent\futures\process.py", line 560 in join_executor_internals
  File "C:\victor\python\main\Lib\concurrent\futures\process.py", line 514 in terminate_broken
  File "C:\victor\python\main\Lib\concurrent\futures\process.py", line 344 in run
  File "C:\victor\python\main\Lib\threading.py", line 1059 in _bootstrap_inner
  File "C:\victor\python\main\Lib\threading.py", line 1016 in _bootstrap

Thread 0x00001368 (most recent call first):
  File "C:\victor\python\main\Lib\threading.py", line 1153 in _wait_for_tstate_lock
  File "C:\victor\python\main\Lib\threading.py", line 1133 in join
  File "C:\victor\python\main\Lib\concurrent\futures\process.py", line 843 in shutdown
  File "C:\victor\python\main\Lib\concurrent\futures\_base.py", line 647 in __exit__
  File "C:\victor\python\main\Lib\test\test_concurrent_futures\test_deadlock.py", line 236 in test_crash_big_data
  (...)

vstinner added a commit that referenced this issue Sep 6, 2023
Fix a race condition in _ExecutorManagerThread.terminate_broken():
ignore the InvalidStateError on future.set_exception(). It can happen
if the future is cancelled before the caller.

Moreover, test_crash_big_data() now waits explicitly until the
executor completes.
@vstinner
Member Author

vstinner commented Sep 7, 2023

I analyzed the test_interpreter_shutdown() bug and created issue #109047 with my findings. Please continue the discussion on test_interpreter_shutdown() in issue #109047.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Multiprocessing issues Sep 13, 2023
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 23, 2023
…utures

Follow-up of pythongh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
serhiy-storchaka added a commit that referenced this issue Sep 26, 2023
…GH-109780)

Follow-up of gh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 26, 2023
…utures (pythonGH-109780)

Follow-up of pythongh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
(cherry picked from commit 0b4e090)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Sep 26, 2023
…futures (GH-109780) (GH-109882)

Follow-up of gh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
(cherry picked from commit 0b4e090)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Yhg1s pushed a commit that referenced this issue Oct 2, 2023
… (#109254)

gh-107219: Fix concurrent.futures terminate_broken() (GH-109244)

Fix a race condition in concurrent.futures. When a process in the
process pool was terminated abruptly (while the future was running or
pending), close the connection write end. If the call queue is
blocked on sending bytes to a worker process, closing the connection
write end interrupts the send, so the queue can be closed.

Changes:

* _ExecutorManagerThread.terminate_broken() now closes
  call_queue._writer.
* multiprocessing PipeConnection.close() now interrupts
  WaitForMultipleObjects() in _send_bytes() by cancelling the
  overlapped operation.
(cherry picked from commit a9b1f84)

Co-authored-by: Victor Stinner <vstinner@python.org>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Nov 10, 2023
…rrent_futures (pythonGH-109780)

Follow-up of pythongh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
(cherry picked from commit 0b4e090)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Nov 10, 2023
…futures (GH-109780) (GH-111934)

Follow-up of gh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
(cherry picked from commit 0b4e090)
encukou added a commit to encukou/cpython that referenced this issue Jan 23, 2024
…ot concurrent.futures

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
encukou added a commit that referenced this issue Jan 24, 2024
…current.futures (GH-114489)

This was left out of the 3.12 backport for three related issues:
- gh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- gh-109370 (which changes this to be only called on Windows)
- gh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
5 participants