-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Investigate test-worker-fshandles-open-close-on-termination and test-worker-fshandles-error-on-termination crash #43499
Comments
We are deciding whether to end `fs` promises by checking `can_call_into_js()` whereas in the `FSReqPromise` destructor we're using the `is_stopping()` check. Though this may look as semantically correct it has issues because though both values are modified before termination on `Environment::ExitEnv()` and both are atomic they are not syncronized together so it may happen that when reaching the destructor `call_into_js` may be set to `false` whereas `is_stopping` remains `false` causing the crash. Fix this by using the same checks everywhere. Fixes: nodejs#43499
We are deciding whether to end `fs` promises by checking `can_call_into_js()` whereas in the `FSReqPromise` destructor we're using the `is_stopping()` check. Though this may look as semantically correct it has issues because though both values are modified before termination on `Environment::ExitEnv()` and both are atomic they are not syncronized together so it may happen that when reaching the destructor `call_into_js` may be set to `false` whereas `is_stopping` remains `false` causing the crash. Fix this by using the same checks everywhere. Fixes: nodejs#43499
We are deciding whether to end `fs` promises by checking `can_call_into_js()` whereas in the `FSReqPromise` destructor we're using the `is_stopping()` check. Though this may look as semantically correct it has issues because though both values are modified before termination on `Environment::ExitEnv()` and both are atomic they are not syncronized together so it may happen that when reaching the destructor `call_into_js` may be set to `false` whereas `is_stopping` remains `false` causing the crash. Fix this by checking with `can_call_into_js()` also in the destructor. Fixes: nodejs#43499
Seems #43533 has not fixed this flaky test, they are still failing: |
There seems to be something wrong with the stack trace in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/15541/RUN_SUBSET=3,nodes=win10-COMPILED_BY-vs2019/testReport/junit/(root)/test/parallel_test_worker_fshandles_open_close_on_termination: 01:38:28 not ok 775 parallel/test-worker-fshandles-open-close-on-termination
01:38:28 ---
01:38:28 duration_ms: 3.528
01:38:28 severity: fail
01:38:28 exitcode: 134
01:38:28 stack: |-
01:38:28 C:\Windows\SYSTEM32\cmd.exe[8708]: C:\workspace\node-compile-windows\node\src\node_file-inl.h:162: Assertion `finished_' failed.
01:38:28 1: 00007FF65494964F node_api_throw_syntax_error+176223
01:38:28 2: 00007FF6548D74E6 SSL_get_quiet_shutdown+67398
01:38:28 3: 00007FF6548D78B2 SSL_get_quiet_shutdown+68370
01:38:28 4: 00007FF6548C03ED v8::base::CPU::has_fpu+38813
01:38:28 5: 00007FF6548C19A9 v8::base::CPU::has_fpu+44377
01:38:28 6: 00007FF6549AB4A7 uv_timer_stop+1207
01:38:28 7: 00007FF6549A7A4B uv_async_send+331
01:38:28 8: 00007FF6549A71DC uv_loop_init+1292
01:38:28 9: 00007FF6549A737A uv_run+202
01:38:28 10: 00007FF654976225 node::SpinEventLoop+309
01:38:28 11: 00007FF65480CCC0 v8::internal::interpreter::BytecodeLabel::bind+35904
01:38:28 12: 00007FF6548083E8 v8::internal::interpreter::BytecodeLabel::bind+17256
01:38:28 13: 00007FF654997A4D uv_poll_stop+557
01:38:28 14: 00007FF655951D70 v8::internal::compiler::ToString+145936
01:38:28 15: 00007FFB74404ED0 BaseThreadInitThunk+16
01:38:28 16: 00007FFB75FCE39B RtlUserThreadStart+43
01:38:28 ... Not sure how |
We are deciding whether to end `fs` promises by checking `can_call_into_js()` whereas in the `FSReqPromise` destructor we're using the `is_stopping()` check. Though this may look as semantically correct it has issues because though both values are modified before termination on `Environment::ExitEnv()` and both are atomic they are not syncronized together so it may happen that when reaching the destructor `call_into_js` may be set to `false` whereas `is_stopping` remains `false` causing the crash. Fix this by checking with `can_call_into_js()` also in the destructor. Fixes: #43499 PR-URL: #43533 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Darshan Sen <raisinten@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
We are deciding whether to end `fs` promises by checking `can_call_into_js()` whereas in the `FSReqPromise` destructor we're using the `is_stopping()` check. Though this may look as semantically correct it has issues because though both values are modified before termination on `Environment::ExitEnv()` and both are atomic they are not syncronized together so it may happen that when reaching the destructor `call_into_js` may be set to `false` whereas `is_stopping` remains `false` causing the crash. Fix this by checking with `can_call_into_js()` also in the destructor. Fixes: #43499 PR-URL: #43533 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Darshan Sen <raisinten@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
We are deciding whether to end `fs` promises by checking `can_call_into_js()` whereas in the `FSReqPromise` destructor we're using the `is_stopping()` check. Though this may look as semantically correct it has issues because though both values are modified before termination on `Environment::ExitEnv()` and both are atomic they are not syncronized together so it may happen that when reaching the destructor `call_into_js` may be set to `false` whereas `is_stopping` remains `false` causing the crash. Fix this by checking with `can_call_into_js()` also in the destructor. Fixes: #43499 PR-URL: #43533 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Darshan Sen <raisinten@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: nodejs#43499 Refs: nodejs#43084
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: #43499 Refs: #43084 PR-URL: #44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: #43499 Refs: #43084 PR-URL: #44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: #43499 Refs: #43084 PR-URL: #44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: #43499 Refs: #43084 PR-URL: #44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: nodejs#43499 Refs: nodejs#43084 PR-URL: nodejs#44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
We are deciding whether to end `fs` promises by checking `can_call_into_js()` whereas in the `FSReqPromise` destructor we're using the `is_stopping()` check. Though this may look as semantically correct it has issues because though both values are modified before termination on `Environment::ExitEnv()` and both are atomic they are not syncronized together so it may happen that when reaching the destructor `call_into_js` may be set to `false` whereas `is_stopping` remains `false` causing the crash. Fix this by checking with `can_call_into_js()` also in the destructor. Fixes: nodejs/node#43499 PR-URL: nodejs/node#43533 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Darshan Sen <raisinten@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: #43499 Refs: #43084 PR-URL: #44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: #43499 Refs: #43084 PR-URL: #44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@aduh95 Realized that you reopen this issue, did you find any recent failure? :) |
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: nodejs/node#43499 Refs: nodejs/node#43084 PR-URL: nodejs/node#44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
These tests seem to timeout quite often. I don't know why, but one possible reason is that they are starting a lot of threads. It seems that tests in `test/parallel` are assumed to only start one thread each, so having 11 threads running at a time feels like a lot. It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs. This suggests to me some sort of CPU contention. [this reliability report]: nodejs/reliability#334 On my Linux machine decreasing the parallelism & iterations here reduce the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular). I went back to the diffs that introduced that introduced these changes and verified that they failed at least 90% of the time with the reduced iteration count, which feels sufficient. Refs: nodejs/node#43499 Refs: nodejs/node#43084 PR-URL: nodejs/node#44090 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Ping @aduh95 |
Test
test-worker-fshandles-open-close-on-termination
test-worker-fshandles-error-on-termination
Platform
test-worker-fshandles-open-close-on-termination on MacOS
test-worker-fshandles-error-on-termination on Windows
Console output
test-worker-fshandles-open-close-on-termination:
test-worker-fshandles-error-on-termination:
Build links
test-worker-fshandles-open-close-on-termination:
https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx11-x64/45661/testReport/junit/(root)/test/parallel_test_worker_fshandles_open_close_on_termination/
test-worker-fshandles-error-on-termination:
https://ci.nodejs.org/job/node-test-binary-windows-js-suites/15222/RUN_SUBSET=3,nodes=win10-COMPILED_BY-vs2019/testReport/junit/(root)/test/parallel_test_worker_fshandles_error_on_termination/
Additional information
Related pr: #42910
This pr landed on 2022-07-18, modify
src/node_file-inl.h
file and addtest-worker-fshandles-open-close-on-termination
andtest-worker-fshandles-error-on-termination
.cc @santigimeno @aduh95 @RaisinTen
The text was updated successfully, but these errors were encountered: