
Investigate test-worker-fshandles-open-close-on-termination and test-worker-fshandles-error-on-termination crash #43499

Closed
F3n67u opened this issue Jun 20, 2022 · 4 comments · Fixed by #43533
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@F3n67u
Member

F3n67u commented Jun 20, 2022

Test

test-worker-fshandles-open-close-on-termination
test-worker-fshandles-error-on-termination

Platform

test-worker-fshandles-open-close-on-termination on macOS
test-worker-fshandles-error-on-termination on Windows

Console output

test-worker-fshandles-open-close-on-termination:

out/Release/node[53255]: ../src/node_file-inl.h:162:virtual node::fs::FSReqPromise<node::AliasedBufferBase<double, v8::Float64Array, void>>::~FSReqPromise() [AliasedBufferT = node::AliasedBufferBase<double, v8::Float64Array, void>]: Assertion `finished_' failed.
 1: 0x10767eeb5 node::Abort() [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
 2: 0x10767ecd1 node::Assert(node::AssertionInfo const&) [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
 3: 0x10767a806 node::fs::FSReqPromise<node::AliasedBufferBase<double, v8::Float64Array, void> >::~FSReqPromise() [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
 4: 0x107683cc9 node::fs::FSReqAfterScope::~FSReqAfterScope() [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
 5: 0x1076841fd node::fs::AfterOpenFileHandle(uv_fs_s*) [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
 6: 0x1081162f8 uv__work_done [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
 7: 0x10811a6eb uv__async_io [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
 8: 0x10812dd6e uv__io_poll [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
 9: 0x10811abe1 uv_run [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
10: 0x1075aee23 node::SpinEventLoop(node::Environment*) [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
11: 0x1077268bf node::worker::Worker::Run() [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
12: 0x10772a302 node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::$_3::__invoke(void*) [/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node]
13: 0x7fff203e58fc _pthread_start [/usr/lib/system/libsystem_pthread.dylib]
14: 0x7fff203e1443 thread_start [/usr/lib/system/libsystem_pthread.dylib]

test-worker-fshandles-error-on-termination:

Administrator: npm install[8184]: C:\workspace\node-compile-windows\node\src\node_file-inl.h:162: Assertion `finished_' failed.
 1: 00007FF610F69CCF node_api_throw_syntax_error+176223
 2: 00007FF610EF7B66 v8::internal::wasm::WasmCode::safepoint_table_offset+67398
 3: 00007FF610EF7F32 v8::internal::wasm::WasmCode::safepoint_table_offset+68370
 4: 00007FF610EE0A6D v8::base::CPU::has_fpu+39565
 5: 00007FF610EE2029 v8::base::CPU::has_fpu+45129
 6: 00007FF610FCBB27 uv_timer_stop+1207
 7: 00007FF610FC80CB uv_async_send+331
 8: 00007FF610FC785C uv_loop_init+1292
 9: 00007FF610FC79FA uv_run+202
10: 00007FF610F968A5 node::SpinEventLoop+309
11: 00007FF610E2CE40 v8::internal::wasm::SignatureMap::Freeze+35904
12: 00007FF610E28568 v8::internal::wasm::SignatureMap::Freeze+17256
13: 00007FF610FB80CD uv_poll_stop+557
14: 00007FF611F71600 v8::internal::compiler::ToString+145936
15: 00007FFB74404ED0 BaseThreadInitThunk+16
16: 00007FFB75FCE39B RtlUserThreadStart+43

Build links

test-worker-fshandles-open-close-on-termination:

https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx11-x64/45661/testReport/junit/(root)/test/parallel_test_worker_fshandles_open_close_on_termination/

test-worker-fshandles-error-on-termination:

https://ci.nodejs.org/job/node-test-binary-windows-js-suites/15222/RUN_SUBSET=3,nodes=win10-COMPILED_BY-vs2019/testReport/junit/(root)/test/parallel_test_worker_fshandles_error_on_termination/

Additional information

Related pr: #42910

This PR landed on 2022-07-18; it modified src/node_file-inl.h and added test-worker-fshandles-open-close-on-termination and test-worker-fshandles-error-on-termination.

cc @santigimeno @aduh95 @RaisinTen

@F3n67u F3n67u added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Jun 20, 2022
@F3n67u F3n67u changed the title Investigate test-worker-fshandles-open-close-on-termination crash Investigate test-worker-fshandles-open-close-on-termination and test-worker-fshandles-error-on-termination crash Jun 20, 2022
santigimeno added a commit to santigimeno/node that referenced this issue Jun 22, 2022
We are deciding whether to end `fs` promises by checking
`can_call_into_js()`, whereas in the `FSReqPromise` destructor we're
using the `is_stopping()` check. Though this may look semantically
correct, it has issues: both values are modified before termination in
`Environment::ExitEnv()`, and both are atomic, but they are not
synchronized with each other. As a result, when the destructor runs,
`call_into_js` may already be set to `false` while `is_stopping` is
still `false`, causing the crash. Fix this by using the same check
everywhere.

Fixes: nodejs#43499
santigimeno added a commit to santigimeno/node that referenced this issue Jun 22, 2022
santigimeno added a commit to santigimeno/node that referenced this issue Jun 22, 2022
We are deciding whether to end `fs` promises by checking
`can_call_into_js()`, whereas in the `FSReqPromise` destructor we're
using the `is_stopping()` check. Though this may look semantically
correct, it has issues: both values are modified before termination in
`Environment::ExitEnv()`, and both are atomic, but they are not
synchronized with each other. As a result, when the destructor runs,
`call_into_js` may already be set to `false` while `is_stopping` is
still `false`, causing the crash. Fix this by also checking
`can_call_into_js()` in the destructor.

Fixes: nodejs#43499
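
As a rough, self-contained C++ sketch of the window described in this commit message (not Node.js source; the flag and field names `can_call_into_js`, `is_stopping`, and `finished` are only borrowed from the message above, and the bad interleaving is timing-dependent):

```cpp
#include <atomic>
#include <iostream>
#include <thread>

std::atomic<bool> can_call_into_js{true};
std::atomic<bool> is_stopping{false};

int main() {
  bool finished = false;

  // "Termination" thread: an ExitEnv()-style shutdown flips both flags, but
  // the two stores are separate operations, so another thread can observe
  // the state in between them.
  std::thread terminator([] {
    can_call_into_js = false;
    is_stopping = true;
  });

  // "Event loop" thread (here: main): the fs completion callback bails out
  // early, leaving `finished` false, once it may no longer call into JS.
  if (can_call_into_js) {
    finished = true;  // promise resolved normally
  }

  // Destructor-style check: testing is_stopping here can still read `false`
  // even though can_call_into_js is already `false`; that window is what
  // trips the `finished_' assertion. Re-using can_call_into_js on this side,
  // as the fix does, removes the inconsistency.
  if (!finished && !is_stopping) {
    std::cout << "would abort: `finished_' assertion failed" << std::endl;
  } else {
    std::cout << "ok (the bad interleaving is timing-dependent)" << std::endl;
  }

  terminator.join();
  return 0;
}
```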
@F3n67u F3n67u reopened this Jul 10, 2022
@RaisinTen
Contributor

There seems to be something wrong with the stack trace in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/15541/RUN_SUBSET=3,nodes=win10-COMPILED_BY-vs2019/testReport/junit/(root)/test/parallel_test_worker_fshandles_open_close_on_termination:

01:38:28 not ok 775 parallel/test-worker-fshandles-open-close-on-termination
01:38:28   ---
01:38:28   duration_ms: 3.528
01:38:28   severity: fail
01:38:28   exitcode: 134
01:38:28   stack: |-
01:38:28     C:\Windows\SYSTEM32\cmd.exe[8708]: C:\workspace\node-compile-windows\node\src\node_file-inl.h:162: Assertion `finished_' failed.
01:38:28      1: 00007FF65494964F node_api_throw_syntax_error+176223
01:38:28      2: 00007FF6548D74E6 SSL_get_quiet_shutdown+67398
01:38:28      3: 00007FF6548D78B2 SSL_get_quiet_shutdown+68370
01:38:28      4: 00007FF6548C03ED v8::base::CPU::has_fpu+38813
01:38:28      5: 00007FF6548C19A9 v8::base::CPU::has_fpu+44377
01:38:28      6: 00007FF6549AB4A7 uv_timer_stop+1207
01:38:28      7: 00007FF6549A7A4B uv_async_send+331
01:38:28      8: 00007FF6549A71DC uv_loop_init+1292
01:38:28      9: 00007FF6549A737A uv_run+202
01:38:28     10: 00007FF654976225 node::SpinEventLoop+309
01:38:28     11: 00007FF65480CCC0 v8::internal::interpreter::BytecodeLabel::bind+35904
01:38:28     12: 00007FF6548083E8 v8::internal::interpreter::BytecodeLabel::bind+17256
01:38:28     13: 00007FF654997A4D uv_poll_stop+557
01:38:28     14: 00007FF655951D70 v8::internal::compiler::ToString+145936
01:38:28     15: 00007FFB74404ED0 BaseThreadInitThunk+16
01:38:28     16: 00007FFB75FCE39B RtlUserThreadStart+43
01:38:28   ...

Not sure how node_api_throw_syntax_error got in. The same thing happened with the stack trace in the Windows crash reported in #43084, which is about a very similar test (a crash during Worker termination).

targos pushed a commit that referenced this issue Jul 12, 2022
Fixes: #43499

PR-URL: #43533
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
targos pushed a commit that referenced this issue Jul 20, 2022
targos pushed a commit that referenced this issue Jul 31, 2022
kvakil added a commit to kvakil/node that referenced this issue Aug 2, 2022
These tests seem to time out quite often. I don't know why, but one
possible reason is that they start a lot of threads. Tests in
`test/parallel` are assumed to start only one thread each, so having 11
threads running at a time feels like a lot.

It also seems that these tests fail in a correlated fashion: take a look
at [this reliability report][]. The failures all occur on the same build
machines on the same PRs, which suggests some sort of CPU contention.

[this reliability report]: nodejs/reliability#334

On my Linux machine, decreasing the parallelism and iteration count here
reduces the `user` time from ~11.5 seconds to ~2 seconds, depending on
the test. I have seen these tests take 30-60 seconds on CI (Alpine in
particular).

I went back to the diffs that introduced these changes and verified that
they still failed at least 90% of the time with the reduced iteration
count, which feels sufficient.

Refs: nodejs#43499
Refs: nodejs#43084
nodejs-github-bot pushed a commit that referenced this issue Aug 5, 2022
Refs: #43499
Refs: #43084
PR-URL: #44090
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
danielleadams pushed a commit that referenced this issue Aug 16, 2022
ruyadorno pushed a commit that referenced this issue Aug 23, 2022
targos pushed a commit that referenced this issue Sep 5, 2022
@aduh95 aduh95 reopened this Sep 11, 2022
Fyko pushed a commit to Fyko/node that referenced this issue Sep 15, 2022
guangwong pushed a commit to noslate-project/node that referenced this issue Oct 10, 2022
juanarbol pushed a commit that referenced this issue Oct 10, 2022
juanarbol pushed a commit that referenced this issue Oct 11, 2022
@ywave620
Contributor

ywave620 commented Oct 14, 2022

@aduh95 I realized that you reopened this issue; did you find any recent failures? :)

guangwong pushed a commit to noslate-project/node that referenced this issue Jan 3, 2023
guangwong pushed a commit to noslate-project/node that referenced this issue Jan 3, 2023
@lpinca
Member

lpinca commented Jun 29, 2023

Ping @aduh95

@lpinca lpinca closed this as completed Sep 17, 2024