
investigate flaky test-util-callbackify #25068

Closed
Trott opened this issue Dec 15, 2018 · 8 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@Trott
Member

Trott commented Dec 15, 2018

Like a number of flaky tests right now, this one looks like it is caused by #25007.

https://ci.nodejs.org/job/node-test-commit-aix/19751/nodes=aix61-ppc64/console

test-osuosl-aix61-ppc64_be-1

10:18:13 not ok 1990 parallel/test-util-callbackify
10:18:13   ---
10:18:13   duration_ms: 3.731
10:18:13   severity: fail
10:18:13   exitcode: 1
10:18:13   stack: |-
10:18:13     assert.js:86
10:18:13       throw new AssertionError(obj);
10:18:13       ^
10:18:13     
10:18:13     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
10:18:13     
10:18:13     null !== 1
10:18:13     
10:18:13         at common.mustCall (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-util-callbackify.js:205:14)
10:18:13         at /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common/index.js:335:15
10:18:13         at ChildProcess.exithandler (child_process.js:301:5)
10:18:13         at ChildProcess.emit (events.js:189:13)
10:18:13         at maybeClose (internal/child_process.js:978:16)
10:18:13         at Process.ChildProcess._handle.onexit (internal/child_process.js:265:5)
10:18:13   ...
Trott added the flaky-test label on Dec 15, 2018
@Trott
Member Author

Trott commented Dec 30, 2018

https://ci.nodejs.org/job/node-test-commit-aix/20112/nodes=aix61-ppc64/console

test-osuosl-aix61-ppc64_be-2

00:54:27 not ok 2000 parallel/test-util-callbackify
00:54:27   ---
00:54:27   duration_ms: 7.620
00:54:27   severity: fail
00:54:27   exitcode: 1
00:54:27   stack: |-
00:54:27     assert.js:86
00:54:27       throw new AssertionError(obj);
00:54:27       ^
00:54:27     
00:54:27     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
00:54:27     
00:54:27     null !== 1
00:54:27     
00:54:27         at common.mustCall (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-util-callbackify.js:205:14)
00:54:27         at /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common/index.js:377:15
00:54:27         at ChildProcess.exithandler (child_process.js:304:5)
00:54:27         at ChildProcess.emit (events.js:188:13)
00:54:27         at maybeClose (internal/child_process.js:978:16)
00:54:27         at Process.ChildProcess._handle.onexit (internal/child_process.js:265:5)
00:54:27   ...

@Trott
Member Author

Trott commented Dec 30, 2018

Probably caused by #25007.

Trott added a commit to Trott/io.js that referenced this issue Dec 30, 2018
Trott added a commit to Trott/io.js that referenced this issue Jan 2, 2019
Refs: nodejs#25068

PR-URL: nodejs#25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
addaleax pushed a commit that referenced this issue Jan 5, 2019
Refs: #25068

PR-URL: #25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@gireeshpunathil
Member

Unable to reproduce in 1000 runs. Looks like this has vanished?

@Trott
Member Author

Trott commented Jan 9, 2019

Unable to reproduce in 1000 runs. Looks like this has vanished?

@gireeshpunathil Here it is failing on FreeBSD in CI yesterday:

https://ci.nodejs.org/job/node-test-commit-freebsd/nodes=freebsd11-x64/23246/console

test-digitalocean-freebsd11-x64-2

22:55:45 not ok 1996 parallel/test-util-callbackify
22:55:45   ---
22:55:45   duration_ms: 0.357
22:55:45   severity: fail
22:55:45   exitcode: 1
22:55:45   stack: |-
22:55:45     assert.js:86
22:55:45       throw new AssertionError(obj);
22:55:45       ^
22:55:45     
22:55:45     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
22:55:45     
22:55:45     null !== 1
22:55:45     
22:55:45         at common.mustCall (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-util-callbackify.js:205:14)
22:55:45         at /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/common/index.js:376:15
22:55:45         at ChildProcess.exithandler (child_process.js:302:5)
22:55:45         at ChildProcess.emit (events.js:188:13)
22:55:45         at maybeClose (internal/child_process.js:978:16)
22:55:45         at Process.ChildProcess._handle.onexit (internal/child_process.js:265:5)
22:55:45   ...

To reproduce it, you might have more luck with something like tools/test.py -j 16 --repeat 100 test/parallel/test-util-callbackify.js. (But you also might not...)

@gireeshpunathil
Member

Thanks @Trott for the info; I will try my luck on FreeBSD today.

@gireeshpunathil
Member

gireeshpunathil commented Jan 10, 2019

Able to reproduce on FreeBSD:

(gdb) thr 1
[Switching to thread 1 (process 101816)]#0  0x000000080323d277 in pthread_mutex_destroy ()
   from /lib/libthr.so.3
(gdb) where
#0  0x000000080323d277 in pthread_mutex_destroy () from /lib/libthr.so.3
#1  0x000000080323c7e6 in pthread_mutex_lock () from /lib/libthr.so.3
#2  0x000000000145b86d in v8::platform::tracing::TracingController::GetCategoryGroupEnabledInternal ()
#3  0x0000000000e60316 in v8::internal::CancelableLambdaTask<v8::internal::ArrayBufferCollector::FreeAllocations()::$_0>::RunInternal ()
#4  0x0000000000980fb6 in node::(anonymous namespace)::PlatformWorkerThread ()
#5  0x0000000803235c06 in pthread_create () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
(gdb) thr 2
[Switching to thread 2 (process 100871)]
(gdb) where
#0  0x00000000008fdeb5 in std::__1::__tree< ...> ::destroy ()
#1  0x00000000008fdebd in std::__1::__tree< ...> ::destroy ()
#2  0x00000000008fdec9 in std::__1::__tree< ...> ::destroy ()
#3  0x00000000008fdebd in std::__1::__tree< ...> ::destroy ()
#4  0x00000000008fdec9 in std::__1::__tree< ...> ::destroy ()
#5  0x00000000008fdec9 in std::__1::__tree< ...> ::destroy ()
#6  0x00000000008fdec9 in std::__1::__tree< ...> ::destroy ()
#8  0x00000000008fc5f3 in node::native_module::NativeModuleLoader::~NativeModuleLoader ()
#9  0x0000000803591911 in __cxa_finalize () from /lib/libc.so.7
#10 0x000000080351f001 in exit () from /lib/libc.so.7
#11 0x000000000091e5a7 in node::FatalException ()
#12 0x0000000000f886d2 in v8::internal::MessageHandler::ReportMessageNoExceptions ()
#13 0x0000000000f88578 in v8::internal::MessageHandler::ReportMessage ()
#14 0x0000000000f57d1f in v8::internal::Isolate::ReportPendingMessagesImpl ()
#15 0x0000000000e41e70 in v8::internal::(anonymous namespace)::Invoke ()
#16 0x0000000000e41b38 in v8::internal::Execution::Call ()
#17 0x0000000000abea61 in v8::Function::Call ()
#18 0x00000000008f9e3d in node::LoadEnvironment ()
#19 0x00000000008feb13 in node::Start ()
#20 0x00000000008fdd8a in node::Start ()
#21 0x00000000008fbfdb in node::Start ()
#22 0x00000000008ae095 in _start ()
#23 0x00000008022f0000 in ?? ()
#24 0x0000000000000000 in ?? ()

This is essentially the same as #24921, tracked through #25007, with two visible differences:

  • thread 2 is on its exit path due to a fatal exception in JS land, which is expected, as the child ('test/fixtures/uncaught-exceptions/callbackify1.js') deliberately throws an exception that leads to its termination.
  • while thread 2 is thus cleaning things up on one side, thread 1 is progressing with tracing. In #24921 the TracingController object itself was stale due to the Agent destruction; here it got one or two steps further, but crashed when it attempted to lock a mutex that had already been destroyed (by thread 2). A minimal sketch of this pattern follows below.

I will produce and analyze a couple more core files to see if there are other patterns.
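
To make the pattern concrete, here is a minimal standalone C++ sketch of this kind of exit-time race (illustrative only: race_sketch.cc, Controller, and the timings are made up, not Node's actual TracingController or tracing agent code). A process-global object owns a mutex that a detached worker keeps locking; exit() runs the object's static destructor via __cxa_finalize while the worker is still alive, so the worker's next lock lands on an already-destroyed mutex.

// race_sketch.cc -- illustrative sketch only; not Node.js source.
// Build: g++ -std=c++11 -pthread race_sketch.cc -o race_sketch
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <thread>

// Stand-in for a process-global controller whose destructor runs from
// __cxa_finalize once exit() is called (like the tracing machinery above).
struct Controller {
  std::mutex mu;  // destroyed together with the object
  bool Enabled() {
    std::lock_guard<std::mutex> lock(mu);  // worker threads take this lock
    return true;
  }
  ~Controller() { std::puts("static destructor ran: controller and mutex gone"); }
};

static Controller controller;  // exit-time destructor registered for this

int main() {
  // Detached "platform worker" that keeps consulting the controller,
  // playing the role of thread 1 in the gdb output.
  std::thread([] {
    for (;;) {
      controller.Enabled();
      std::this_thread::sleep_for(std::chrono::microseconds(10));
    }
  }).detach();

  // The main thread hits a fatal error and calls exit(): static destructors
  // run while the worker is still alive, so the worker's next lock is taken
  // on an already-destroyed mutex -- undefined behaviour that may crash
  // (as on FreeBSD/AIX above) or silently appear to work.
  std::this_thread::sleep_for(std::chrono::milliseconds(5));
  std::exit(1);
}

Whether this actually faults depends on the platform and C library, which is consistent with the test being flaky rather than failing on every run; any ordering that stops the worker threads before the static destructors run avoids the race.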

@gireeshpunathil
Member

Got a bunch of those. While the victim thread always shows the same stack in the tracing code, the offender (exiting) thread is seen at these different stages, with __cxa_finalize as a common point:

pattern 2:

#0  0x0000000000974ba4 in node::options_parser::OptionsParser<node::PerProcessOptions>::~OptionsParser ()
#1  0x0000000803591911 in __cxa_finalize () from /lib/libc.so.7
#2  0x000000080351f001 in exit () from /lib/libc.so.7
#3  0x000000000091e5a7 in node::FatalException ()
#4  0x0000000000f886d2 in v8::internal::MessageHandler::ReportMessageNoExceptions ()
#5  0x0000000000f88578 in v8::internal::MessageHandler::ReportMessage ()
#6  0x0000000000f57d1f in v8::internal::Isolate::ReportPendingMessagesImpl ()
#7  0x0000000000e41e70 in v8::internal::(anonymous namespace)::Invoke ()
#8  0x0000000000e41b38 in v8::internal::Execution::Call ()
#9  0x0000000000abea61 in v8::Function::Call ()
#10 0x00000000008f9e3d in node::LoadEnvironment ()
#11 0x00000000008feb13 in node::Start ()
#12 0x00000000008fdd8a in node::Start ()
#13 0x00000000008fbfdb in node::Start ()
#14 0x00000000008ae095 in _start ()
#15 0x00000008022f0000 in ?? ()
#16 0x0000000000000000 in ?? ()

pattern 3:

#0  0x0000000803517544 in free () from /lib/libc.so.7
#1  0x0000000000974cf2 in node::options_parser::OptionsParser<node::PerProcessOptions>::~OptionsParser ()
#2  0x0000000803591911 in __cxa_finalize () from /lib/libc.so.7
#3  0x000000080351f001 in exit () from /lib/libc.so.7
#4  0x000000000091e5a7 in node::FatalException ()
#5  0x0000000000f886d2 in v8::internal::MessageHandler::ReportMessageNoExceptions ()
#6  0x0000000000f88578 in v8::internal::MessageHandler::ReportMessage ()
#7  0x0000000000f57d1f in v8::internal::Isolate::ReportPendingMessagesImpl ()
#8  0x0000000000e41e70 in v8::internal::(anonymous namespace)::Invoke ()
#9  0x0000000000e41b38 in v8::internal::Execution::Call ()
#10 0x0000000000abea61 in v8::Function::Call ()
#11 0x00000000008f9e3d in node::LoadEnvironment ()
#12 0x00000000008feb13 in node::Start ()
#13 0x00000000008fdd8a in node::Start ()
#14 0x00000000008fbfdb in node::Start ()
#15 0x00000000008ae095 in _start ()
#16 0x00000008022f0000 in ?? ()
#17 0x0000000000000000 in ?? ()

pattern 4:

#0  0x000000080350c197 in _pthread_mutex_init_calloc_cb () from /lib/libc.so.7
#1  0x00000008034f1cd7 in _malloc_thread_cleanup () from /lib/libc.so.7
#2  0x0000000803517224 in realloc () from /lib/libc.so.7
#3  0x00000008035175b1 in free () from /lib/libc.so.7
#4  0x0000000000974cd1 in node::options_parser::OptionsParser<node::PerProcessOptions>::~OptionsParser ()
#5  0x0000000803591911 in __cxa_finalize () from /lib/libc.so.7
#6  0x000000080351f001 in exit () from /lib/libc.so.7
#7  0x000000000091e5a7 in node::FatalException ()
#8  0x0000000000f886d2 in v8::internal::MessageHandler::ReportMessageNoExceptions ()
#9  0x0000000000f88578 in v8::internal::MessageHandler::ReportMessage ()
#10 0x0000000000f57d1f in v8::internal::Isolate::ReportPendingMessagesImpl ()
#11 0x0000000000e41e70 in v8::internal::(anonymous namespace)::Invoke ()
#12 0x0000000000e41b38 in v8::internal::Execution::Call ()
#13 0x0000000000abea61 in v8::Function::Call ()
#14 0x00000000008f9e3d in node::LoadEnvironment ()
#15 0x00000000008feb13 in node::Start ()
#16 0x00000000008fdd8a in node::Start ()
#17 0x00000000008fbfdb in node::Start ()
#18 0x00000000008ae095 in _start ()
#19 0x00000008022f0000 in ?? ()
#20 0x0000000000000000 in ?? ()

Just like #24921, this should be fixed by #25061.

refack pushed a commit to refack/node that referenced this issue Jan 14, 2019
Refs: nodejs#25068

PR-URL: nodejs#25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
BridgeAR pushed a commit to BridgeAR/node that referenced this issue Jan 16, 2019
Refs: nodejs#25068

PR-URL: nodejs#25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@gireeshpunathil
Member

The test has been active for the last week with no failures, so, as expected, #25061 has fixed the underlying issue. Closing.

BethGriggs pushed a commit that referenced this issue Apr 28, 2019
Refs: #25068

PR-URL: #25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
BethGriggs pushed a commit that referenced this issue May 10, 2019
Refs: #25068

PR-URL: #25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this issue May 16, 2019
Refs: #25068

PR-URL: #25284
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>