Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Investigate issue 1502: Node tests systematically fail in ubuntu-24.04-arm (Node 22, gcc, g++, -Db_sanitize=address, Debug) #1503

Closed
wants to merge 19 commits into from

Conversation

ibc
Copy link
Member

@ibc ibc commented Mar 7, 2025

Investigating issue #1502

  • First step: Remove all CI files but the failing one and only run test-Worker.ts, commits 92d0f7d and b70d81b.

@nazar-pc
Copy link
Collaborator

nazar-pc commented Mar 7, 2025

This can be tested without creating a PR with temporary commits

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

I've found a probably related issue in gcc:

numpy/numpy#25556

It clearly happens when using gcc with ASAN (AKA -Db_sanitize=address) in Ubuntu 22.04-arm and 24.04-arm. Note that in mediasoup-node.yaml I did not add ubuntu-22.04-arm with ASAN, that's why only ubuntu-24.04-arm was failing. I'm adding it now to confirm that it also fails.

@nazar-pc
Copy link
Collaborator

nazar-pc commented Mar 7, 2025

Interesting, maybe we can run it at least without ASAN or try to pick a newer version of GCC in case it is fixed there (if available)

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

Interesting, maybe we can run it at least without ASAN or try to pick a newer version of GCC in case it is fixed there (if available)

Honestly I think we should use default gcc included in Ubuntu versions, otherwise we are testing something that people is not using. Also note that the bug (if it's this bug) was fixed in 2024-03-27, and Ubuntu 22.04 was released in April 2024-04-25 so most probably it doesn't come with a gcc version that includes the fix.

You mean that we should have an additional entry in mediasoup-node.yaml in which we install a newer version of gcc?

@nazar-pc
Copy link
Collaborator

nazar-pc commented Mar 7, 2025

I meant just for ASAN if it causes issues. I agree we should use default version otherwise.

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

BTW @nazar-pc, maybe you know. How can we skip ubuntu-24.04-arm only in Debug mode? Note that only Debug mode fails. We have this (simplified):

strategy:
      matrix:
        build:
          - os: ubuntu-24.04-arm
            node: 22
            cc: gcc
            cxx: g++
            meson_args: '-Db_sanitize=address'
        build-type:
          - Release
          - Debug

    runs-on: ${{ matrix.build.os }}

    env:
      CC: ${{ matrix.build.cc }}
      CXX: ${{ matrix.build.cxx }}
      MEDIASOUP_SKIP_WORKER_PREBUILT_DOWNLOAD: 'true'
      MEDIASOUP_LOCAL_DEV: 'true'
      MEDIASOUP_BUILDTYPE: ${{ matrix.build-type }}
      MESON_ARGS: ${{ matrix.build.meson_args }}

How can we tell it to NOT run npm run test:node only for ubuntu-24.04-arm when in Debug build-type?

@nazar-pc
Copy link
Collaborator

nazar-pc commented Mar 7, 2025

You can exclude certain configs if that is what you meant: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/running-variations-of-jobs-in-a-workflow#excluding-matrix-configurations

You can also do if: matrix.build.os != "ubuntu-24.04-arm" and things like that on specific steps.

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

This can be tested without creating a PR with temporary commits

I mean, I need to only test test-mediasoup.ts and test-Worker.ts, I need to push commits XD

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

Wow, according to this table, Ubuntu 24.04 uses GCC 14.2.0: https://distrowatch.com/table.php?distribution=ubuntu

And according to GCC releases page, latest version is... 14.2.0: https://gcc.gnu.org/releases.html

So... ¯\_(ツ)_/¯

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

It clearly happens when using gcc with ASAN (AKA -Db_sanitize=address) in Ubuntu 22.04-arm and 24.04-arm. Note that in mediasoup-node.yaml I did not add ubuntu-22.04-arm with ASAN, that's why only ubuntu-24.04-arm was failing. I'm adding it now to confirm that it also fails.

Cool, ubuntu-22.04-arm with ASAN in Debug mode does not fail :)

https://github.com/versatica/mediasoup/actions/runs/13727895621/job/38398441037?pr=1503

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

I'm totally lost.

I'm running mediasoup in Docker locally using rollinroy/ubuntu-24.04-hpc-arm:latest image (I hope that's the one or similar to what ubuntu-24.04-arm host uses in CI). It says that gcc version is...

gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

which is THE SAME used in ubuntu-24.04-arm in CI as shown here:

image

However... tests pass in my Docker.

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

Oh no... those envs in our Dockerfile...

# Make CC and CXX point to clang/clang++ installed above.
ENV LANG="C.UTF-8"
ENV CC="clang"
ENV CXX="clang++"

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

Here we are! It also fails if I use default gcc 13.3.0 in rollinroy/ubuntu-24.04-hpc-arm:latest image in Docker locally.

MEDIASOUP_BUILDTYPE=Debug MESON_ARGS="-Db_sanitize=address" ASAN_OPTIONS="detect_leaks=0" npm ci --foreground-scripts

MEDIASOUP_BUILDTYPE=Debug MESON_ARGS="-Db_sanitize=address" ASAN_OPTIONS="detect_leaks=0" npm run test:node

Result:

 FAIL  node/src/test/test-Worker.ts (20.426 s)
  ✓ Worker.workerBin matches mediasoup-worker absolute path (2 ms)
  ✓ createWorker() FOO succeeds (851 ms)
  ✕ createWorker() succeeds (2006 ms)
  ✓ createWorker() with wrong settings rejects with TypeError (951 ms)
  ✓ worker.updateSettings() succeeds (1021 ms)
  ✓ worker.updateSettings() with wrong settings rejects with TypeError (1043 ms)
  ✓ worker.updateSettings() rejects with InvalidStateError if closed (1030 ms)
  ✓ worker.dump() succeeds (795 ms)
  ✓ worker.dump() rejects with InvalidStateError if closed (1016 ms)
  ✓ worker.getResourceUsage() succeeds (1008 ms)
  ✓ worker.close() succeeds (1015 ms)
  ✓ Worker emits "died" if worker process died unexpectedly (2831 ms)
  ✕ worker process ignores PIPE, HUP, ALRM, USR1 and USR2 signals (3004 ms)

  ● createWorker() succeeds

    thrown: "Exceeded timeout of 2000 ms for a test.
    Add a timeout value to this test to increase the timeout, if this is a long-running test. See https://jestjs.io/docs/api#testname-fn-timeout."

      41 | }, 20000);
      42 |
    > 43 | test('createWorker() succeeds', async () => {
         | ^
      44 | 	const onObserverNewWorker = jest.fn();
      45 |
      46 | 	mediasoup.observer.once('newworker', onObserverNewWorker);

      at Object.<anonymous> (node/src/test/test-Worker.ts:43:1)

  ● worker process ignores PIPE, HUP, ALRM, USR1 and USR2 signals

    thrown: "Exceeded timeout of 3000 ms for a test.
    Add a timeout value to this test to increase the timeout, if this is a long-running test. See https://jestjs.io/docs/api#testname-fn-timeout."

      314 | // so we just skip this test in Windows.
      315 | if (os.platform() !== 'win32') {
    > 316 | 	test('worker process ignores PIPE, HUP, ALRM, USR1 and USR2 signals', async () => {
          | 	^
      317 | 		const worker = await mediasoup.createWorker({ logLevel: 'warn' });
      318 |
      319 | 		await new Promise<void>((resolve, reject) => {

      at Object.<anonymous> (node/src/test/test-Worker.ts:316:2)

 PASS  node/src/test/test-mediasoup.ts
  ✓ mediasoup.version matches version field in package.json (1 ms)
  ✓ setLoggerEventListeners() works (1042 ms)
  ✓ mediasoup.getSupportedRtpCapabilities() returns the mediasoup RTP capabilities (1 ms)
  ✓ parseScalabilityMode() works (1 ms)

Test Suites: 1 failed, 1 passed, 2 total
Tests:       2 failed, 15 passed, 17 total

This is the modified worker/Dockerfile:

FROM rollinroy/ubuntu-24.04-hpc-arm:latest

# Install dependencies.
RUN set -x \
	&& apt-get update \
	&& apt-get install --yes \
		clang pkg-config bash-completion wget curl \
		screen python3-pip python3-yaml pkg-config zlib1g-dev \
		libgss-dev libssl-dev libxml2-dev gdb

# Install node 20.
RUN set -x \
	&& apt-get update \
	&& apt-get install --yes ca-certificates curl gnupg \
	&& mkdir -p /etc/apt/keyrings \
	&& curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
	| gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg \
	&& NODE_MAJOR=20 \
	&& echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" \
	> /etc/apt/sources.list.d/nodesource.list \
	&& apt-get update \
	&& apt-get install nodejs --yes

# Enable core dumps.
RUN set -x \
	&& echo "mkdir -p /tmp/cores && chmod 777 /tmp/cores && echo \"/tmp/cores/core.%e.sig%s.%p\" > /proc/sys/kernel/core_pattern && ulimit -c unlimited" >> ~/.bashrc

# Make CC and CXX point to clang/clang++ installed above.
ENV LANG="C.UTF-8"
ENV CC="gcc"
ENV CXX="g++"

ENV MEDIASOUP_LOCAL_DEV="true"
ENV KEEP_BUILD_ARTIFACTS="1"

WORKDIR /mediasoup

CMD ["bash"]

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

I'm testing with gcc 14 and... it also fails with it.

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

Interesting results:

  ✓ Worker.workerBin matches mediasoup-worker absolute path (2 ms)
  ✓ createWorker() FOO succeeds (854 ms)
  ✕ createWorker() succeeds (2006 ms)
  ✓ createWorker() with wrong settings rejects with TypeError (967 ms)
  ✓ worker.updateSettings() succeeds (990 ms)
  ✓ worker.updateSettings() with wrong settings rejects with TypeError (1024 ms)
  ✓ worker.updateSettings() rejects with InvalidStateError if closed (1010 ms)
  ✓ worker.dump() succeeds (817 ms)
  ✓ worker.dump() rejects with InvalidStateError if closed (999 ms)
  ✓ worker.getResourceUsage() succeeds (975 ms)
  ✓ worker.close() succeeds (1013 ms)
  ✓ Worker emits "died" if worker process died unexpectedly (2808 ms)
  ✕ worker process ignores PIPE, HUP, ALRM, USR1 and USR2 signals (3002 ms)

  ✓ mediasoup.version matches version field in package.json (1 ms)
  ✓ setLoggerEventListeners() works (1021 ms)
  ✓ mediasoup.getSupportedRtpCapabilities() returns the mediasoup RTP capabilities (2 ms)
  ✓ parseScalabilityMode() works (1 ms)

Note ✕ createWorker() succeeds (2006 ms) which passes if I increment the timeout from 2000 to something higher.

So only this test fails no matter which timeout I set:

test('worker process ignores PIPE, HUP, ALRM, USR1 and USR2 signals', async () => {
		const worker = await mediasoup.createWorker({ logLevel: 'warn' });

		await new Promise<void>((resolve, reject) => {
			worker.on('died', reject);

			process.kill(worker.pid, 'SIGPIPE');
			process.kill(worker.pid, 'SIGHUP');
			process.kill(worker.pid, 'SIGALRM');
			process.kill(worker.pid, 'SIGUSR1');
			process.kill(worker.pid, 'SIGUSR2');

			setTimeout(() => {
				expect(worker.closed).toBe(false);

				worker.close();
				worker.on('subprocessclose', resolve);
			}, 2000);
		});

@ibc
Copy link
Member Author

ibc commented Mar 7, 2025

Ok, I think it's just a problem with createWorker() that, for whatever reason, when using ubuntu-24.04-arm with GCC 13 or 14 in Debug mode, it takes much longer than usual. All Node tests run createWorker() in the beforeEach() hook, that's why those tests don't fail. However test-Worker.ts and test-mediasoup.ts run createWorker() within test() blocks, and those are the timeouts that fire. Increasing them seem to mitigate the issue.

@nazar-pc
Copy link
Collaborator

nazar-pc commented Mar 7, 2025

There must be a bug somewhere (not necessarily our code, though it is possible). It should not take multiple seconds, there is just no sane reason for that to be necessary.

@ibc
Copy link
Member Author

ibc commented Mar 8, 2025

Given that it only takes so long in Linux GCC ARM in Debug mode... who knows. I don have energy to investigate that honestly. It could be whatever in GCC for ARM.

@ibc
Copy link
Member Author

ibc commented Mar 8, 2025

This PR did its job. Now fixing #1502 in PR #1504.

@ibc ibc closed this Mar 8, 2025
@ibc ibc deleted the investigate-issue-1502 branch March 8, 2025 09:52
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants