4.2.1 gives 'Illegal instruction' - SIGILL in node on some CPU types #23938

Open
f-roscher opened this issue Dec 13, 2021 · 14 comments

@f-roscher

f-roscher commented Dec 13, 2021

Dear all,

after the upgrade to 4.2.1 my Rocket.chat instance did not start again.

To debug this I set up parallel installations of versions 4.2.0 and 4.2.1 with (cd /tmp/bundle/programs/server; npm i).
4.2.0 starts fine, 4.2.1 stops after a few seconds with 'Illegal instruction':

# MONGO_URL="mongodb://127.0.0.1:27017/rocketchat?replicaSet=001-rs" MONGO_OPLOG_URL="mongodb://127.0.0.1:27017/local?replicaSet=001-rs" ROOT_URL=https://HOSTNAME_REPLACED_HERE  DEPLOY_PLATFORM=ansible PORT=3000  /usr/local/n/versions/node/12.22.1/bin/node main.js
Illegal instruction

System is up-to-date Debian 10. Deployed with Ansible using https://github.com/RocketChat/Rocket.Chat.Ansible with some variables added - this worked fine for the past months. I set these variables inside Ansible:

vars:
      rocket_chat_include_nginx: false
      rocket_chat_pgp_command: gpg
      rocket_chat_tarball_gpg_key: 9DA31620334BD75D9DCB49F368818C72E52529D4
      rocket_chat_tarball_gpg_keyserver: hkp://keyserver.ubuntu.com:80
      #rocket_chat_tarball_gpg_keyserver: hkp://p80.pool.sks-keyservers.net:80
      # hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4

      rocket_chat_service_host: chat01-back.int.somesite.de
      rocket_chat_automatic_upgrades: true
      rocket_chat_node_version:  12.22.1
      rocket_chat_npm_version: 6.14.1

and

---
rocket_chat_mongodb_org_version: 4.2
rocket_chat_mongodb_gpg_key: E162F504A20CDF15827F718D4B7C549A058F8B6B
rocket_chat_mongodb_service_name: mongod
rocket_chat_mongodb_org_pkgs: true
rocket_chat_mongodb_packages:
  - mongodb-org
  - mongodb-org-server

rocket_chat_dist_specific_packages:
  - g++

Started with gdb:

# MONGO_URL="mongodb://127.0.0.1:27017/rocketchat?replicaSet=001-rs" MONGO_OPLOG_URL="mongodb://127.0.0.1:27017/local?replicaSet=001-rs" ROOT_URL=https://HOSTNAME_REPLACED_HERE  DEPLOY_PLATFORM=ansible PORT=3000  gdb /usr/local/n/versions/node/12.22.1/bin/node 
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/n/versions/node/12.22.1/bin/node...done.
(gdb) run main.js
Starting program: /usr/local/n/versions/node/12.22.1/bin/node main.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ab7700 (LWP 10497)]
[New Thread 0x7ffff72b6700 (LWP 10498)]
[New Thread 0x7ffff6ab5700 (LWP 10499)]
[New Thread 0x7ffff62b4700 (LWP 10500)]
[New Thread 0x7ffff5ab3700 (LWP 10501)]
[New Thread 0x7ffff52b2700 (LWP 10502)]
[Detaching after fork from child process 10503]
[New Thread 0x7ffff509e700 (LWP 10504)]
[Thread 0x7ffff509e700 (LWP 10504) exited]
[New Thread 0x7ffff509e700 (LWP 10505)]
[New Thread 0x7fffdeefe700 (LWP 10506)]
[New Thread 0x7fffde6fd700 (LWP 10507)]
[New Thread 0x7fffddefc700 (LWP 10508)]
[Detaching after fork from child process 10511]
[Detaching after fork from child process 10515]

Thread 1 "node" received signal SIGILL, Illegal instruction.
0x00007fffd6038dfb in ?? () from /srv/4.2.1/bundle/programs/server/npm/node_modules/sharp/build/Release/../.././vendor/8.11.3/linux-x64/lib/libvips-cpp.so.42
(gdb) 

This is it for the moment. Please ask about anything else which might help tracking this down. Or about what I might try myself here.

Best regards
Florian

@EnableServices

We have the same issue - about 10 seconds after start up, Rocket Chat exits with 'Illegal instruction'

Were you able to fix it?

  • Jon

@f-roscher
Author

@EnableServices, I have not fixed it yet.
I tried some newer versions; the same bug occurs in all of them so far.

If someone is willing to dive into this, I would love to support the debugging and can provide additional data to reproduce this bug.

@f-roscher
Author

To rule out version dependencies as a cause, I tried the Docker images.
The behaviour is the same: I get the error with Docker tags 4.6.0 and latest, and no error with 4.2.0.
Error message from docker-compose logs: bash: line 1: 196 Illegal instruction (core dumped) node main.js

The number before 'Illegal instruction' varies.

I am eager to help with debugging - is anyone in with me, now that I can reproduce it with the official Docker images? ;)

@f-roscher
Author

@EnableServices Would you please check whether the host on which you get 'Illegal instruction' has the avx flag set on its CPUs? The output of cat /proc/cpuinfo | grep avx will show the AVX capability if it is there.
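The flag check can also be scripted. Below is a minimal Python sketch, assuming a Linux host where /proc/cpuinfo has a "flags" line; the function names `cpu_flags` and `has_avx` are made up for illustration. Unlike a plain grep, it matches the exact token avx, so avx2 or avx512 entries alone do not count.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines list the CPU features, one token per flag.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True if the exact flag 'avx' is listed."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("avx present:", has_avx(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux host?)")
```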

@f-roscher
Author

Current state of things: I have both installations sitting inside one LXC OS container - Rocket.Chat installed as a systemd service, and a Docker daemon running Rocket.Chat from registry.rocket.chat/rocketchat/rocket.chat:4.6.0.
I can start this LXC container on two different hardware nodes with slightly different CPUs.

Both installations of Rocket.Chat behave consistently:
on node 1 they crash with 'Illegal instruction', on node 2 they run fine.

Node 2 has the avx feature on its CPU, node 1 does not.

Next step: try to prove that this difference really is the reason.
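Since avx may be only one of several differences between the two nodes, it can help to list every CPU flag the working node has and the failing node lacks. A rough Python sketch, assuming the content of /proc/cpuinfo from each node has been saved to a file; the function names and the two file arguments are hypothetical.

```python
import sys


def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_flags(failing_text, working_text):
    """Flags listed on the working host but absent on the failing one."""
    return sorted(cpu_flags(working_text) - cpu_flags(failing_text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are placeholders):
    #   python flagdiff.py node1.cpuinfo node2.cpuinfo
    with open(sys.argv[1]) as failing, open(sys.argv[2]) as working:
        for flag in missing_flags(failing.read(), working.read()):
            print(flag)
```

Every flag printed is a candidate for the instruction set the crashing binary expects, not only avx.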

@Pinaute

Pinaute commented Apr 4, 2022

Hello,
@f-roscher
I had the same problem as you when I tried to update to version 4.2.1.
I checked and I don't have the avx flag.

@Pinaute

Pinaute commented Apr 4, 2022

AVX is required for MongoDB 5.0.
But I don't see why we get this message, because we are not changing the database version.

MongoDB 5.0 requires use of the AVX instruction set, available on select Intel and AMD processors.

https://www.mongodb.com/docs/manual/administration/production-notes/#footnote-microarch-intel

@f-roscher
Author

f-roscher commented Apr 4, 2022

Warning messages in the MongoDB logs and related Google hits put me on this track.
But the SIGILL and the 'Illegal instruction' message reported here are not related to the MongoDB version. It is the node process running Rocket.Chat that crashes, and it happens independently of the MongoDB version: I tried 4.2 as a systemd unit on the installation without Docker, and mongod Docker containers 4.0 and 4.2 on the Docker installation.

I assume some part of Rocket.Chat or one of its (node) dependencies has the same dependency on AVX that MongoDB 5.0 has. I cannot prove that yet. But I can show that Rocket.Chat runs fine on an AMD Opteron(TM) Processor 6238 and fails on an AMD Opteron(tm) Processor 6172.
One of the differences is the AVX feature.

EDIT, P.S.: Right now I am trying to either build a Docker image of Rocket.Chat 4.6.0 that runs on a CPU without the AVX feature, or get a full gdb backtrace pinpointing the exact location of the illegal instruction. Any help with npm install variations etc. is welcome.
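One way to capture that location non-interactively is to run gdb in batch mode and have it print the backtrace and the instruction at the program counter once the signal stops the process. A sketch in Python, assuming gdb is installed and the Rocket.Chat environment variables (MONGO_URL, ROOT_URL and so on) are already exported; `gdb_command` is a made-up helper name, and it only wraps the standard gdb options -batch, -ex and --args.

```python
import subprocess
import sys


def gdb_command(node_bin, script="main.js"):
    """Build a gdb batch invocation that reports where the process stops."""
    return [
        "gdb", "-batch",
        "-ex", "run",                  # start node; gdb stops on SIGILL
        "-ex", "bt",                   # backtrace of the stopped thread
        "-ex", "x/i $pc",              # disassemble the faulting instruction
        "-ex", "info sharedlibrary",   # map the address to a loaded library
        "--args", node_bin, script,
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gdbrun.py /path/to/node   (run from the server directory)
    subprocess.run(gdb_command(sys.argv[1]), check=False)
```

The mnemonic printed by x/i $pc indicates which instruction set extension the crashing library expects.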

f-roscher changed the title from "4.2.1 gives 'Illegal instruction' - SIGILL in node" to "4.2.1 gives 'Illegal instruction' - SIGILL in node on some CPU types" on Apr 5, 2022
@GermanAizek

@f-roscher,
One option is to use my patch, which automates the build without Sandy Bridge CPU optimizations.
https://github.com/GermanAizek/mongodb-without-avx

@klaucode

I have the same issue when installing directly on Ubuntu:
Illegal instruction (core dumped)

@GermanAizek

@klaucode have you tried my patch?

@klaucode

> @klaucode have you tried my patch?

Before, I had problems with mongo, but now I'm successfully running MongoDB 4.2.24 on the same server without any problems, and I'm able to connect to mongo. Should I still try your patch?

@klaucode

@GermanAizek thanks a lot for your comments.

I solved the problem. 'Illegal instruction' means that nodejs tries to use a CPU instruction that is not available on the CPU. In the case of mongo, the problem was the missing "avx"; in the case of Rocket.Chat, some other CPU instruction was missing as well (I was running on an older server with an AMD CPU). After migrating to another server, everything works.

It would probably be worth putting this info somewhere in the documentation, because I spent a lot of time investigating.
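To tell this failure apart from an ordinary crash, a supervising script can check whether the process was terminated by SIGILL. A small Python sketch, assuming a POSIX host; `signal_from_exit` is a hypothetical helper. It covers two conventions: Python's subprocess module reports death by signal N as return code -N, while shells (and therefore most container logs) report it as exit status 128 + N.

```python
import signal


def signal_from_exit(returncode):
    """Return the signal name encoded in an exit code, or None."""
    if returncode < 0:
        number = -returncode           # subprocess convention: -N
    elif returncode > 128:
        number = returncode - 128      # shell convention: 128 + N
    else:
        return None                    # ordinary exit status
    try:
        return signal.Signals(number).name
    except ValueError:
        return None                    # not a known signal number


if __name__ == "__main__":
    # 132 is what a shell reports when SIGILL is signal 4, as on Linux.
    print(signal_from_exit(132))
```

A result of SIGILL points at a CPU feature mismatch rather than a configuration or database problem.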

@KiritoStudio

I have the same problem. My VM is created on Proxmox VE and I use the Docker deployment. After changing the CPU type to host, the issue was resolved.
