[Bug] Alpine-based images quit with fatal error on aarch64 #23306

Closed
2 of 3 tasks
ozangunalp opened this issue Sep 13, 2024 · 6 comments
Labels
triage/lhotari/important, type/bug
Milestone

Comments

@ozangunalp

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Official Pulsar images with 3.3.0 and 3.3.1

Minimal reproduce step

Running Alpine-based container images on an aarch64 machine.
We could reproduce it on RHEL 8 and on a Raspberry Pi, but not on an M1 Mac.

What did you expect to see?

The Pulsar server continues to run.

What did you see instead?

Here is the complete log of the container: pulsar.txt

Last lines of the log before the fatal error:

2024-09-10T14:02:58,193+0000 [pulsar-io-18-4] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0xa1215d54, L:/127.0.0.1:6650 - R:/127.0.0.1:34536] [SR:127.0.0.1, state:Connected]] Subscribing on topic persistent://public/default/__change_events / reader-936c229a0f. consumerId: 0
2024-09-10T14:02:58,269+0000 [pulsar-io-18-4] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/__change_events
2024-09-10T14:02:58,271+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - Creating '/managed-ledgers/public/default/persistent/__change_events'
2024-09-10T14:02:58,340+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [192.168.144.2:46605] for ledger: 1
2024-09-10T14:02:58,344+0000 [BookKeeperClientWorker-OrderedExecutor-18-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/__change_events] Created ledger 1 after closed null
2024-09-10T14:02:58,352+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/default/persistent/__change_events] Successfully initialize managed ledger
2024-09-10T14:02:58,394+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/__change_events] Disabled replicated subscriptions controller
2024-09-10T14:02:58,428+0000 [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/__change_events] Cursor __compaction recovered to position 1:-1
2024-09-10T14:02:58,444+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/__change_events] Opened new cursor: ManagedCursorImpl{ledger=public/default/persistent/__change_events, name=__compaction, ackPos=1:-1, readPos=1:0}
2024-09-10T14:02:58,455+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.pulsar.broker.service.BrokerService - Created topic persistent://public/default/__change_events - dedup is disabled
2024-09-10T14:02:58,501+0000 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [192.168.144.2:46605] for ledger: 2
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffffa0b43e78, pid=10, tid=280
#
# JRE version: OpenJDK Runtime Environment Corretto-21.0.3.9.1 (21.0.3+9) (build 21.0.3+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-21.0.3.9.1 (21.0.3+9-LTS, mixed mode, tiered, compressed class ptrs, z gc, linux-aarch64)
# Problematic frame:
#
2024-09-10T14:03:28,153+0000 [pulsar-io-18-5] INFO  org.apache.pulsar.broker.service.ServerCnx - [/192.168.144.1:44996] Closing consumer: consumerId=0
2024-09-10T14:03:28,154+0000 [pulsar-io-18-5] INFO  org.apache.pulsar.broker.service.ServerCnx - [/192.168.144.1:44996] Closed consumer before its creation was completed. consumerId=0
2024-09-10T14:03:28,174+0000 [pulsar-io-18-5] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /192.168.144.1:44996
2024-09-10T14:03:28,174+0000 [pulsar-io-18-1] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /192.168.144.1:44986
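
When checking whether a run hit the same crash, it can help to scan a saved container log for the JVM banner shown in the excerpt above. The following is a minimal sketch, not part of Pulsar: the function name `jvm_crash_summary` and the log file argument are hypothetical, and it assumes the banner and signal line look like the ones in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pulsar): reports whether a saved container
# log contains the HotSpot fatal-error banner and, if so, prints the first
# line that names the signal.
jvm_crash_summary() {
    log_file="$1"
    # Banner text as it appears in the log excerpt of this report.
    if grep -q 'A fatal error has been detected by the Java Runtime Environment' "$log_file"; then
        # Signal lines look like "#  SIGSEGV (0xb) at pc=...".
        grep -E '^#[[:space:]]+SIG[A-Z]+' "$log_file" | head -n 1
        return 0
    fi
    # No banner found: treat the log as crash-free.
    return 1
}
```

For example, with the attached log saved as `pulsar.txt`, `jvm_crash_summary pulsar.txt` prints the signal line and returns 0; for a log without the banner it prints nothing and returns 1.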

Anything else?

Originally posted on quarkusio/quarkus#43187

Are you willing to submit a PR?

  • I'm willing to submit a PR!
ozangunalp added the type/bug label on Sep 13, 2024
@lhotari
Member

lhotari commented Sep 16, 2024

Thanks for reporting this issue @ozangunalp.
Most of the Pulsar developers use Macs with Apple Silicon so I guess that's why we haven't caught this issue earlier.

> Running Alpine-based container images on an aarch64 machine.
> We could reproduce it on RHEL 8 and on a Raspberry Pi, but not on an M1 Mac.

Any hints on a practical way to reproduce this? Would a cloud VM on aarch64 work? Any recommendations?

lhotari added the release/blocker label on Sep 16, 2024
lhotari added this to the 4.0.0 milestone on Sep 16, 2024
@ozangunalp
Author

> Most of the Pulsar developers use Macs with Apple Silicon so I guess that's why we haven't caught this issue earlier.

Same for me. I was able to reproduce it with a Raspberry Pi running podman:
Raspberry Pi 5 Model B Rev 1.0
Linux raspberrypi 6.1.0-rpi7-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux

But yes, a cloud VM on aarch64 should work.

@lhotari
Member

lhotari commented Oct 14, 2024

I tried to reproduce on GCP t2a-standard-1 / Ampere Altra Arm64 with Debian Bookworm and docker installed with instructions from https://docs.docker.com/engine/install/debian/. I couldn't reproduce the issue.

@lhotari
Member

lhotari commented Oct 14, 2024

I tried to reproduce on GCP t2a-standard-1 / Ampere Altra Arm64 with Debian Bookworm and podman and couldn't reproduce the issue.

@lhotari
Member

lhotari commented Oct 14, 2024

It didn't reproduce with RHEL 9 on GCP t2a-standard-1 / Ampere Altra Arm64.
GCP doesn't have a RHEL 8 image available for Arm64, so I used the RHEL 9 Arm64 image.

[lari_hotari@instance-20241014-100511 ~]$ uname -a
Linux instance-20241014-100511 5.14.0-427.37.1.el9_4.aarch64 #1 SMP PREEMPT_DYNAMIC Fri Sep 13 17:15:09 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

I used these commands:

yum install -y podman tmux
tmux
# in one tmux window
podman run --rm -it --name pulsar docker.io/apachepulsar/pulsar:3.3.1 bin/pulsar standalone
# in another tmux window (CTRL-B C)
podman exec -it pulsar bin/pulsar-perf produce test

@ozangunalp Do you have any suggestions for reproducing on a cloud VM? Which commands should I use?

lhotari added the triage/lhotari/important label and removed the release/blocker label on Oct 14, 2024
lhotari changed the milestone from 4.0.0 to 4.1.0 on Oct 14, 2024
@lhotari
Member

lhotari commented Jan 5, 2025

This is most likely resolved by #23762, which will be included in the Pulsar 3.3.4 and Pulsar 4.0.2 releases.

lhotari closed this as completed on Jan 5, 2025