
[fix][test] Disable ExtensibleLoadManagerImpl in ReplicatorGlobalNSTest #22590


lhotari
Member

@lhotari lhotari commented Apr 25, 2024

Motivation

It causes an OutOfMemoryError (OOME) in Pulsar CI; see #22588.

Modifications

Disable ExtensibleLoadManagerImpl in ReplicatorGlobalNSTest

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added this to the 3.3.0 milestone Apr 25, 2024
@lhotari lhotari requested a review from heesung-sn April 25, 2024 20:39
@lhotari lhotari self-assigned this Apr 25, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 25, 2024
@lhotari
Member Author

lhotari commented Apr 25, 2024

@heesung-sn There's a chance that #22589 helps. It looks like test cleanup gets stuck deleting the __change_events topic of the namespaces.

@lhotari
Member Author

lhotari commented Apr 25, 2024

Evidence in a heap dump ("Unit-BROKER_CLIENT_IMPL-dumps" in https://github.com/apache/pulsar/actions/runs/8835483969/attempts/1):

select this['arg$2.completeTopicName'], count(*) from "org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$2276+0x00007f588cc63970"  group by 1
EXPR$0                                                      |  EXPR$1
----------------------------------------------------------------------
persistent://pulsar/global/removeClusterTest/__change_events| 209,380
----------------------------------------------------------------------

It looks like there are 209380 calls to remove persistent://pulsar/global/removeClusterTest/__change_events which result in org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException.
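The query above groups the live instances of one lambda class by a field reachable from the lambda's second captured argument (the JVM stores captured lambda arguments in fields named arg$1, arg$2, ...) and counts each group. As a rough illustration of what that aggregation computes, here is the same group-by-and-count in plain Java over made-up sample data. CapturedCallback, countByTopic and GroupByCountDemo are hypothetical names used only for this sketch; they are not Pulsar or heap-analyzer APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByCountDemo {

    // Hypothetical stand-in for one lambda instance found in the heap dump.
    // completeTopicName mirrors the arg$2.completeTopicName value the query reads.
    record CapturedCallback(String completeTopicName) {}

    // Equivalent of: select completeTopicName, count(*) ... group by 1
    static Map<String, Long> countByTopic(List<CapturedCallback> instances) {
        return instances.stream()
                .collect(Collectors.groupingBy(
                        CapturedCallback::completeTopicName,
                        TreeMap::new,            // sorted keys for stable output
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        String topic = "persistent://pulsar/global/removeClusterTest/__change_events";
        // Illustrative sample only; the real dump had 209,380 such instances.
        List<CapturedCallback> sample = List.of(
                new CapturedCallback(topic),
                new CapturedCallback(topic),
                new CapturedCallback(topic));
        System.out.println(countByTopic(sample));
    }
}
```

A single dominant group, as in the heap dump, points at one topic whose removal is being attempted over and over rather than many topics each being removed once.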

@heesung-sn
Contributor

persistent://pulsar/global/removeClusterTest/__change_event

I am a bit surprised that we have so many events on this topic. (Is there a loop that pushes too many events there?)

@lhotari
Member Author

lhotari commented Apr 25, 2024

Yes, one hypothesis is that the readers/writers in TopicPoliciesService are causing the trouble. That's why I created #22589.
I'll have to try running ReplicatorGlobalNSTest locally to see whether the problem reproduces. I haven't tried that yet since I've only been analysing the heap dumps.

@lhotari
Member Author

lhotari commented Apr 25, 2024

This command reproduces the problem:

mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker "-Dtest=org.apache.pulsar.broker.service.ReplicatorGlobalNSTest#testRemoveLocalClusterOnGlobalNamespace"

@lhotari lhotari marked this pull request as draft April 25, 2024 21:13
@lhotari
Member Author

lhotari commented Apr 25, 2024

The problem happens for both load balancer implementations.

@lhotari
Member Author

lhotari commented Apr 25, 2024

2024-04-26T00:24:03,544 - ERROR - [broker-topic-workers-OrderedExecutor-6-0:PersistentTopic] - [persistent://pulsar/global/removeClusterTest/__change_events] Error deleting topic
org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: java.util.concurrent.CompletionException: org.apache.bookkeeper.mledger.ManagedLedgerException$CursorAlreadyClosedException: Cursor was already closed
        at org.apache.pulsar.broker.service.persistent.PersistentTopic$6.deleteLedgerFailed(PersistentTopic.java:1496) ~[classes/:?]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.lambda$asyncDelete$33(ManagedLedgerImpl.java:2950) ~[managed-ledger-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:887) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2325) ~[?:?]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncDelete(ManagedLedgerImpl.java:2947) ~[managed-ledger-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
        at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$delete$40(PersistentTopic.java:1468) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:887) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2325) ~[?:?]
        at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$delete$41(PersistentTopic.java:1462) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:887) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2325) ~[?:?]
        at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$delete$42(PersistentTopic.java:1453) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$delete$34(PersistentTopic.java:1431) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:787) [?:?]
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) [?:?]
        at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) [bookkeeper-common-4.17.0.jar:4.17.0]
        at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107) [bookkeeper-common-4.17.0.jar:4.17.0]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.108.Final.jar:4.1.108.Final]
        at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException: java.util.concurrent.CompletionException: org.apache.bookkeeper.mledger.ManagedLedgerException$CursorAlreadyClosedException: Cursor was already closed
Caused by: java.util.concurrent.CompletionException: org.apache.bookkeeper.mledger.ManagedLedgerException$CursorAlreadyClosedException: Cursor was already closed
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1527) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.allOf(CompletableFuture.java:2419) ~[?:?]
        at org.apache.pulsar.common.util.FutureUtil.waitForAll(FutureUtil.java:56) ~[pulsar-common-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncTruncate(ManagedLedgerImpl.java:4341) ~[managed-ledger-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncDelete(ManagedLedgerImpl.java:2946) ~[managed-ledger-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
        ... 19 more
Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException$CursorAlreadyClosedException: Cursor was already closed
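The middle "Caused by" section shows ManagedLedgerImpl.asyncTruncate waiting on several futures through FutureUtil.waitForAll, which delegates to CompletableFuture.allOf. The trace suggests one of those futures had failed with CursorAlreadyClosedException, and that single failure is enough to fail the combined future and, with it, the whole topic delete. The sketch below reproduces only that JDK behavior. CursorAlreadyClosedException here is a hypothetical local stand-in for ManagedLedgerException$CursorAlreadyClosedException, and AllOfFailureDemo / waitForAllAndGetCause are illustrative names, not Pulsar APIs.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class AllOfFailureDemo {

    // Hypothetical stand-in for ManagedLedgerException$CursorAlreadyClosedException.
    static class CursorAlreadyClosedException extends Exception {
        CursorAlreadyClosedException(String message) {
            super(message);
        }
    }

    // Waits for all futures via allOf and returns the underlying failure,
    // or null if every future completed normally.
    static Throwable waitForAllAndGetCause(List<CompletableFuture<Void>> futures) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
        try {
            all.join();
            return null;
        } catch (CompletionException e) {
            // allOf wraps the first observed failure in a CompletionException.
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> ok = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(
                new CursorAlreadyClosedException("Cursor was already closed"));

        // One failed future fails the combined future, even though "ok" succeeded.
        Throwable cause = waitForAllAndGetCause(List.of(ok, failed));
        System.out.println(cause.getClass().getSimpleName() + ": " + cause.getMessage());
    }
}
```

This mirrors the nesting in the log: the CompletionException produced by allOf carries the cursor exception as its cause, and the broker then wraps both into ManagedLedgerException and PersistenceException.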

@lhotari lhotari closed this Apr 25, 2024
Labels
doc-not-needed, ready-to-test, type/flaky-tests