Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Flaky-test: ExtensibleLoadManagerTest.testIsolationPolicy #20608

Open
1 of 2 tasks
lhotari opened this issue Jun 19, 2023 · 10 comments · Fixed by #20609
Open
1 of 2 tasks

Flaky-test: ExtensibleLoadManagerTest.testIsolationPolicy #20608

lhotari opened this issue Jun 19, 2023 · 10 comments · Fixed by #20609

Comments

@lhotari
Copy link
Member

lhotari commented Jun 19, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Example failure

https://github.com/apache/pulsar/actions/runs/5309497330/jobs/9610705022?pr=20603#step:12:23713

Exception stacktrace

  Error:  Tests run: 20, Failures: 1, Errors: 0, Skipped: 16, Time elapsed: 545.625 s <<< FAILURE! - in TestSuite
  Error:  org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy  Time elapsed: 29.596 s  <<< FAILURE!
  java.lang.AssertionError: expected [true] but found [false]
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.failNotEquals(Assert.java:1577)
  	at org.testng.Assert.assertTrue(Assert.java:56)
  	at org.testng.Assert.assertTrue(Assert.java:66)
  	at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy(ExtensibleLoadManagerTest.java:349)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:833)

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari lhotari changed the title Flaky-test: test_class.test_method Flaky-test: ExtensibleLoadManagerTest.testIsolationPolicy Jun 19, 2023
@lhotari
Copy link
Member Author

lhotari commented Jun 19, 2023

Error in logs:

  2023-06-19T08:55:18,683 - ERROR - [TestNG-method=testIsolationPolicy-1:ExtensibleLoadManagerTest@348] - Failed to lookup topic: 
  org.apache.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: 
   --- An unexpected error occurred in the server ---
  
  Message: Failed to select the new owner broker for bundle: my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000
  
  Stacktrace:
  
  java.lang.IllegalStateException: Failed to select the new owner broker for bundle: my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000
  	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getOwnerAsync$8(ExtensibleLoadManagerImpl.java:400)
  	at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
  	at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
  	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getOwnerAsync$9(ExtensibleLoadManagerImpl.java:394)
  	at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
  	at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
  	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.getOwnerAsync(ExtensibleLoadManagerImpl.java:384)
  	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$assign$7(ExtensibleLoadManagerImpl.java:376)
  	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.put(ConcurrentOpenHashMap.java:409)
  	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.computeIfAbsent(ConcurrentOpenHashMap.java:243)
  	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.dedupeLookupRequest(ExtensibleLoadManagerImpl.java:457)
  	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.assign(ExtensibleLoadManagerImpl.java:370)
  	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerWrapper.findBrokerServiceUrl(ExtensibleLoadManagerWrapper.java:66)
  	at org.apache.pulsar.broker.namespace.NamespaceService.lambda$getBrokerServiceUrlAsync$0(NamespaceService.java:198)
  	at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
  	at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
  	at org.apache.pulsar.broker.namespace.NamespaceService.lambda$getBrokerServiceUrlAsync$1(NamespaceService.java:191)
  	at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
  	at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
  	at org.apache.pulsar.broker.namespace.NamespaceService.getBrokerServiceUrlAsync(NamespaceService.java:189)
  	at org.apache.pulsar.broker.lookup.TopicLookupBase.lambda$internalLookupTopicAsync$6(TopicLookupBase.java:85)
  	at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
  	at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
  	at org.apache.pulsar.broker.lookup.TopicLookupBase.internalLookupTopicAsync(TopicLookupBase.java:80)
  	at org.apache.pulsar.broker.lookup.v2.TopicLookup.lookupTopicAsync(TopicLookup.java:69)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
  	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
  	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
  	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
  	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
  	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
  	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
  	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
  	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
  	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
  	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
  	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
  	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
  	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
  	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
  	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
  	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
  	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
  	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
  	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
  	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
  	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
  	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
  	at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
  	at org.eclipse.jetty.servlets.QoSFilter.doFilter(QoSFilter.java:202)
  	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
  	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
  	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
  	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
  	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
  	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
  	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
  	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
  	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
  	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
  	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
  	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
  	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
  	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
  	at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181)
  	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
  	at org.eclipse.jetty.server.Server.handle(Server.java:516)
  	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
  	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
  	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
  	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
  	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
  	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
  	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
  	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  	at java.base/java.lang.Thread.run(Thread.java:833)
  
  	at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.finishUpdate(AsyncHttpClientHandler.java:241) ~[async-http-client-2.12.1.jar:?]
  	at org.asynchttpclient.netty.handler.HttpHandler.handleChunk(HttpHandler.java:114) ~[async-http-client-2.12.1.jar:?]
  	at org.asynchttpclient.netty.handler.HttpHandler.handleRead(HttpHandler.java:143) ~[async-http-client-2.12.1.jar:?]
  	at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.channelRead(AsyncHttpClientHandler.java:78) ~[async-http-client-2.12.1.jar:?]
  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.93.Final.jar:4.1.93.Final]
  	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.93.Final.jar:4.1.93.Final]
  	... 1 more
  java.lang.AssertionError: expected [true] but found [false]
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.failNotEquals(Assert.java:1577)
  	at org.testng.Assert.assertTrue(Assert.java:56)
  	at org.testng.Assert.assertTrue(Assert.java:66)
  	at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy(ExtensibleLoadManagerTest.java:349)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:833)

@lhotari
Copy link
Member Author

lhotari commented Jun 28, 2023

This test now fails in this location:

  2023-06-28T16:21:50,038 - INFO  - [main:TestRetrySupport@53] - Previous test run has failed before ExtensibleLoadManagerTest.testIsolationPolicy, failedSetupNumber=1. Running cleanup and setup.
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.fail(Assert.java:115)
  	at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy(ExtensibleLoadManagerTest.java:351)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:833)

on this line:

One possible reason for this failure is the problem that metadata changes are eventually consistent in Pulsar. The lack of strong consistency causes flakiness which is also a production code problem. A good example is #12555 (comment) . The metadata consistency problems are also covered in the blog post https://codingthestreams.com/pulsar/2022/10/18/view-of-pulsar-metadata-store.html .

@lhotari lhotari reopened this Jun 28, 2023
@github-actions
Copy link

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jul 29, 2023
@Technoboy-
Copy link
Contributor

@lhotari
Copy link
Member Author

lhotari commented Dec 19, 2023

Flaky-test: org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy
Number of failures: 2

org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest is flaky. The testIsolationPolicy test method fails sporadically.

java.lang.AssertionError:

Expecting actual:
  "
 --- An unexpected error occurred in the server ---

Message: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.

Stacktrace:

java.lang.IllegalStateException: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.
	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getBrokerLookupData$12(ExtensibleLoadManagerImpl.java:529)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2023-12-17T12:45:31.9552882Z

Full exception stacktrace
java.lang.AssertionError:

Expecting actual:
"
--- An unexpected error occurred in the server ---

Message: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.

Stacktrace:

java.lang.IllegalStateException: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.
at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getBrokerLookupData$12(ExtensibleLoadManagerImpl.java:529)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:269)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$5(ZKMetadataStore.java:219)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:840)
"
to contain at least one of the following elements:

java.lang.AssertionError:

Expecting actual:
  "
 --- An unexpected error occurred in the server ---

Message: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.

Stacktrace:

java.lang.IllegalStateException: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.
	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getBrokerLookupData$12(ExtensibleLoadManagerImpl.java:529)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2023-12-16T06:33:50.6223790Z

Full exception stacktrace
java.lang.AssertionError:

Expecting actual:
"
--- An unexpected error occurred in the server ---

Message: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.

Stacktrace:

java.lang.IllegalStateException: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.
at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getBrokerLookupData$12(ExtensibleLoadManagerImpl.java:529)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:269)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$5(ZKMetadataStore.java:219)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:1583)
"
to contain at least one of the following elements:

@lhotari
Copy link
Member Author

lhotari commented Dec 19, 2023

uploaded logs to https://gist.githubusercontent.com/lhotari/4ac3082f8ab4806a138095599bb6de7d/raw/e8604ae47515c97c0ff25f6849051f515d6c73b3/ExtensibleLoadManagerTest_testIsolationPolicy_logs.txt

The log file is from the build's Integration-LOADBALANCE-surefire-reports.zip file and extracted with this command:

xmlstarlet sel -t -m '//testcase[@name="testIsolationPolicy"]' -v @classname -o '.' -v @name -n -v failure -n -v system-out -n tests/integration/target/surefire-reports/TEST-TestSuite.xml > ExtensibleLoadManagerTest_testIsolationPolicy_logs.txt

(brew install xmlstarlet to get xmlstarlet)

@lhotari
Copy link
Member Author

lhotari commented Dec 19, 2023

Error logs contains a lot of errors like this:

2023-12-19T12:49:41,746+0000 [pulsar-load-manager-1-1] ERROR org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl - The broker:pulsar-broker-1:8080 failed to change the role. Retrying 5 th ...
org.apache.pulsar.broker.loadbalance.extensions.store.LoadDataStoreException: org.apache.pulsar.client.api.PulsarClientException$TopicDoesNotExistException: {"errorMsg":"Topic non-persistent://pulsar/system/loadbalancer-top-bundles-load-data does not exist","reqId":4571777132239267974, "remote":"pulsar-broker-1/172.18.0.7:6650", "local":"/172.18.0.7:46828"}
        at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.startTableView(TableViewLoadDataStoreImpl.java:109) ~[org.apache.pulsar-pulsar-broker-3.2.0-SNAPSHOT.jar:3.2.0-SNAPS

And it seems that it never recovers. Is this expected?

@lhotari
Copy link
Member Author

lhotari commented Dec 19, 2023

I wonder if this is related to apache/pulsar-client-cpp#86 . There have been issues in the past where a topic that has been created on one broker won't be immediately visible on another broker.

@heesung-sn
Copy link
Contributor

Raised a fix : #21764

@lhotari
Copy link
Member Author

lhotari commented Feb 26, 2024

Recent failure:

  Error:  Tests run: 20, Failures: 1, Errors: 0, Skipped: 16, Time elapsed: 595.011 s <<< FAILURE! - in TestSuite
  Error:  org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy  Time elapsed: 47.812 s  <<< FAILURE!
  org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest expected [2] but found [3] within 30 seconds.
  	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
  	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
  	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
  	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
  	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)
  	at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy(ExtensibleLoadManagerTest.java:374)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:840)
  Caused by: java.lang.AssertionError: expected [2] but found [3]
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.failNotEquals(Assert.java:1577)
  	at org.testng.Assert.assertEqualsImpl(Assert.java:149)
  	at org.testng.Assert.assertEquals(Assert.java:131)
  	at org.testng.Assert.assertEquals(Assert.java:1418)
  	at org.testng.Assert.assertEquals(Assert.java:1382)
  	at org.testng.Assert.assertEquals(Assert.java:1428)
  	at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.lambda$testIsolationPolicy$7(ExtensibleLoadManagerTest.java:377)
  	at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
  	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:248)
  	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:235)
  	... 4 more

in https://github.com/apache/pulsar/actions/runs/8046911413/job/21976191499?pr=22122#step:12:22339

@lhotari lhotari reopened this Feb 26, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants