Flaky-test: ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild #23389

Closed
1 of 2 tasks
lhotari opened this issue Oct 2, 2024 · 4 comments · Fixed by #23852

Comments

@lhotari
Member

lhotari commented Oct 2, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Example failure

https://github.com/apache/pulsar/actions/runs/11125404278/job/30955253737?pr=23327#step:11:1680

Exception stacktrace

  Error:  Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 74.942 s <<< FAILURE! - in org.apache.pulsar.broker.service.ZkSessionExpireTest
  Error:  org.apache.pulsar.broker.service.ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild[false, class org.apache.pulsar.broker.service.NetworkErrorTestBase$PreferBrokerModularLoadManager](4)  Time elapsed: 31.007 s  <<< FAILURE!
  org.awaitility.core.ConditionTimeoutException: Assertion condition expected [2] but found [1] within 10 seconds.
  	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
  	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
  	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
  	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
  	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)
  	at org.apache.pulsar.broker.service.ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild(ZkSessionExpireTest.java:154)
  	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  	at java.base/java.lang.Thread.run(Thread.java:1583)
  Caused by: java.lang.AssertionError: expected [2] but found [1]
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.failNotEquals(Assert.java:1577)
  	at org.testng.Assert.assertEqualsImpl(Assert.java:149)
  	at org.testng.Assert.assertEquals(Assert.java:131)
  	at org.testng.Assert.assertEquals(Assert.java:1418)
  	at org.testng.Assert.assertEquals(Assert.java:1382)
  	at org.testng.Assert.assertEquals(Assert.java:1428)
  	at org.apache.pulsar.broker.service.ZkSessionExpireTest.lambda$testTopicUnloadAfterSessionRebuild$4(ZkSessionExpireTest.java:155)
  	at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
  	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:248)
  	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:235)
  	... 4 more

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari
Member Author

lhotari commented Jan 3, 2025

This test is very flaky at the moment. Failure in branch-4.0 build: https://github.com/apache/pulsar/actions/runs/12600662668/job/35120529770#step:11:1734

@lhotari
Member Author

lhotari commented Jan 3, 2025

Logs uploaded to https://gist.github.com/lhotari/8eb64203e95a352631957199b3d19420

In https://gist.githubusercontent.com/lhotari/8eb64203e95a352631957199b3d19420/raw/7e9a48fb45e24d1ce3b91dc9ab944bf80841ccd1/org.apache.pulsar.broker.service.ZkSessionExpireTest-output.txt
there are log lines such as these just before the test failure:

2025-01-03T16:47:20,293 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
2025-01-03T16:47:20,394 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
2025-01-03T16:47:20,494 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
2025-01-03T16:47:20,595 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
2025-01-03T16:47:20,695 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
!!!!!!!!! FAILURE-- [TestClass name=class org.apache.pulsar.broker.service.ZkSessionExpireTest].testTopicUnloadAfterSessionRebuild([true, class org.apache.pulsar.broker.service.NetworkErrorTestBase$PreferBrokerModularLoadManager])-------
org.awaitility.core.ConditionTimeoutException: Assertion condition expected [true] but found [false] within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)
	at org.apache.pulsar.broker.service.ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild(ZkSessionExpireTest.java:161)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.AssertionError: expected [true] but found [false]
	at org.testng.Assert.fail(Assert.java:110)
	at org.testng.Assert.failNotEquals(Assert.java:1577)
	at org.testng.Assert.assertTrue(Assert.java:56)
	at org.testng.Assert.assertTrue(Assert.java:66)
	at org.apache.pulsar.broker.service.ZkSessionExpireTest.lambda$testTopicUnloadAfterSessionRebuild$5(ZkSessionExpireTest.java:163)
	at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:248)
	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:235)
	... 4 more

@lhotari
Member Author

lhotari commented Jan 5, 2025

@poorbarcode Would you have a chance to fix the flaky test ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild?

@lhotari
Member Author

lhotari commented Jan 13, 2025

@poorbarcode The test usually passes when run locally on macOS.

mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker -Dtest=ZkSessionExpireTest -DexcludedGroups=

It seems that the flakiness shows up when running with constrained CPU resources, as in CI. I have a solution for running tests in Docker with shell functions from https://github.com/lhotari/pulsar-contributor-toolbox.

ptbx_run_test_in_docker -pl pulsar-broker -Dtest=ZkSessionExpireTest -DexcludedGroups=

ptbx_run_test_in_docker sets up a Docker image, installs Java and tooling, and then runs the test in Docker with --cpus=2 --memory=6g to limit resources. This usually triggers the flakiness in many different ways.
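
For anyone without the toolbox installed, the resource-limiting idea can be approximated with a plain `docker run`. The following is only a rough sketch and not the toolbox's actual implementation: the function name `run_test_in_limited_docker`, the `TEST_IMAGE` and `DRY_RUN` variables, the `/workspace` mount point, and the default image tag are all assumptions made for illustration. Only the `--cpus=2 --memory=6g` limits and the Maven flags come from the comments above.

```shell
# Hypothetical helper (NOT part of pulsar-contributor-toolbox).
# Runs Maven tests inside a container capped at 2 CPUs and 6 GB of memory.
run_test_in_limited_docker() {
  # Assumption: any image that ships a suitable JDK and Maven would work here.
  image="${TEST_IMAGE:-maven:3.9-eclipse-temurin-21}"

  # Build the full command in the positional parameters so that
  # arguments passed by the caller keep their quoting.
  set -- docker run --rm --cpus=2 --memory=6g \
    -v "$PWD:/workspace" -w /workspace "$image" \
    mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test "$@"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command; useful for checking the limits before running.
    echo "$*"
  else
    "$@"
  fi
}
```

Calling `run_test_in_limited_docker -pl pulsar-broker -Dtest=ZkSessionExpireTest -DexcludedGroups=` from the root of a Pulsar checkout would then run the test under the same CPU and memory limits. Setting `DRY_RUN=1` first prints the command instead of executing it.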
