Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[fix][client] TransactionCoordinatorClient support retry #23081

Merged
merged 4 commits into from
Jul 30, 2024

Conversation

chenhongSZ
Copy link
Contributor

@chenhongSZ chenhongSZ commented Jul 26, 2024

Motivation

we have two brokerUrls(one is dead) in PulsarClient serviceUrl, it is working well without enabling transactions.

            # broker test.pulsar.com:6651 is dead
            PulsarClient client = PulsarClient.builder()
                    //.enableTransaction(true)
                    .serviceUrl("pulsar://test.pulsar.com:6650,test.pulsar.com:6651")
                    .build();

If enabling transactions

            # broker test.pulsar.com:6651 is dead
            PulsarClient client = PulsarClient.builder()
                    .enableTransaction(true)
                    .serviceUrl("pulsar://test.pulsar.com:6650,test.pulsar.com:6651")
                    .build();

the following error occurs and the PulsarClient init fails, It will not retry the other brokerUrl

2024-07-27T00:52:27,787 - ERROR - [pulsar-client-io-1-1:TransactionMetaStoreHandler@120] - Transaction meta handler with transaction coordinator id 14 connection failed.
org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: test.pulsar.com/127.0.0.1:6651
	at org.apache.pulsar.client.impl.ConnectionPool.lambda$createConnection$14(ConnectionPool.java:311) ~[classes/:?]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: test.pulsar.com/127.0.0.1:6651
	at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1097) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
	at org.apache.pulsar.common.util.netty.ChannelFutures.lambda$toCompletableFuture$0(ChannelFutures.java:58) ~[classes/:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:326) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:342) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: test.pulsar.com/127.0.0.1:6651
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
	at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	... 4 more

The root cause is that the TransactionCoordinatorClient doesn't support retry.

Modifications

support retry in TransactionCoordinatorClient.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jul 26, 2024
@lhotari
Copy link
Member

lhotari commented Jul 29, 2024

Great work @chenhongSZ! Would it be possible to add a test case?

@lhotari lhotari added this to the 3.4.0 milestone Jul 29, 2024
@chenhongSZ
Copy link
Contributor Author

Great work @chenhongSZ! Would it be possible to add a test case?

This pr contains correlation test cases in TransactionCoordinatorClientTest.java

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@aloyszhang aloyszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chenhongSZ
Copy link
Contributor Author

/pulsarbot run-failure-checks

@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2024

Codecov Report

Attention: Patch coverage is 31.25000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 73.41%. Comparing base (bbc6224) to head (c6b3c47).
Report is 603 commits behind head on master.

Files with missing lines Patch % Lines
...ulsar/client/impl/TransactionMetaStoreHandler.java 21.42% 8 Missing and 3 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23081      +/-   ##
============================================
- Coverage     73.57%   73.41%   -0.16%     
- Complexity    32624    33206     +582     
============================================
  Files          1877     1917      +40     
  Lines        139502   144077    +4575     
  Branches      15299    15744     +445     
============================================
+ Hits         102638   105774    +3136     
- Misses        28908    30179    +1271     
- Partials       7956     8124     +168     
Flag Coverage Δ
inttests 27.44% <25.00%> (+2.86%) ⬆️
systests 24.78% <0.00%> (+0.45%) ⬆️
unittests 72.47% <31.25%> (-0.37%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
.../transaction/TransactionCoordinatorClientImpl.java 66.07% <100.00%> (+0.30%) ⬆️
...ulsar/client/impl/TransactionMetaStoreHandler.java 67.04% <21.42%> (-0.86%) ⬇️

... and 510 files with indirect coverage changes

@aloyszhang aloyszhang merged commit 6bbaec1 into apache:master Jul 30, 2024
61 checks passed
lhotari pushed a commit that referenced this pull request Aug 6, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 8, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 12, 2024
lhotari pushed a commit that referenced this pull request Aug 16, 2024
@chenhongSZ chenhongSZ deleted the tc-support-retry branch September 13, 2024 05:00
hanmz pushed a commit to hanmz/pulsar that referenced this pull request Feb 12, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants