Frequent consumer disconnection with broker #1351

Open
7 tasks
techgeekengineer007 opened this issue Dec 3, 2024 · 0 comments

techgeekengineer007 commented Dec 3, 2024

Description

I am experiencing persistent Kafka broker connectivity issues when using the Confluent Kafka Go client. The consumer frequently disconnects from brokers, as indicated by multiple disconnection and reconnection attempts in the log.

I am running about 10 consumer pods to process these events, and they do not stay up. Consumer lag keeps increasing while the consumers are down.
Sometimes 2-3 consumers stay up for 4-5 hours under the event load and then stop. The frequent disconnections across multiple consumer pods make it hard to keep the system stable.
Your help is greatly appreciated.

How to reproduce

max.poll.interval.ms: 600000
session.timeout.ms: 60000

Error log (broker):
identical error(s) suppressed) (_TRANSPORT): identical to last error
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Disconnected (after 2507976ms in state UP, 1 identical error(s) suppressed)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state UP -> DOWN
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: Requesting metadata for 1/1 topics: broker down
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Request metadata for 1 topic(s): broker down
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state DOWN -> INIT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state TRY_CONNECT -> CONNECT
%7|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator/2: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP) (_TRANSPORT)
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state UP -> DOWN
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: Requesting metadata for 1/1 topics: broker down
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Request metadata for 1 topic(s): broker down
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state DOWN -> INIT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator/2: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state TRY_CONNECT -> CONNECT
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Disconnected (after 2507976ms in state UP, 1 identical error(s) suppressed)\n"}
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: GroupCoordinator: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP)\n"}
%7|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed) (_TRANSPORT): identical to last error
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state UP -> DOWN
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed)\n"}
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Selected for cluster connection: broker down (broker has 1 connection attempt(s))
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: All broker connections are down: 4/4 brokers are down\n"}
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: Skipping metadata refresh of 1 topic(s): broker down: no usable brokers
{"level":"info","caller":"/kafka.go:341","time":"2024-12-02T11:07:52Z","message":"Closing consumer"}
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state DOWN -> INIT
%7|1733137672.496|SUBSCRIPTION|my-kafka-app-system#consumer-4| [thrd:main]: Group "group-my-activity": effective subscription list changed from 1 to 0 topic(s):
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Received CONNECT op
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state TRY_CONNECT -> CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Broker changed state TRY_CONNECT -> CONNECT
{"level":"info","caller":"/kafka.go:360","time":"2024-12-02T11:07:52Z","message":"% EAGER rebalance: 2 partition(s) revoked: [my-activity[8]@unset my-activity[9]@unset]"}
{"level":"info","caller":"/kafka.go:373","time":"2024-12-02T11:07:52Z","message":"% Committed offsets to Kafka: []"}
%7|1733137672.497|NODENAME|my-kafka-app-system#consumer-4| [thrd:main]: GroupCoordinator/2: Broker nodename changed from "b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094" to ""
%7|1733137672.497|NODEID|my-kafka-app-system#consumer-4| [thrd:main]: GroupCoordinator/2: Broker nodeid changed from 2 to -1
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 15
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:app]: Terminating instance (destroy flags none (0x0))
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Handle is terminating in state CONNECT: 11 refcnts (0x7f28580eb040), 4 toppar(s), 1 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 28
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Connected to ipv4#xx.xxx.xx.xxx:9094
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state CONNECT -> SSL_HANDSHAKE
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Destroy internal
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Removing all topics
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to GroupCoordinator
%7|1733137672.497|TERM|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Received TERMINATE op in state INIT: 3 refcnts, 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 14
%7|1733137672.497|FAIL|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Client is terminating (after 9064894ms in state INIT) (_DESTROY)
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Broker changed state INIT -> DOWN
%7|1733137672.497|BRKTERM|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: terminating: broker still has 3 refcnt(s), 0 buffer(s), 0 partition(s)
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Handle is terminating in state DOWN: 2 refcnts (0x7f28580e84d0), 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Broker changed state DOWN -> INIT
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Handle is terminating in state CONNECT: 2 refcnts (0x7f28580e7850), 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Connected to ipv4#xx.xxx.xx.xxx:9094
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state CONNECT -> SSL_HANDSHAKE
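
The log above mixes client debug records of the form `%<level>|<timestamp>|<facility>|<client>| [thrd:<name>]: <message>` with the application's own JSON lines. To make such output easier to read or count, a minimal Go sketch can split it into records. It assumes the prefix layout seen in this log and that application lines start with `{"level":`. The `Entry` type and the `SplitEntries` and `CountByFacility` functions are hypothetical helpers, not part of confluent-kafka-go or librdkafka.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Entry is one debug record from the log (hypothetical helper type).
type Entry struct {
	Level    string // syslog-style level digit, e.g. "7" or "6"
	Time     string // Unix timestamp with milliseconds
	Facility string // e.g. FAIL, STATE, CONNECT
	Client   string // client instance name
	Thread   string // name after "thrd:"
	Message  string
}

// header matches the record prefix as it appears in this report's log.
// The layout is an assumption taken from the pasted output.
var header = regexp.MustCompile(
	`%(\d)\|(\d+\.\d+)\|([A-Z0-9_]+)\|([^|]*)\| \[thrd:([^\]]*)\]: `)

// SplitEntries cuts run-together log text into one Entry per record.
// Text before the first prefix is ignored, and any application JSON line
// that follows a record is dropped from that record's message.
func SplitEntries(raw string) []Entry {
	locs := header.FindAllStringSubmatchIndex(raw, -1)
	entries := make([]Entry, 0, len(locs))
	for i, loc := range locs {
		end := len(raw)
		if i+1 < len(locs) {
			end = locs[i+1][0] // message runs up to the next prefix
		}
		msg := raw[loc[1]:end]
		if j := strings.Index(msg, `{"level":`); j >= 0 {
			msg = msg[:j] // assumption: app log lines start with {"level":
		}
		entries = append(entries, Entry{
			Level:    raw[loc[2]:loc[3]],
			Time:     raw[loc[4]:loc[5]],
			Facility: raw[loc[6]:loc[7]],
			Client:   raw[loc[8]:loc[9]],
			Thread:   raw[loc[10]:loc[11]],
			Message:  strings.TrimSpace(msg),
		})
	}
	return entries
}

// CountByFacility tallies records per facility (FAIL, STATE, ...).
func CountByFacility(entries []Entry) map[string]int {
	counts := make(map[string]int)
	for _, e := range entries {
		counts[e.Facility]++
	}
	return counts
}

func main() {
	// Two records copied from the log above, run together as in the paste.
	raw := `%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| ` +
		`[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state UP -> DOWN ` +
		`%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| ` +
		`[thrd:GroupCoordinator]: Requesting metadata for 1/1 topics: broker down`
	for _, e := range SplitEntries(raw) {
		fmt.Printf("%s %-8s [%s] %s\n", e.Time, e.Facility, e.Thread, e.Message)
	}
}
```

Counting `FAIL` and `STATE` records per thread this way can show which broker connections dropped and in what order.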

Checklist

Please provide the following information:

  • confluent-kafka-go and librdkafka version (LibraryVersion(v2.3.0)):
  • Apache Kafka broker version: 3.5.1
  • Client configuration: ConfigMap{...}
  • Operating system:
  • Provide client logs (with "debug": ".." as necessary)
  • Provide broker log excerpts
  • Critical issue