Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[Bug]: Cannot read from Kafka due to short poll timeout of consumer in KafkaIO #30870

Open
1 of 16 tasks
xianhualiu opened this issue Apr 5, 2024 · 2 comments
Open
1 of 16 tasks
Assignees

Comments

@xianhualiu
Copy link
Contributor

What happened?

The default Kafka consumer poll timeout is set to 1 second. It works fine when the the response can get messages from the kafka broker server within this 1 second, such as when client accesses broker within the same region. But if the responding time is more than 1 second, the consumer will not retrieve any messages. One customer reported that throughput of processed messages was extremely low in cross-region read since most of the time the responding time takes more than 1 second.

As a solution, the Kafka consumer polling timeout needs to be configurable, so customer can adjust it according to their needs.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@xianhualiu
Copy link
Contributor Author

.take-issue

@jbsabbagh
Copy link
Contributor

This also affects the Python SDK.

xianhualiu added a commit to xianhualiu/beam that referenced this issue Apr 9, 2024
xianhualiu added a commit to xianhualiu/beam that referenced this issue Apr 10, 2024
apache#30870: support consumer polling timeout in KafkaIO expansion service
xianhualiu added a commit to xianhualiu/beam that referenced this issue Apr 10, 2024
damccorm pushed a commit that referenced this issue Apr 16, 2024
…ce (#30915)

* [#30870]: support consumer polling timeout in KafkaIO expansion service

* fixed spotless complains

* fixed python format complains

* Update sdks/python/apache_beam/io/kafka.py

Co-authored-by: Jonathan Sabbagh <108473809+jbsabbagh@users.noreply.github.com>

* Update sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java

Co-authored-by: Jonathan Sabbagh <108473809+jbsabbagh@users.noreply.github.com>

* Update sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java

Co-authored-by: Jonathan Sabbagh <108473809+jbsabbagh@users.noreply.github.com>

* fixed formating issue

* fixed pylint and pydoc issues

* shorten the variable name

* fixed format and upgrade test

* fixed test

* fixed test

---------

Co-authored-by: Jonathan Sabbagh <108473809+jbsabbagh@users.noreply.github.com>
Abacn added a commit to Abacn/beam that referenced this issue Apr 16, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

2 participants