Incorrect unknown partition error when assigning newly-created partitions #2915

Closed
benesch opened this issue Jun 3, 2020 · 0 comments · Fixed by #2916
benesch commented Jun 3, 2020

Consider the following sequence of events for a Kafka consumer (not using consumer groups):

  1. Assign consumer partition 0.
  2. Create partition 1.
  3. Assign consumer partition 1.
  4. Refresh metadata.

The metadata refresh will incorrectly generate an error that partition 0 is unknown. The issue seems to be a glitch in how the desired partition list (rkt->rkt_desp) is maintained.

This reproduces on the latest librdkafka. I'm going to submit a PR with a test case and a fix momentarily, but wanted an issue number to reference in the test.
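
To make the sequence above concrete, here is a toy model of the failure. It is not librdkafka code: `toy_topic`, `toy_desired_add_unchecked`, `toy_partition_cnt_update`, and `toy_replay_issue` are hypothetical names. The model assumes that the assignment in step 3 re-adds partition 0 to the desired list, and that a metadata refresh reports every partition still on the desired list as unknown.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PARTS 8

/* Hypothetical stand-in for a topic: partitions [0, known_cnt) are
 * "known"; desired[p] marks partition p as on the "desired" list. */
struct toy_topic {
    int known_cnt;
    bool desired[TOY_MAX_PARTS];
};

/* Adds p to the desired list without checking whether p is known. */
static void toy_desired_add_unchecked(struct toy_topic *t, int p) {
    assert(p >= 0 && p < TOY_MAX_PARTS);
    t->desired[p] = true;
}

/* Models a metadata refresh (assumed behavior): partitions that newly
 * appear leave the desired list, and anything still desired afterwards
 * is flagged in unknown[]. Returns the number of flagged partitions. */
static int toy_partition_cnt_update(struct toy_topic *t, int new_cnt,
                                    bool unknown[TOY_MAX_PARTS]) {
    int errs = 0;
    assert(new_cnt >= t->known_cnt && new_cnt <= TOY_MAX_PARTS);
    for (int p = t->known_cnt; p < new_cnt; p++)
        t->desired[p] = false;
    t->known_cnt = new_cnt;
    for (int p = 0; p < TOY_MAX_PARTS; p++) {
        unknown[p] = t->desired[p];
        errs += unknown[p] ? 1 : 0;
    }
    return errs;
}

/* Replays the four steps; returns true if partition 0 is wrongly
 * reported as unknown. */
static bool toy_replay_issue(void) {
    struct toy_topic t = { .known_cnt = 1 }; /* partition 0 exists */
    bool unknown[TOY_MAX_PARTS];

    /* Step 1: partition 0 is assigned; it is assumed to be known
     * already, so it is not on the desired list. */
    /* Step 2: partition 1 is created; local state is unchanged. */
    /* Step 3: the new assignment goes through the unchecked add. */
    toy_desired_add_unchecked(&t, 0);
    toy_desired_add_unchecked(&t, 1);
    /* Step 4: the refresh learns about partition 1. */
    toy_partition_cnt_update(&t, 2, unknown);
    return unknown[0];
}
```

In this model partition 1 leaves the desired list during the refresh because it is newly discovered, while partition 0 stays on it and is flagged, even though it was known all along.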

benesch added a commit to benesch/librdkafka that referenced this issue Jun 3, 2020
rd_kafka_cgrp_assign calls rd_kafka_toppar_desired_add0 rather than its
wrapper rd_kafka_toppar_desired_add. The "add" wrapper preserves the
invariant that a known partition should never get added to the desired
partitions queue, while the "add0" function does not.

Maintaining this invariant is important for
rd_kafka_topic_partition_cnt_update, which assumes that a toppar is in
either the list of known partitions or the list of desired partitions,
but not both. Violating this invariant results in the situation
described in confluentinc#2915, where updating assignments can trigger incorrect
"unknown partition" errors.

This patch rearranges rd_kafka_toppar_desired_add/add0 so that add0, in
addition to add, will avoid adding known partitions to the desired
partition list. The enclosed test correctly fails if run against the
current master (for the reasons described above).

Fix confluentinc#2915.
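
The shape of the rearrangement described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `toy_topic`, `toy_desired_add0`, `toy_desired_add`, and `toy_invariant_violations` are hypothetical names, and a partition is treated as "known" simply when its id is below `known_cnt`.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PARTS 8

/* Hypothetical stand-in for a topic: partitions [0, known_cnt) are
 * "known"; desired[p] marks partition p as on the "desired" list. */
struct toy_topic {
    int known_cnt;
    bool desired[TOY_MAX_PARTS];
};

/* Low-level add: performs the known-partition check itself.
 * Returns true if p was added to the desired list. */
static bool toy_desired_add0(struct toy_topic *t, int p) {
    assert(p >= 0 && p < TOY_MAX_PARTS);
    if (p < t->known_cnt)
        return false; /* known: keep it off the desired list */
    t->desired[p] = true;
    return true;
}

/* Wrapper: delegates to add0, so callers of either function get
 * the same check. */
static bool toy_desired_add(struct toy_topic *t, int p) {
    return toy_desired_add0(t, p);
}

/* Counts partitions that are both known and desired; the invariant
 * requires this to be zero. */
static int toy_invariant_violations(const struct toy_topic *t) {
    int n = 0;
    for (int p = 0; p < t->known_cnt; p++)
        n += t->desired[p] ? 1 : 0;
    return n;
}
```

Putting the check in the lower-level function means a caller that bypasses the wrapper cannot break the invariant, which is the property the commit message relies on.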
edenhill added this to the v1.5.0 milestone Jul 2, 2020
edenhill pushed a commit that referenced this issue Jul 2, 2020