
[FLINK-34470][Connectors/Kafka] Fix indefinite blocking by adjusting stopping condition in split reader #100

Merged · 1 commit into apache:main on Sep 17, 2024

Conversation

@dongwoo6kim (Contributor) commented Apr 29, 2024

Problem

When using the Flink Kafka connector in batch scenarios, consuming transactional messages can cause the job to hang indefinitely.
The issue can easily be reproduced with the following steps.

  1. Produce transactional messages and commit them.
  2. Configure scan.bounded.mode to latest-offset and run a consumer using the Flink Kafka connector (a rough reproduction sketch follows below).
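
For reference, this kind of hang can be reproduced with a bounded KafkaSource along the lines of the sketch below; the topic name, bootstrap servers, and surrounding harness are placeholders and not part of this PR.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Reproduction sketch: a bounded read of a topic whose last offsets are
// transaction control records. Before this fix, the job may never finish.
public class BoundedTransactionalReadRepro {
    public static void main(String[] args) throws Exception {
        KafkaSource<String> source =
                KafkaSource.<String>builder()
                        .setBootstrapServers("localhost:9092")   // placeholder
                        .setTopics("transactional-topic")        // placeholder
                        .setGroupId("repro")
                        .setStartingOffsets(OffsetsInitializer.earliest())
                        // the DataStream equivalent of scan.bounded.mode = latest-offset
                        .setBounded(OffsetsInitializer.latest())
                        .setValueOnlyDeserializer(new SimpleStringSchema())
                        .setProperty("isolation.level", "read_committed")
                        .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka").print();
        env.execute("bounded-transactional-read");
    }
}
```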

Cause

The previous stopping condition in the KafkaPartitionSplitReader compared the offset of the last record with the stoppingOffset. This approach works for streaming use cases and batch processing of non-transactional messages. However, in scenarios involving transactional messages, this is insufficient.
Control messages, which are not visible to clients, can occupy the entire range between the last record's offset and the stoppingOffset, which leads to indefinite blocking.

Workaround

I've modified the stopping condition to use consumer.position(tp), which effectively skips any control messages present in the current poll by pointing directly at the next record's offset.
To handle edge cases, particularly when properties.max.poll.records is set to 1, I've also adjusted the fetch method to always check all assigned partitions, even if no records are returned in a poll.

Edge case example

Consider partition 0, where offsets 13 and 14 are valid records and 15 is a control record. If stoppingOffset is set to 15 for partition 0 and properties.max.poll.records is configured to 1, checking only the partitions that return records would never detect that partition 0 has reached offset 15. By consistently checking all assigned partitions, the consumer's position jumps over the control record in the subsequent poll, allowing the reader to finish the split.
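
To make the edge case concrete, here is a rough sketch of the adjusted fetch loop; the stopping-offset map and the split-finishing step are simplified assumptions rather than the connector's actual structure:

```java
import java.time.Duration;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Sketch: after each poll, check *all* assigned partitions instead of only the
// partitions that returned records. With max.poll.records=1, the poll after
// consuming offset 14 returns nothing for partition 0 (only the control record
// at 15 remains), but the consumer position has already jumped past it.
final class FetchLoopSketch {
    static void pollAndCheck(
            KafkaConsumer<byte[], byte[]> consumer,
            Map<TopicPartition, Long> stoppingOffsets) {
        // polled records are handed to the split fetcher in the real reader
        consumer.poll(Duration.ofMillis(100));
        for (TopicPartition tp : consumer.assignment()) { // not records.partitions()
            Long stoppingOffset = stoppingOffsets.get(tp);
            if (stoppingOffset != null && consumer.position(tp) >= stoppingOffset) {
                // finish the split for tp, e.g. via finishSplitAtRecord(...)
            }
        }
    }
}
```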

Discussion

To address the metric issue in FLINK-33484, I think we need to make a wrapper class around ConsumerRecord, for example ConsumerRecordWithOffsetJump.

public ConsumerRecordWithOffsetJump(ConsumerRecord<K, V> record, long offsetJump) {
    this.record = record;
    this.offsetJump = offsetJump;
}

We may also need a new KafkaPartitionSplitReader that implements
SplitReader<ConsumerRecordWithOffsetJump<byte[], byte[]>, KafkaPartitionSplit>.
So when a record is emitted, the current offset should be set not to record.offset() + 1 but to
record.offset() + record.jumpValue here.
jumpValue is typically 1, except for the last record of each poll, where it is calculated as
consumer.position() - lastRecord.offset().
If this sounds good to everyone, I'm happy to work on this.
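
A rough sketch of how the proposed jump could be computed per poll, expanding on the constructor above; the class shape and the wrapPoll helper are only a proposal, not existing connector code:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Proposal sketch: each record carries the offset jump the emitter should apply.
// The jump is 1 for every record except the last record of a poll, where it is
// consumer.position(tp) - lastRecord.offset(), so emitting the last record moves
// the current offset past any trailing control records.
final class ConsumerRecordWithOffsetJump<K, V> {
    final ConsumerRecord<K, V> record;
    final long offsetJump;

    ConsumerRecordWithOffsetJump(ConsumerRecord<K, V> record, long offsetJump) {
        this.record = record;
        this.offsetJump = offsetJump;
    }

    static <K, V> List<ConsumerRecordWithOffsetJump<K, V>> wrapPoll(
            KafkaConsumer<K, V> consumer,
            TopicPartition tp,
            List<ConsumerRecord<K, V>> recordsForTp) {
        List<ConsumerRecordWithOffsetJump<K, V>> wrapped = new ArrayList<>();
        for (int i = 0; i < recordsForTp.size(); i++) {
            ConsumerRecord<K, V> r = recordsForTp.get(i);
            boolean last = i == recordsForTp.size() - 1;
            long jump = last ? consumer.position(tp) - r.offset() : 1L;
            wrapped.add(new ConsumerRecordWithOffsetJump<>(r, jump));
        }
        return wrapped;
    }
}
```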


boring-cyborg bot commented Apr 29, 2024

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

@LinMingQiang

It works, I have tried it.

                        stoppingOffset);
                recordsBySplits.setPartitionStoppingOffset(tp, stoppingOffset);
                finishSplitAtRecord(
                        tp, stoppingOffset, consumerPosition, finishedPartitions, recordsBySplits);
            }
            // Track this partition's record lag if it never appears before
            kafkaSourceReaderMetrics.maybeAddRecordsLagMetric(consumer, tp);


consumerRecords.partitions().forEach(trackTp -> {
    // Track this partition's record lag if it never appears before
    kafkaSourceReaderMetrics.maybeAddRecordsLagMetric(consumer, trackTp);
});


We do not need to track the tp when consumerRecords is empty.

@dongwoo6kim (Contributor, Author), May 15, 2024


@LinMingQiang, thanks for the review.
I've changed it to track the tp only when there is a record for that tp.

@dongwoo6kim force-pushed the FLINK-34470 branch 2 times, most recently from e9bcea2 to 3e07d38 on May 15, 2024 at 06:39
@morazow left a comment

Hey @dongwoo6kim, overall this seems okay from my side. But would it be possible to add an integration test (ITCase) for this case?

@dongwoo6kim (Contributor, Author)

Hello @morazow, I've added an ITCase for this case.
It fails on the current main branch due to a timeout and passes with the fixed code.

@morazow commented Jun 10, 2024

Thanks @dongwoo6kim,

Tests look good from my side 👍

(Recently I faced a similar issue, which may be related, when running batch mode with startingOffsets set. This change should solve that issue, but we may create a separate issue for it.)

@dongwoo6kim (Contributor, Author)

Thanks for confirming, @morazow.
Please feel free to provide any additional advice before merging this fix.
It would also be helpful if you could elaborate on the issue you mentioned and consider adding relevant test code for it.

@morazow commented Jun 26, 2024

Hey @dongwoo6kim, we created another issue for it. The solution seems to be similar, but let's discuss it again once this PR is merged.

@dongwoo6kim (Contributor, Author)

Hello @morazow, I've added test code for the mentioned issue; please take a look.
The test passes with the fixed code, while on the latest main branch it times out due to indefinite blocking.

@morazow commented Jul 8, 2024

Thanks @dongwoo6kim, Looks good!

@AHeise (Contributor) left a comment

@dongwoo6kim thank you very much for your contribution. I have a couple of remarks.

If you don't have time to fix it, I can also take over, because we are also hitting this in production.

List<ProducerRecord<String, Integer>> records =
        KafkaSourceTestEnv.getRecordsForTopic(transactionalTopic);
// Prepare records for executeAndVerify method
records.removeIf(record -> record.partition() > record.value());
Contributor

Why is this necessary?

Contributor Author

This is a preprocessing step before using the executeAndVerify method to verify the test result.

That method expects the records from partition P to form an integer sequence from P to NUM_RECORDS_PER_PARTITION, so I deleted the records whose value is less than the partition number.

It is a similar approach to the one used here.

Contributor

Could you change the comment to reflect how the data looks afterwards? I'm assuming this will preserve one record per partition?
I'm also curious why we need a specific data layout for this test to work. Or, rephrased: what would happen if we retained all records originally generated? Wouldn't the test still assert similar things?

@dongwoo6kim (Contributor, Author), Sep 17, 2024

I've made two changes.

  1. Replaced records.removeIf(record -> record.partition() > record.value()) with KafkaSourceTestEnv.setupEarliestOffsets(transactionalTopic), as it serves a similar purpose.
  2. Added a comment explaining how the data looks after the setup.

I'm assuming this will preserve one record per partition?

After the data modification, each partition p contains records from p to NUM_RECORDS_PER_PARTITION (which is 10). For example, partition 1 has records 1 to 10, and partition 5 has records 5 to 10.

what would happen if we retain all records originally generated

If we retained all records, we would need new assertion logic for the generated records.
The main purpose of this data modification setup is to reuse the executeAndVerify method.
As you can see here, the executeAndVerify method expects the input data to be modified in this way.

I intended to reuse the existing test util functions and follow the test code conventions, but if you think it causes unnecessary confusion, I can change the test code to use custom assertion logic.

Contributor

If it's needed for verification, then all good. We should probably revise the setup at a later point.

@dongwoo6kim (Contributor, Author) commented Sep 13, 2024

@AHeise thanks for the feedback!
I've addressed your comments and applied the suggested changes. When you have a moment, please take a look. Thanks

@AHeise (Contributor) left a comment

Thank you very much for the swift response and action. Changes look good to me. I will approve and merge once CI passes.

My only concern is that I don't fully understand why we need to modify the data in the test case. Shouldn't a transaction marker always be at the end? What did I miss?

Also, if possible, could you squash your commits and extend the commit message to include some of the information from the PR description? Basically, state the problem and briefly summarize the solution. Your commit message and PR title are already very descriptive, so it's not necessary for merging; it would just be a nice-to-have.


@AHeise (Contributor) commented Sep 16, 2024

Seems like the deleted arch unit rule was still needed. What was your intent when you deleted it?

…stopping condition on split reader

Problem: In batch mode, flink kafka connector could hang when
consuming transactional messages or reading from deleted records.

Solution: Use consumer.position() instead of lastRecord's offset to skip
control and deleted messages, preventing the hang.

@dongwoo6kim (Contributor, Author)

Seems like the deleted arch unit rule was still needed. What was your intent when you deleted it?

It was automatically deleted after running mvn clean verify locally. I manually rolled back the archunit changes.

@dongwoo6kim (Contributor, Author)

@AHeise Thanks for the feedback. I've left some comments and made updates. Please have a look.

@AHeise (Contributor) commented Sep 17, 2024

Changes all LGTM. I'm running CI and will merge when it's green.
Thank you very much for this much needed fix!

@AHeise AHeise merged commit 122a743 into apache:main Sep 17, 2024
13 checks passed

boring-cyborg bot commented Sep 17, 2024

Awesome work, congrats on your first merged pull request!
