-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Revise AbstractConsumerSeekAware.onPartitionsAssigned()
for concurreny
#3373
Comments
@artembilan |
See the I have an impression that the same callback is used for different groups (or different container instances in case of concurrency) and those partitions are distributed between containers. However that |
Thanks! I'll have to look into the issue further, but for personal reasons I won't be able to dive into it until next week. I'll try to get to it as soon as I can. |
No worries! The first milestone is out today. The next one in a month, so we have plenty of time 😅 |
@artembilan Example description
|
So, is the idea for fix to filter out only actual partition to seek and ignore those which are not assigned to us at the moment? |
I thought about that, but it's not a perfect solution. I'm thinking about what we can do about this. If you have any suggestions, I'd love to hear them. |
Isn’t that partition assigned / revoked API about? Shouldn’t we adjust our maps respectively ? |
We should also look at the maps used by the partition assigned/revoked API, but I think we may think more about how we can change the |
This issue seems to occur frequently when I'm wondering if the high concurrency is causing latency somewhere, and the rebalancing is happening when it exceeds that latency. I'd love to know where, but it's not easy to find. Below is an error from a The
|
@artembilan IMHO, for some reason, the rebalancing listener's partition assignment method may be called at runtime just as (or just before) you want to reset the offset. (I'm not sure exactly why this happens, other than that high concurrency makes it more likely). It's not awkward to get an error, but the official documentation doesn't talk about this error possibility, so I think it would be nice to leave a cautionary note in the documentation. What do you think about this? |
Thank you, @bky373 , for investigation! So, according to your findings it looks like the problem is even reproducible with Spring for Apache Kafka, but really with plain Apache Kafka client and respective concurrency and seeking. So, how about instead of throwing that exception up to the listener container thread, just emit a Another mitigation could be done via According to the Another way is just to ignore WDYT? |
@artembilan Hmm... I guess I'm still not convinced. I'm still concerned about whether Considering that resetting the offset is usually to (re)process the message, if it fails, we should probably take immediate action, such as a re-request; in this respect, I think the I would greatly appreciate your response, as it would help me understand better! Thank you, as always. |
Well, when it is an ERROR that means something is wrong with an application. So, since this is not what has been caused by the application, therefore ERROR would be misleading. And that's why I suggested that Does that make sense? |
Thanks! That helped me understand. |
@artembilan Assume the following arbitrary conditions.
1. Consumers started
2. Leading consumers join the group and monopolize the partitions
3. Store partition offsets for seek
4. Rebalancing consumers
5. Errors during seeking
Now that I've found the cause of the error, I'll write a PR that includes two things as soon as I can.
And I didn't use the |
I think I'm fine with the plan. |
Looks like different partitions are registered for same
ConsumerSeekCallback
when we have a multi-@KafkaListener
scenario.See
AbstractConsumerSeekAwareTests
and its TODO.when we have different groups and
concurrency
, different listener container instances are going to deal with different partitions.But looks like we add all of them to the same map entry:
This way when we call
seek
later from the logic, may lead to error like:The text was updated successfully, but these errors were encountered: