Apache Pulsar currently has known gaps in acknowledgement (ack) handling, particularly in scenarios with key-ordered message processing requirements (Failover, Exclusive, or Key_Shared subscriptions) and during broker restarts or topic unloads triggered by Pulsar load balancing events. In these scenarios, acknowledgements can be lost. This results in consumers that get stuck because of key-ordered delivery rules, and in additional message duplication, both of which affect the reliability and end-to-end latency of message processing. The intention is to have a solution that doesn't require enabling Pulsar transactions.
Pulsar's default mode is at-least-once messaging, so duplicates are acceptable, but lost acknowledgements cause unnecessary duplicate messages. In the case of key-ordered message processing with Key_Shared subscriptions, a lost acknowledgement stops the delivery of further messages that share a key with the message whose acknowledgement was lost.
These situations currently cause unnecessary disruptions to Key_Shared processing applications, where manual intervention or automated monitoring solutions are needed to detect stuck consumers and recover the situation by restarting individual consumers.
One of the primary motivations for adding automatic retries for failed acknowledgements is to enhance the reliability of key order message processing using Key_Shared subscriptions. However, it is possible to improve the current situation without implementing the proposed solution. A deeper analysis should be conducted to reproduce the current issues and identify the root cause of the problem. If the root cause necessitates implementing automatic retries for failed acknowledgements, this proposal's priority should be increased. Otherwise, alternative solutions should be prioritized before considering the automatic retry solution for failed acknowledgements.
This proposal aims to address these issues by enhancing the existing "ack receipt" feature with an automatic retry mechanism for failed acknowledgements. Users do not need to configure the "ack receipt" feature explicitly when autoRetryAcknowledgement is enabled. The solution is built upon the existing "ack receipt" feature at the binary protocol level. The gaps in the current "ack receipt" feature, such as issue #23261 ("When ack receipts are enabled, no response is sent to the client if the topic has been unloaded or is being transferred"), need to be addressed to achieve the desired outcome.
The following new methods will be added to the ConsumerBuilder interface:
/**
 * Enable or disable automatic retry for failed acknowledgements.
 *
 * @param autoRetryAcknowledgement whether to automatically retry failed acknowledgements
 * @return the consumer builder instance
 */
ConsumerBuilder<T> autoRetryAcknowledgement(boolean autoRetryAcknowledgement);

/**
 * Overrides the default maximum number of retry attempts for a failed acknowledgement
 * when autoRetryAcknowledgement is enabled.
 *
 * @param maxAckRetries the maximum number of retry attempts
 * @return the consumer builder instance
 */
ConsumerBuilder<T> maxAcknowledgementRetries(int maxAckRetries);

/**
 * Overrides the default retry delay backoff for acknowledgement retries.
 * This is used when autoRetryAcknowledgement is enabled.
 *
 * @param ackRetryBackoff the backoff strategy to use for retries
 * @return the consumer builder instance
 */
ConsumerBuilder<T> autoRetryAcknowledgementBackoff(RedeliveryBackoff ackRetryBackoff);
This example applies to the Pulsar Java client. Other clients can implement similar changes for adding the autoRetryAcknowledgement mode.
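To make the role of the backoff parameter more concrete, the following is a minimal sketch of a capped exponential backoff that could drive acknowledgement retries. It is illustrative only: AckRetryBackoffSketch and ExponentialAckRetryBackoff are hypothetical names, and the single-method shape (retry count in, delay in milliseconds out) is an assumption modeled on the RedeliveryBackoff type named in the proposed autoRetryAcknowledgementBackoff method. The defaults a real implementation would use are not specified by this proposal.

```java
/**
 * Hypothetical stand-in for the backoff strategy passed to
 * autoRetryAcknowledgementBackoff. Assumed shape: retry count in, delay (ms) out.
 */
interface AckRetryBackoffSketch {
    long next(int retryCount);
}

/** Capped exponential backoff: minDelayMs * multiplier^retryCount, never above maxDelayMs. */
final class ExponentialAckRetryBackoff implements AckRetryBackoffSketch {
    private final long minDelayMs;
    private final long maxDelayMs;
    private final double multiplier;

    ExponentialAckRetryBackoff(long minDelayMs, long maxDelayMs, double multiplier) {
        if (minDelayMs <= 0 || maxDelayMs < minDelayMs || multiplier < 1.0) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.multiplier = multiplier;
    }

    @Override
    public long next(int retryCount) {
        if (retryCount <= 0) {
            return minDelayMs; // first retry waits the minimum delay
        }
        double delay = minDelayMs * Math.pow(multiplier, retryCount);
        // The cap also guards against overflow for large retry counts.
        return (long) Math.min(delay, (double) maxDelayMs);
    }
}
```

With a minimum of 100 ms, a maximum of 5000 ms, and a multiplier of 2.0, the delays grow as 100, 200, 400, 800 ms and then level off at the 5000 ms cap. Capping the delay keeps a long broker outage from pushing retries arbitrarily far into the future.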
- Implement a new autoRetryAcknowledgement mode for Pulsar clients where acknowledgements that fail (due to broker restarts, topic unloads, Pulsar load balancing, or other issues) are automatically retried by the client.
- Modify the ServerCnx class to send failure responses for discarded acknowledgements when ack receipts are enabled, to fix issue #23261.
- Implement a new component in the client library to manage automatic retries of failed acknowledgements.
- When autoRetryAcknowledgement is enabled, the "ack receipt" feature is used under the covers. One difference is that the .acknowledge method should remain asynchronous, with the retries happening in the background. The existing "ack receipt" feature makes .acknowledge synchronous, which is not the desired behavior for many applications because it adds a server round-trip to each acknowledgement and causes performance issues.
- When both autoRetryAcknowledgement and "ack receipt" are enabled, the existing "ack receipt" behavior of synchronous acks will be used. The .acknowledge method will only return after the ack has succeeded or has failed after all retry attempts. Similarly, the result of the .acknowledgeAsync method will complete only after the automatic retry handling completes.
- Update the ConsumerBuilder interface to include options for configuring automatic ack retries. This applies to the Java client. Other clients could implement similar changes.
- Implement additional client-side metrics to track failed acknowledgements, retry attempts, and success rates.
- Update relevant documentation to reflect the new feature and its proper usage.
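The client-side retry component described in the list above could look roughly like the following sketch. It is not the proposed implementation: AckRetrierSketch and its parameters are hypothetical, the sendAck supplier stands in for whatever internal call sends an ack and completes once the broker's ack receipt (or failure response) arrives, and metrics are left out. The sketch only shows the intended control flow: the caller gets a future back immediately, and failed attempts are retried in the background until they succeed or the retry limit is reached.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntToLongFunction;
import java.util.function.Supplier;

/** Hypothetical background retrier for failed acknowledgements (illustrative only). */
final class AckRetrierSketch implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "ack-retry-sketch");
                t.setDaemon(true);
                return t;
            });
    private final int maxRetries;
    private final IntToLongFunction backoffMs; // retry index -> delay in ms

    AckRetrierSketch(int maxRetries, IntToLongFunction backoffMs) {
        this.maxRetries = maxRetries;
        this.backoffMs = backoffMs;
    }

    /** Returns immediately; the future completes once the ack succeeds or retries run out. */
    CompletableFuture<Void> acknowledge(Supplier<CompletableFuture<Void>> sendAck) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        attempt(sendAck, 0, result);
        return result;
    }

    private void attempt(Supplier<CompletableFuture<Void>> sendAck, int retriesSoFar,
                         CompletableFuture<Void> result) {
        CompletableFuture<Void> inFlight;
        try {
            inFlight = sendAck.get();
        } catch (RuntimeException e) {
            // Treat a synchronous failure the same way as a failed ack response.
            inFlight = new CompletableFuture<>();
            inFlight.completeExceptionally(e);
        }
        inFlight.whenComplete((ignored, error) -> {
            if (error == null) {
                result.complete(null);
            } else if (retriesSoFar >= maxRetries) {
                result.completeExceptionally(error); // give up after the last retry
            } else {
                try {
                    scheduler.schedule(() -> attempt(sendAck, retriesSoFar + 1, result),
                            backoffMs.applyAsLong(retriesSoFar), TimeUnit.MILLISECONDS);
                } catch (RejectedExecutionException closed) {
                    result.completeExceptionally(closed); // retrier was closed
                }
            }
        });
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In this sketch, maxRetries counts retries after the initial attempt, so a limit of 3 allows up to four sends in total. An application that wants the synchronous behavior described for the combined "ack receipt" mode could simply block on the returned future, while the default asynchronous mode would leave it to complete in the background.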
This feature will be opt-in. It doesn't introduce backwards compatibility issues with existing implementations. Clients not utilizing the new automatic retry option will continue to function as before. No deprecation or migration is required for existing users.
Comprehensive testing will include:
- Unit tests for the new retry mechanism.
- Integration tests simulating various failure scenarios (broker restarts, topic unloads, network issues).
- Performance tests to ensure the retry mechanism does not introduce significant overhead.
- Mailing List discussion thread: https://lists.apache.org/thread/7sg7hfv9dyxto36dr8kotghtksy1j0kr
- Mailing List voting thread: