Observed behavior
A consumer gets stuck and eventually becomes unstuck after a random time, ranging from a few seconds to an hour. Even before the consumer gets stuck, the statistics reported by nats consumer info are seemingly incorrect: Outstanding Acks are too high, and the redelivery count jumps wildly and sometimes goes backwards.
The effect was observed independently in separate environments. It is a bit random, though, and may take a few minutes to appear:
Java, Windows 11, local 3 node cluster
Golang, Cloud Linux, 3 node cluster
Expected behavior
Consistent consumer behavior even when messages marked for redelivery are deleted
Server and client version
NATS server 2.10.24
Latest Java client
Latest Go client
Host environment
Windows 11 as well as Cloud Linux
Steps to reproduce
Conceptually (see the sketch after this list):
A high percentage of messages not acked and being redelivered
A high percentage of messages deleted while pending redelivery
3 node cluster
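To make the pattern concrete, here is a minimal sketch in Go using the legacy nats.go JetStream API. It is not the attached reproducer; the stream name "EVENTS", subject "events.>", durable "WORKER", and the percentages are placeholders chosen for illustration.

```go
package main

import (
	"log"
	"math/rand"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, _ := nc.JetStream()

	// Feeder side: a replicas-3 stream that is continuously published to.
	js.AddStream(&nats.StreamConfig{
		Name:     "EVENTS",
		Subjects: []string{"events.>"},
		Replicas: 3,
	})
	go func() {
		for {
			js.Publish("events.test", []byte("payload"))
			time.Sleep(10 * time.Millisecond)
		}
	}()

	// Consumer side: a durable pull consumer that leaves a high percentage of
	// messages un-acked (so they get redelivered) and deletes a good share of
	// the messages that are still pending redelivery.
	sub, err := js.PullSubscribe("events.>", "WORKER", nats.AckWait(5*time.Second))
	if err != nil {
		log.Fatal(err)
	}
	for {
		msgs, err := sub.Fetch(10, nats.MaxWait(2*time.Second))
		if err != nil {
			continue // fetch timeout, keep pulling
		}
		for _, m := range msgs {
			meta, err := m.Metadata()
			if err != nil {
				continue
			}
			switch {
			case meta.NumDelivered > 1 && rand.Intn(100) < 50:
				// Message is being redelivered and is still un-acked: delete it
				// from the stream while it is pending redelivery.
				js.DeleteMsg("EVENTS", meta.Sequence.Stream)
			case rand.Intn(100) < 60:
				// Deliberately do not ack, forcing redelivery after AckWait.
			default:
				m.Ack()
			}
		}
	}
}
```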
Exact steps:
Start a 3 node cluster
Start the Feeder (recreates a replica 3 stream)
Start the Consumer
Wait for "consumer stalled"
It may be required to stop and restart the consumer.
Observe the consumer info: the redelivery count may go down, and outstanding acks may grow to unrealistic values.
Reproducer: 20250114_noack_delete.zip
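For watching those numbers over time, a small sketch (same placeholder stream and consumer names as above) that polls the same data nats consumer info shows:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, _ := nc.JetStream()
	for {
		// Log ack pending / redelivered counts so jumps and drops become visible.
		ci, err := js.ConsumerInfo("EVENTS", "WORKER")
		if err != nil {
			log.Println("consumer info:", err)
		} else {
			log.Printf("ack pending=%d redelivered=%d pending=%d",
				ci.NumAckPending, ci.NumRedelivered, ci.NumPending)
		}
		time.Sleep(5 * time.Second)
	}
}
```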
We are seeing this behaviour as well on a Linux-based 3 node cluster installation. I was also able to reproduce it on a Docker-based local cluster (macOS).
After a few hundred to a few thousand messages our consumers have unprocessed messages that do not get processed, even though there is 1 pull request waiting per consumer.
The Ack Pending count also doesn't seem to make sense. The numbers keep increasing, although we have limited redeliveries to 4 (with a 35s delay).
The consumers seem to wake up randomly and deliver the unprocessed messages, but that happens with increasing delays the longer the consumers run. We have observed delays of several hours.
The consumers are durable in our case. Restarting the subscription on a consumer often does nothing. Unprocessed messages stay unprocessed. Removing the consumers and recreating them with the same config (DeliverAll) on the existing stream helps for a while. As mentioned above: after a few hundred/thousand new messages things start to pile up again.
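For context, the consumer shape described above would look roughly like the following with the legacy nats.go JetStream API. This is a sketch, not our actual code: the stream and durable names are placeholders, and whether the 35s delay maps to AckWait or to a BackOff schedule is an assumption.

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, _ := nc.JetStream()

	// Durable pull consumer: DeliverAll, at most 4 deliveries, 35s before redelivery.
	_, err = js.AddConsumer("EVENTS", &nats.ConsumerConfig{
		Durable:       "WORKER",
		AckPolicy:     nats.AckExplicitPolicy,
		DeliverPolicy: nats.DeliverAllPolicy,
		MaxDeliver:    4,
		AckWait:       35 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Bind to the existing durable and keep one pull request waiting.
	sub, err := js.PullSubscribe("", "", nats.Bind("EVENTS", "WORKER"))
	if err != nil {
		log.Fatal(err)
	}
	for {
		msgs, err := sub.Fetch(1, nats.MaxWait(30*time.Second))
		if err != nil {
			continue // no message arrived within the wait window
		}
		for _, m := range msgs {
			// Process the message, then ack explicitly.
			m.Ack()
		}
	}
}
```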