Flusher task is stopping due to InvalidStateError #624

Open
debbyglance opened this issue Oct 24, 2024 · 1 comment
Labels: defect (Suspected defect such as a bug or regression)

Comments

@debbyglance
Contributor

Observed behavior

See #606
The _flusher() task stops because InvalidStateError is raised when it calls .set_result() on a future that is already done.

Expected behavior

The flusher task should never stop

Server and client version

client: nats.py 2.6.0
server: 2.10.7

Host environment

NATS server is running on docker image nats:2.10.7-alpine3.18
The client is a Python Quart application running behind Hypercorn.

Steps to reproduce

I don't have steps to reproduce, but this happens sporadically in production, and I think it could be easy to fix. This is what I think is happening:

  1. During a reconnect attempt, _attempt_reconnect() calls _flush_pending()
  2. The _flush_pending() task creates a Future and adds it to the _flush_queue
  3. _flush_pending waits on the future
  4. The reconnect fails and _attempt_reconnect() is cancelled.
  5. This cancels the _flush_pending task, which in turn cancels the Future created in step 2 (Python cancels a future that a task is awaiting when that task is cancelled).
  6. Now there is a Future in the _flush_queue that is in cancelled state
  7. The new reconnect attempt starts a new _flusher() task
  8. The _flusher() task pulls the cancelled Future out of the queue
  9. When the _flusher() task calls set_result() on the future, it results in an InvalidStateError exception because the future is already done
  10. The flusher task aborts
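
The sequence above can be approximated outside nats.py with plain asyncio. This is only a minimal sketch of the suspected mechanism, not the client's actual code: `flush_pending`, `reproduce`, and the local `flush_queue` are hypothetical stand-ins for `_flush_pending()`, the reconnect logic, and `_flush_queue`.

```python
import asyncio


async def flush_pending(flush_queue: asyncio.Queue) -> None:
    # Stand-in for steps 2-3: enqueue a future, then wait on it.
    future = asyncio.get_running_loop().create_future()
    await flush_queue.put(future)
    await future


async def reproduce() -> str:
    flush_queue: asyncio.Queue = asyncio.Queue()

    # Steps 1-3: the pending flush is now waiting on its future.
    task = asyncio.create_task(flush_pending(flush_queue))
    await asyncio.sleep(0)  # let the task run until it blocks on the future

    # Steps 4-5: cancelling the task also cancels the future it awaits.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # Steps 6-8: the cancelled future is still sitting in the queue.
    future = flush_queue.get_nowait()
    assert future.cancelled()

    # Step 9: resolving a future that is already done raises.
    try:
        future.set_result(None)
    except asyncio.InvalidStateError:
        return "InvalidStateError"
    return "no error"


result = asyncio.run(reproduce())
print(result)
```

If the analysis is right, the script reports that `set_result()` raised, which matches the error seen in the flusher task.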

Possible fixes would be:

  • _flusher() could ignore any cancelled futures in the _flush_queue
  • clear the _flush_queue on reconnect (or create a new queue)
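
As a rough illustration of the first option, a flusher loop could check `done()` before resolving each future. The sketch below assumes a simplified loop; `flusher_sketch` and `demo` are hypothetical names, and the real `_flusher()` does more work than this.

```python
import asyncio


async def flusher_sketch(flush_queue: asyncio.Queue, rounds: int) -> int:
    # Hypothetical stand-in for the _flusher() loop; not the nats.py code.
    resolved = 0
    for _ in range(rounds):
        future = await flush_queue.get()
        if future.done():
            # Already cancelled (or otherwise finished) by its waiter: skip it.
            continue
        future.set_result(None)
        resolved += 1
    return resolved


async def demo() -> int:
    loop = asyncio.get_running_loop()
    flush_queue: asyncio.Queue = asyncio.Queue()

    stale = loop.create_future()
    stale.cancel()  # simulates the future left behind by a cancelled flush
    live = loop.create_future()

    await flush_queue.put(stale)
    await flush_queue.put(live)

    resolved = await flusher_sketch(flush_queue, rounds=2)
    assert live.done() and not live.cancelled()
    return resolved


resolved_count = asyncio.run(demo())
print(resolved_count)
```

The second option would presumably amount to assigning a fresh `asyncio.Queue` when reconnecting, in which case any task still waiting on a dropped future would need to be cancelled or otherwise released.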
debbyglance added the defect label Oct 24, 2024
debbyglance added a commit to debbyglance/nats.py that referenced this issue Oct 28, 2024
A cancelled future in the flush_queue can cause the flusher task to fail with InvalidStateError, attempting to set the result on a "done" future.
debbyglance added a commit to debbyglance/nats.py that referenced this issue Nov 19, 2024
Fixes issue nats-io#624

A cancelled future in the flush_queue can cause the flusher task to fail with InvalidStateError, due to attempting to set the result on a "done" future.

Signed-off-by: Debby Mendez <debby@glance.net>
@shoriminimoe

Hi NATS team! 👋

Is there a plan to merge #636? My project is affected by this issue. 😦
