[Bug] ServiceError: Failed to signalWithStart Workflow #4764

Closed
neelance opened this issue Aug 10, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@neelance

We started to see a number of ServiceError: Failed to signalWithStart Workflow errors recently. The error message Error: 9 FAILED_PRECONDITION: workflow operation rejected because workflow is closing seems to be related.

It started around August 2nd. We are using Temporal Cloud. I do not immediately see any change on our end that could cause the issues. Even if there was some change, I would expect that signalWithStart should not error like this.

Any ideas?

@neelance neelance added the bug Something isn't working label Aug 10, 2023
@bergundy
Member

This is due to a change made in server 1.21, where if a workflow execution is trying to complete, the server rejects signals for that execution.

This error should not use the non-retryable FAILED_PRECONDITION code.
There are discussions about changing it to a retryable RESOURCE_EXHAUSTED in the near future and, later on, carrying these signals over to the next execution on continue-as-new.

I'm transferring this issue to the server repo as it is a server issue.

@bergundy bergundy transferred this issue from temporalio/sdk-typescript Aug 10, 2023
@bergundy
Member

Short-term fix here: #4765
It will be included in the upcoming 1.21.5 release and deployed to Cloud in the near future.

@yiminc
Member

yiminc commented Aug 11, 2023

This error happens when signals arrive at a workflow faster than the workflow can handle them and the workflow has already expressed its intention to close itself. Continuing to accept new signals would prevent the workflow from closing and could lead to more serious system stability issues. The mitigation is to return a retryable error (ResourceExhaustedError with cause set to WorkflowBusy) so that the SDK can back off a little, giving the workflow a chance to close itself. Hopefully, by the time of the retry, the workflow has already closed and SignalWithStart starts a new run.
Before that mitigation is deployed to the server, a slow retry on the client side could also mitigate this issue.
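
For readers who need the client-side workaround in the meantime, the slow retry described above could look roughly like the following. This is only a sketch, not an SDK feature: `retryWhileClosing` and `isWorkflowClosingError` are hypothetical helper names, the attempt count and delays are arbitrary assumptions, and the check simply matches the "workflow is closing" text from the error message reported in this issue.

```typescript
// Hypothetical helpers for illustration; none of these names come from the Temporal SDK.
// The substring is taken from the error message quoted in this issue.
const CLOSING_MESSAGE = "workflow is closing";

// Walk the error and its `cause` chain (bounded depth) looking for the rejection message.
function isWorkflowClosingError(err: unknown): boolean {
  let current: unknown = err;
  for (let depth = 0; depth < 10 && current != null; depth++) {
    const message = (current as { message?: unknown }).message;
    if (typeof message === "string" && message.includes(CLOSING_MESSAGE)) {
      return true;
    }
    current = (current as { cause?: unknown }).cause;
  }
  return false;
}

// Retry `operation` with exponential backoff, but only for the
// "workflow is closing" rejection. Any other error is rethrown immediately.
// maxAttempts and initialDelayMs are assumed defaults, not recommended values.
async function retryWhileClosing<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  initialDelayMs = 500,
): Promise<T> {
  let delayMs = initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxAttempts || !isWorkflowClosingError(err)) {
        throw err;
      }
      // Give the closing workflow some time to finish before trying again.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      delayMs *= 2;
    }
  }
}
```

The call that currently fails (the signalWithStart call) would be passed in as `operation`. Matching on message text is brittle, so this is best treated as a temporary measure until the server returns a retryable error.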

@yiminc
Member

yiminc commented Oct 5, 2023

The short-term fix is included in 1.21.5 and in 1.22 as well.

@yiminc yiminc closed this as completed Oct 5, 2023

3 participants