
Treat TimeoutError as workflow/update failure instead of task failure #800

Open: wants to merge 1 commit into main


@cretz (Member) commented Mar 26, 2025

What was changed

TimeoutError is now a workflow/update failure instead of a task failure. This makes sense because task failures are meant only for code problems that can be fixed by redeploying code. We do not consider it a backwards-incompatible change to stop treating an exception as a task failure.
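A minimal sketch of the classification rule described above, assuming a simplified model where every uncaught workflow exception maps to either a workflow/update failure or a task failure. `classify_failure` and `WORKFLOW_FAILURE_TYPES` are hypothetical names, and `FailureError` is a local stand-in for the SDK's `temporalio.exceptions.FailureError`; this is not the SDK's actual code. Under this change a timeout lands in the first bucket, while an ordinary bug such as a `ValueError` stays a task failure.

```python
import asyncio


class FailureError(Exception):
    """Hypothetical local stand-in for temporalio.exceptions.FailureError."""


# Hypothetical set of exception types that fail the workflow/update instead of
# the task. The change adds the timeout error types; on Python 3.11+
# asyncio.TimeoutError and the built-in TimeoutError are the same class.
WORKFLOW_FAILURE_TYPES: tuple = (FailureError, asyncio.TimeoutError, TimeoutError)


def classify_failure(exc: BaseException) -> str:
    """Map an uncaught workflow exception to the kind of failure it causes."""
    if isinstance(exc, WORKFLOW_FAILURE_TYPES):
        # Runtime failure: the workflow (or update) fails with this error.
        return "workflow_failure"
    # Anything else is treated as a code bug: the task fails and is retried,
    # giving the user a chance to deploy a fix.
    return "task_failure"
```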

Checklist

  1. Closes [Feature Request] Consider TimeoutError as a workflow failure exception #798

@cretz requested a review from a team as a code owner on March 26, 2025 17:04
@THardy98 (Contributor) left a comment

For my own understanding:

  • is this consistent with other SDKs?
  • it seems a bit severe to fail the workflow due to a timeout error but is the rationale, "what else can we do, there's nothing to fix"? I suppose this will give incentive for users to retry workflows but I was always unsure if workflow retries were something that we wanted to be common

@cretz (Member, Author) commented Mar 26, 2025

is this consistent with other SDKs?

Yes. In every SDK other than Python and Ruby, a wait-condition timeout is not an error; it is a boolean return value. In Ruby and Python we use language-native timeout features, and in Ruby timeouts are already workflow failures; we just need to do the same for Python. Today, Python is the only SDK where a wait-condition timeout causes a task failure, which is bad.
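To illustrate the two API styles, here is a sketch using plain `asyncio` rather than the Temporal SDK. The raising style mirrors Python's `workflow.wait_condition(fn, timeout=...)`, which raises `asyncio.TimeoutError`; `wait_condition_bool` is a hypothetical helper that returns `False` on timeout, like the other SDKs. The polling loop is only a stand-in for this sketch; the SDK does not re-check conditions on a fixed interval.

```python
import asyncio
from typing import Callable


async def _until(fn: Callable[[], bool], poll: float) -> None:
    # Stand-in for a condition wait: re-check fn until it returns True.
    while not fn():
        await asyncio.sleep(poll)


async def wait_condition_raising(fn: Callable[[], bool], timeout: float) -> None:
    # Python-style: raises asyncio.TimeoutError if fn is not satisfied in time.
    await asyncio.wait_for(_until(fn, 0.001), timeout)


async def wait_condition_bool(fn: Callable[[], bool], timeout: float) -> bool:
    # Boolean style used by the other SDKs: True if satisfied, False on timeout.
    try:
        await wait_condition_raising(fn, timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

With the boolean style, the caller must check the return value explicitly; with the raising style, an unhandled timeout propagates, which is why its classification (task failure vs. workflow failure) matters.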

it seems a bit severe to fail the workflow due to a timeout error but is the rationale, "what else can we do, there's nothing to fix"? I suppose this will give incentive for users to retry workflows but I was always unsure if workflow retries were something that we wanted to be common

We don't consider it severe to fail the workflow for runtime errors, and users should probably be catching this and reacting anyway. Where a timeout is a failure, it is a runtime failure, not a code/task failure.
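A sketch of "catching this and reacting", again with plain `asyncio` standing in for workflow code. `wait_for_approval`, `demo`, and the fallback string are hypothetical; in a Temporal workflow the awaited call would instead be `workflow.wait_condition(...)` with a `timeout`. Catching the timeout keeps the workflow running down a fallback path, whereas letting it propagate would, with this change, fail the workflow rather than the task.

```python
import asyncio


async def wait_for_approval(approved: asyncio.Event, timeout: float) -> str:
    try:
        await asyncio.wait_for(approved.wait(), timeout)
        return "approved"
    except asyncio.TimeoutError:
        # React to the timeout instead of letting it fail the workflow,
        # e.g. by escalating or taking a default action.
        return "timed out: escalating"


async def demo(approve: bool, timeout: float = 0.05) -> str:
    # Create the event inside the running loop; on older Python versions an
    # Event is tied to the loop that is current when it is created.
    approved = asyncio.Event()
    if approve:
        approved.set()
    return await wait_for_approval(approved, timeout)
```

For example, `asyncio.run(demo(approve=False))` returns `"timed out: escalating"`, while `asyncio.run(demo(approve=True))` returns `"approved"`.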

Development

Successfully merging this pull request may close these issues.

[Feature Request] Consider TimeoutError as a workflow failure exception