Fix record child workflow complete mutable state stale check #2673

Merged
merged 3 commits into temporalio:master from fix-record-child-stale-check
Mar 29, 2022

Conversation

@yycptt (Member) commented Mar 29, 2022

What changed?

  • Reload mutable state only once when possibly stale mutable state is detected while recording child workflow execution completion.
  • This PR is a temporary fix to avoid the infinite task retry issue in the 1.16 release. The longer-term fix should use the child initiated event ID + version to verify whether the child actually exists.

Why?

  • In xdc, after a forced failover there is no guarantee that the parent has the child information, or that its next event ID is larger than the initiatedID in the RecordChildCompletedRequest.
  • The existing logic returns ErrStaleState, which causes the handling logic to keep reloading mutable state and eventually return a "maximum attempt exceeded" error from the record child completed API. The task processing logic for the CloseWorkflowExecution task then retries infinitely.

How did you test it?

  • Existing tests. Eyeballing.

Potential risks

Is hotfix candidate?

@yycptt requested review from wxing1292 and yiminc March 29, 2022 21:44
@yycptt requested a review from a team as a code owner March 29, 2022 21:44
@yycptt merged commit a274580 into temporalio:master Mar 29, 2022
@yycptt deleted the fix-record-child-stale-check branch March 29, 2022 23:45