Test Server: Fix Nexus operation timeout during retry #2221

Merged: 2 commits merged on Sep 17, 2024


pdoerner (Contributor)

What was changed

Changed the test server logic so that it no longer immediately records a failure when the next attempt's scheduled time would exceed the operation's schedule-to-close timeout.
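The decision described above can be modeled roughly as follows. This is a minimal, hypothetical sketch, not the test server's code: the class name, the `Outcome` values, and the exponential backoff parameters (1 s initial interval, coefficient 2.0, 60 s cap) are all assumptions. It also assumes that under the new behavior the operation is simply left open for the existing schedule-to-close timer to time it out.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of the retry-vs-deadline decision; names are illustrative only. */
public final class NexusRetrySketch {

  enum Outcome { RETRY_SCHEDULED, WAIT_FOR_TIMEOUT, FAILED }

  // Assumed exponential backoff: initial * coefficient^(attempt - 1), capped at max.
  static Duration backoff(int attempt, Duration initial, double coefficient, Duration max) {
    double millis = initial.toMillis() * Math.pow(coefficient, attempt - 1);
    long capped = (long) Math.min(millis, (double) max.toMillis());
    return Duration.ofMillis(capped);
  }

  /**
   * Old behavior: if the next attempt would start after the schedule-to-close deadline,
   * record a failure right away. New behavior (assumed): keep the operation open and
   * let the schedule-to-close timeout fire.
   */
  static Outcome onRetryableFailure(
      Instant now, int attempt, Instant scheduleToCloseDeadline, boolean newBehavior) {
    Duration delay = backoff(attempt, Duration.ofSeconds(1), 2.0, Duration.ofSeconds(60));
    Instant nextAttempt = now.plus(delay);
    if (!nextAttempt.isAfter(scheduleToCloseDeadline)) {
      return Outcome.RETRY_SCHEDULED;
    }
    return newBehavior ? Outcome.WAIT_FOR_TIMEOUT : Outcome.FAILED;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-09-17T00:00:00Z");
    Instant deadline = now.plusSeconds(5);
    // Attempt 4 backs off 8 s, which lands after the 5 s deadline.
    System.out.println(onRetryableFailure(now, 4, deadline, false)); // FAILED
    System.out.println(onRetryableFailure(now, 4, deadline, true)); // WAIT_FOR_TIMEOUT
    // Attempt 1 backs off 1 s, well within the deadline.
    System.out.println(onRetryableFailure(now, 1, deadline, true)); // RETRY_SCHEDULED
  }
}
```

With a 5 s deadline, the old path turns a retryable error into an immediate failure once the backoff outgrows the remaining time, while the modeled new path leaves the outcome to the timeout.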

Why?

Closes #2215

pdoerner requested a review from a team as a code owner September 16, 2024 16:48
```diff
@@ -789,7 +789,12 @@ private static void timeoutNexusOperation(
   private static State failNexusOperation(
       RequestContext ctx, NexusOperationData data, Failure failure, long notUsed) {
     RetryState retryState = attemptNexusOperationRetry(ctx, Optional.of(failure), data);
-    if (retryState == RetryState.RETRY_STATE_IN_PROGRESS) {
+    if (retryState == RetryState.RETRY_STATE_IN_PROGRESS
```
Contributor


Hm, but in the test, why are we even reaching this point, since the error thrown should never fail the operation?

Contributor Author


This function also handles task failures. I guess maybe @bergundy was right and I should separate out a FAIL_ATTEMPT from failing the operation as a whole. But this is how workflow and activity task failures/retries are handled, and I thought it would be better to stay consistent and refactor them all at the same time.
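The split suggested above can be sketched as a small state machine in which a failed attempt and a failed operation are distinct actions. This is a hypothetical illustration only: apart from the FAIL_ATTEMPT name mentioned in the comment, every state, action, and class name here is made up and does not reflect the test server's actual state machine.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch: FAIL_ATTEMPT and FAIL_OPERATION as separate actions. */
public final class NexusOperationStateSketch {

  enum State { SCHEDULED, BACKING_OFF, FAILED, TIMED_OUT }

  enum Action { FAIL_ATTEMPT, RETRY, FAIL_OPERATION, TIMEOUT }

  private static final Set<State> CLOSED = EnumSet.of(State.FAILED, State.TIMED_OUT);

  private State state = State.SCHEDULED;

  State state() {
    return state;
  }

  void apply(Action action) {
    if (CLOSED.contains(state)) {
      throw new IllegalStateException("operation already closed: " + state);
    }
    switch (action) {
      case FAIL_ATTEMPT:
        // A failed attempt (e.g. a retryable handler error) only starts a backoff;
        // it never closes the operation on its own.
        state = State.BACKING_OFF;
        break;
      case RETRY:
        // Backoff elapsed: schedule the next attempt.
        if (state != State.BACKING_OFF) {
          throw new IllegalStateException("no backoff in progress: " + state);
        }
        state = State.SCHEDULED;
        break;
      case FAIL_OPERATION:
        // The operation as a whole failed: terminal.
        state = State.FAILED;
        break;
      case TIMEOUT:
        // Schedule-to-close timer fired: terminal, even in the middle of a backoff.
        state = State.TIMED_OUT;
        break;
      default:
        throw new IllegalArgumentException("unknown action: " + action);
    }
  }

  public static void main(String[] args) {
    NexusOperationStateSketch op = new NexusOperationStateSketch();
    op.apply(Action.FAIL_ATTEMPT);
    op.apply(Action.RETRY);
    op.apply(Action.FAIL_ATTEMPT);
    op.apply(Action.TIMEOUT);
    System.out.println(op.state()); // TIMED_OUT
  }
}
```

Keeping the two actions apart means a retryable error can never reach the terminal FAILED state by itself; only an explicit operation failure or the timeout closes the operation.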

pdoerner enabled auto-merge (squash) September 17, 2024 16:09
pdoerner disabled auto-merge September 17, 2024 16:09
pdoerner enabled auto-merge (squash) September 17, 2024 16:10
pdoerner merged commit 6f0cf07 into master on Sep 17, 2024
11 checks passed
pdoerner deleted the fix-test-server-op-timeout branch September 17, 2024 16:20
Development

Successfully merging this pull request may close these issues.

Test Server does not treat internal errors as retryable from Nexus operations