Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[Bug] Wait_condition timeout makes tests impossible #792

Closed
Vaelatern opened this issue Mar 24, 2025 · 12 comments
Closed

[Bug] Wait_condition timeout makes tests impossible #792

Vaelatern opened this issue Mar 24, 2025 · 12 comments
Labels
bug Something isn't working

Comments

@Vaelatern
Copy link

Vaelatern commented Mar 24, 2025

What are you really trying to do?

Have a workflow that times out when it has not received a signal in X hours. [works]

Test that workflow. [doesn't work]

Describe the bug

$ uv run pytest
  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================= test session starts ==============================================
platform linux -- Python 3.13.2, pytest-8.3.5, pluggy-1.5.0
rootdir: $(pwd)
configfile: pyproject.toml
plugins: asyncio-0.25.3
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 2 items

test_main.py .                                                                                           [ 50%]
test_main_broken.py ^C

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
/usr/lib/python3.13/selectors.py:452: KeyboardInterrupt
(to show a full traceback on KeyboardInterrupt use --full-trace)
======================================== 1 passed in 437.27s (0:07:17) =========================================

hangs forever until killed. See above where test_main_broken was killed after 7 minutes despite containing a sleep for timedelta(minutes=1) (and that shouldn't matter because: timeskip.)

Minimal Reproduction

Branch here ready to clone with minimal code

Environment/Versions

  • OS and processor: Linux, x86_64
  • Temporal Version: Testing
  • Straight python dependency as per the reproduction

Additional context

Is there another way, using asyncio.sleep() instead, to have the same result?

@Vaelatern Vaelatern added the bug Something isn't working label Mar 24, 2025
@cretz
Copy link
Member

cretz commented Mar 25, 2025

Thanks for the replication! We are investigating...

@cretz
Copy link
Member

cretz commented Mar 25, 2025

Ok, preliminary research has shown the issue here is that the wait_condition call is raising TimeoutError as expected. By default this exception does not fail the workflow but rather the workflow task (see https://github.com/temporalio/sdk-python?tab=readme-ov-file#exceptions) which effectively "suspends" the workflow until it is fixed. This type of workflow task failure is usually meant for code failures and not runtime ones like timeout. We chose to use TimeoutError here because this is just a wrapper around asyncio.wait_for which raises that error.

I will confer with the team to see if we want to consider allowing all cases of TimeoutError to actually be workflow failures instead of workflow task failures. In the meantime, you can change @workflow.defn to @workflow.defn(failure_exception_types=[TimeoutError]) on the workflow itself, or pass in workflow_failure_exception_types=[TimeoutError] to the worker when creating it.

@cretz
Copy link
Member

cretz commented Mar 25, 2025

I have opened #798

@Vaelatern
Copy link
Author

Thank you!

When I catch the exception in my minimum reproduction it works... unless I add the following discrete test:

import uuid
import pytest
from temporalio import workflow
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class SayHelloArgs:
    name: str = ""

@workflow.defn
class SayHello:
    @workflow.run
    async def run(self, args: SayHelloArgs) -> str:
        self.signal_received = False
        try:
            await workflow.wait_condition(lambda: self.signal_received, timeout=timedelta(seconds=1))
        except asyncio.TimeoutError as e:
            pass
        return "Hello"


@pytest.mark.asyncio
async def test_empty_event_workflow():
    task_queue_name = str(uuid.uuid4())
    async with await WorkflowEnvironment.start_time_skipping() as env:

        async with Worker(
                env.client,
                task_queue=task_queue_name,
                workflows=[SayHello],
                ):
            handle = await env.client.start_workflow(
                    SayHello,
                    SayHelloArgs(),
                    id=str(uuid.uuid4()),
                    task_queue=task_queue_name,
                    )

            assert "Hello" == await handle.result()

@Vaelatern
Copy link
Author

Maybe what I'd really like is better introspection so I'd have a chance to look into why these tests are hanging myself

@cretz
Copy link
Member

cretz commented Mar 25, 2025

unless I add the following discrete test

Not following what is meant here, are you saying this test does not pass?

Maybe what I'd really like is better introspection so I'd have a chance to look into why these tests are hanging myself

Yes by default workflow task failures cause workflow suspension and often you need the UI or need to ask for the history to see that a workflow is suspended pending task failure. Setting the worker init option workflow_failure_exception_types=[Exception] will make any exception fail the workflow instead of the task which will raise an exception when trying to get workflow result.

@Vaelatern
Copy link
Author

Yes, the test I shared locks up as well.

@Vaelatern
Copy link
Author

I have updated the minimum reproduction branch with 2 commits. One fixes the first workflow, the second adds this other broken test.

@cretz
Copy link
Member

cretz commented Mar 25, 2025

This does not happen for me. Taking your exact sample from https://github.com/Vaelatern/-minimum-reproduction/tree/temporal-py-792 and applying

diff --git a/main.py b/main.py
index 3e1619d..b4735c4 100644
--- a/main.py
+++ b/main.py
@@ -14,7 +14,10 @@ class WaitAMinuteWorkflow:
     @workflow.run
     async def run(self) -> str:
         self.is_poked = False
-        await workflow.wait_condition(lambda: self.is_poked, timeout=timedelta(minutes=1))
+        try:
+            await workflow.wait_condition(lambda: self.is_poked, timeout=timedelta(minutes=1))
+        except asyncio.TimeoutError as e:
+            pass
         return "all done"
 
     @workflow.signal

uv run pytest passes as expected

EDIT: I see your update, will test your latest commits

@cretz
Copy link
Member

cretz commented Mar 25, 2025

The error you are getting now is:

  File "path/to/minimum-reproduction/test_main_broken2.py", line 20, in run
    except asyncio.TimeoutError as e:
           ^^^^^^^
NameError: name 'asyncio' is not defined. Did you forget to import 'asyncio'?

I'd recommend passing -k --log-cli-level=WARNING so you can see log output, e.g. uv run pytest -s --log-cli-level=WARNING -k test_empty_event_workflow. This is the case where it is a legitimate task failure suspended waiting for a code fix, but yes task failures do not fail the workflow by default. As alluded to above, if you set workflow_failure_exception_types=[Exception] worker option on that test, the workflow will fail for all exceptions (including NameError).

@Vaelatern
Copy link
Author

Vaelatern commented Mar 26, 2025

That's the core issue! Finally worked out what was wrong in my original test (invoked activities that were not registered) by chasing down reasons like with this thread.

Thank you!

@cretz
Copy link
Member

cretz commented Mar 26, 2025

No problem! We have opened #798 to allow timeouts to cause workflow failures properly, but will otherwise close this issue (feel free to keep commenting or join us in #python-sdk on Slack or the forums).

@cretz cretz closed this as not planned Won't fix, can't repro, duplicate, stale Mar 26, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants