Avoid raising FlakyReplay to users when multiple bugs are found #4228

Closed
carterbox opened this issue Jan 7, 2025 · 4 comments · Fixed by #4239
Labels: bug (something is clearly wrong here), legibility (make errors helpful and Hypothesis grokable)

Comments

@carterbox

I'm trying to track down the cause of a flaky test. However, sometimes the test fails with FlakyFailure and other times with FlakyReplay. These are both errors, but no reproduce_failure hash is provided when the error is FlakyReplay.

What is the difference between a FlakyReplay and a FlakyFailure?

class Flaky(_Trimmable):
    """Base class for indeterministic failures. Usually one of the more
    specific subclasses (FlakyFailure or FlakyStrategyDefinition) is raised."""


class FlakyReplay(Flaky):
    """Internal error raised by the conjecture engine if flaky failures are
    detected during replay.

    Carries information allowing the runner to reconstruct the flakiness as
    a FlakyFailure exception group for final presentation.
    """

    def __init__(self, reason, interesting_origins=None):
        super().__init__(reason)
        self.reason = reason
        self._interesting_origins = interesting_origins

class FlakyFailure(ExceptionGroup, Flaky):
    """This function appears to fail non-deterministically: We have seen it
    fail when passed this example at least once, but a subsequent invocation
    did not fail, or caused a distinct error.

    Common causes for this problem are:
        1. The function depends on external state. e.g. it uses an external
           random number generator. Try to make a version that passes all the
           relevant state in from Hypothesis.
        2. The function is suffering from too much recursion and its failure
           depends sensitively on where it's been called from.
        3. The function is timing sensitive and can fail or pass depending on
           how long it takes. Try breaking it up into smaller functions which
           don't do that and testing those instead.
    """

    def __new__(cls, msg, group):
        # The Exception mixin forces this to be an ExceptionGroup (only accepting
        # Exceptions, not BaseException). Usually BaseException is raised
        # directly and will hence not be part of a FlakyFailure, but I'm not
        # sure this assumption holds everywhere. So wrap any BaseExceptions.
        group = list(group)
        for i, exc in enumerate(group):
            if not isinstance(exc, Exception):
                err = _WrappedBaseException()
                err.__cause__ = err.__context__ = exc
                group[i] = err
        return ExceptionGroup.__new__(cls, msg, group)

Above is the relevant source code from hypothesis. My understanding is that FlakyReplay is "internal", meaning it should not be shown to the user, so something is wrong if I am seeing this error in my logs as the final error? In my case, the error is raised when the status goes from INTERESTING to INTERESTING. (Is there documentation for the meaning of this status? I couldn't find any mention of it in the source.)
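
(For context, not part of the original report: the "status" here comes from Hypothesis's internal conjecture engine. A minimal sketch of that enum, assuming the internals in hypothesis/internal/conjecture/data.py at the time; details may differ from the current source.)

from enum import IntEnum

class Status(IntEnum):
    OVERRUN = 0      # the test case tried to draw more data than allowed
    INVALID = 1      # rejected, e.g. by a failed assume() or filter
    VALID = 2        # ran to completion without finding a failure
    INTERESTING = 3  # raised an error that Hypothesis records and shrinks

An INTERESTING to INTERESTING transition therefore means the input failed on both runs but, as noted below in the thread, with two different errors.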

Why is reproduce_failure provided for one and not the other?

Is this because FlakyReplay is not intended to be raised to the user?
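
(Also for reference, to illustrate the BaseException wrapping in FlakyFailure.__new__ quoted above: a minimal sketch, assuming a recent Hypothesis where FlakyFailure is importable from hypothesis.errors.)

from hypothesis.errors import FlakyFailure

# KeyboardInterrupt is a BaseException rather than an Exception, so __new__
# wraps it before handing the group to ExceptionGroup; ValueError passes through.
err = FlakyFailure("demo", [ValueError("boom"), KeyboardInterrupt()])
assert all(isinstance(exc, Exception) for exc in err.exceptions)
assert isinstance(err.exceptions[1].__cause__, KeyboardInterrupt)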

@jobh
Contributor

jobh commented Jan 7, 2025

Your understanding is correct @carterbox. The FlakyReplay was supposed to be transformed "on the way out", a la

except FlakyReplay as err:
    # This was unexpected, meaning that the assume was flaky.
    # Report it as such.
    raise self._flaky_replay_to_failure(err, e) from None

...but it appears to escape. Flaky INTERESTING to INTERESTING would mean that two different errors are found, and indeed this test case reproduces it:

from hypothesis import given, strategies as st

a = 0

@given(st.integers())
def test_it(_):
    global a
    a += 1
    raise KeyError if a % 2 == 0 else ValueError

So, it needs to be handled on the way out, for example by wrapping mark_interesting() in a try/except that calls _flaky_replay_to_failure as above (see the sketch below the call):

data.mark_interesting(interesting_origin)
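
(A hedged sketch of that suggestion; the names data, interesting_origin, e, and self._flaky_replay_to_failure are taken from the excerpts above, and the exact call site in the engine may differ.)

try:
    data.mark_interesting(interesting_origin)
except FlakyReplay as err:
    # Translate the internal FlakyReplay into the user-facing FlakyFailure
    # exception group, the same way it is already handled on the assume path.
    raise self._flaky_replay_to_failure(err, e) from None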

I'm not sure if this is the best place though, and I'm afraid I don't have the bandwidth to follow this up at the moment...

@tybug added the bug (something is clearly wrong here) and legibility (make errors helpful and Hypothesis grokable) labels Jan 7, 2025
@tybug changed the title from "Provide reproduce_failure hash for FlakyReplay?" to "Avoid raising FlakyReplay to users when multiple bugs are found" Jan 7, 2025
@Zac-HD
Member

Zac-HD commented Jan 8, 2025

Maybe related to #4183? And possibly #4110; cc @jakkdl in case you have ideas. (It's been a while since I dug into this.)

@jobh
Contributor

jobh commented Jan 8, 2025

@Zac-HD I think the explanation is simpler: we're just missing handling of FlakyReplay at mark_interesting time, per my earlier comment.

@jakkdl
Contributor

jakkdl commented Jan 8, 2025

Yeah, this doesn't look related to exception groups (and the repro behaves the same pre-#4110). It does indeed appear that FlakyReplay can currently only happen on assert failures.
