Avoid raising FlakyReplay to users when multiple bugs are found #4228

Closed
carterbox opened this issue Jan 7, 2025 · 4 comments · Fixed by #4239
Labels: bug (something is clearly wrong here), legibility (make errors helpful and Hypothesis grokable)

Comments

@carterbox

I'm trying to track down the cause of a flaky test. However, sometimes the test fails with FlakyFailure and other times with FlakyReplay. These are both errors, but no reproduce_failure hash is provided when the error is FlakyReplay.

What is the difference between a FlakyReplay and a FlakyFailure?

class Flaky(_Trimmable):
    """Base class for indeterministic failures. Usually one of the more
    specific subclasses (FlakyFailure or FlakyStrategyDefinition) is raised."""


class FlakyReplay(Flaky):
    """Internal error raised by the conjecture engine if flaky failures are
    detected during replay.

    Carries information allowing the runner to reconstruct the flakiness as
    a FlakyFailure exception group for final presentation.
    """

    def __init__(self, reason, interesting_origins=None):
        super().__init__(reason)
        self.reason = reason
        self._interesting_origins = interesting_origins

class FlakyFailure(ExceptionGroup, Flaky):
    """This function appears to fail non-deterministically: We have seen it
    fail when passed this example at least once, but a subsequent invocation
    did not fail, or caused a distinct error.

    Common causes for this problem are:
        1. The function depends on external state. e.g. it uses an external
           random number generator. Try to make a version that passes all the
           relevant state in from Hypothesis.
        2. The function is suffering from too much recursion and its failure
           depends sensitively on where it's been called from.
        3. The function is timing sensitive and can fail or pass depending on
           how long it takes. Try breaking it up into smaller functions which
           don't do that and testing those instead.
    """

    def __new__(cls, msg, group):
        # The Exception mixin forces this to be an ExceptionGroup (only accepting
        # Exceptions, not BaseException). Usually BaseException is raised
        # directly and will hence not be part of a FlakyFailure, but I'm not
        # sure this assumption holds everywhere. So wrap any BaseExceptions.
        group = list(group)
        for i, exc in enumerate(group):
            if not isinstance(exc, Exception):
                err = _WrappedBaseException()
                err.__cause__ = err.__context__ = exc
                group[i] = err
        return ExceptionGroup.__new__(cls, msg, group)

Above is the relevant source code from hypothesis. My understanding is that FlakyReplay is "internal", meaning it should not be shown to the user, so something is wrong if I am seeing this error in my logs as the final error? In my case, the error is raised when the status goes from INTERESTING to INTERESTING. (Is there documentation for the meaning of this status? I couldn't find any mention of it in the source.)
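
(For context, not part of the original report: the "status" here comes from Hypothesis's internal conjecture engine. A minimal sketch of that enum, assuming the internals in hypothesis/internal/conjecture/data.py at the time; details may differ from the current source.)

from enum import IntEnum

class Status(IntEnum):
    OVERRUN = 0      # the test case tried to draw more data than allowed
    INVALID = 1      # rejected, e.g. by a failed assume() or filter
    VALID = 2        # ran to completion without finding a failure
    INTERESTING = 3  # raised an error that Hypothesis records and shrinks

An INTERESTING to INTERESTING transition therefore means the input failed on both runs but, as noted below in the thread, with two different errors.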

Why is reproduce_failure provided for one and not the other?

Is this because FlakyReplay is not intended to be raised to the user?
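
(Also for reference, to illustrate the BaseException wrapping in FlakyFailure.__new__ quoted above: a minimal sketch, assuming a recent Hypothesis where FlakyFailure is importable from hypothesis.errors.)

from hypothesis.errors import FlakyFailure

# KeyboardInterrupt is a BaseException rather than an Exception, so __new__
# wraps it before handing the group to ExceptionGroup; ValueError passes through.
err = FlakyFailure("demo", [ValueError("boom"), KeyboardInterrupt()])
assert all(isinstance(exc, Exception) for exc in err.exceptions)
assert isinstance(err.exceptions[1].__cause__, KeyboardInterrupt)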

@jobh
Contributor

jobh commented Jan 7, 2025

Your understanding is correct @carterbox. The FlakyReplay was supposed to be transformed "on the way out", a la

except FlakyReplay as err:
    # This was unexpected, meaning that the assume was flaky.
    # Report it as such.
    raise self._flaky_replay_to_failure(err, e) from None

...but it appears to escape. Flaky INTERESTING to INTERESTING would mean that two different errors are found, and indeed this test case reproduces it:

from hypothesis import given, strategies as st

a = 0

@given(st.integers())
def test_it(_):
    global a
    a += 1
    raise KeyError if a % 2 == 0 else ValueError

So, it needs to be handled on the way out, for example by wrapping mark_interesting() in a try/except that calls _flaky_replay_to_failure as above (see the sketch below the call):

data.mark_interesting(interesting_origin)
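
(A hedged sketch of that suggestion; the names data, interesting_origin, e, and self._flaky_replay_to_failure are taken from the excerpts above, and the exact call site in the engine may differ.)

try:
    data.mark_interesting(interesting_origin)
except FlakyReplay as err:
    # Translate the internal FlakyReplay into the user-facing FlakyFailure
    # exception group, the same way it is already handled on the assume path.
    raise self._flaky_replay_to_failure(err, e) from None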

I'm not sure if this is the best place though, and I'm afraid I don't have the bandwidth to follow this up at the moment...

@tybug added the bug (something is clearly wrong here) and legibility (make errors helpful and Hypothesis grokable) labels Jan 7, 2025
@tybug changed the title from "Provide reproduce_failure hash for FlakyReplay?" to "Avoid raising FlakyReplay to users when multiple bugs are found" Jan 7, 2025
@Zac-HD
Member

Zac-HD commented Jan 8, 2025

Maybe related to #4183? And possibly #4110; cc @jakkdl in case you have ideas. (It's been a while since I dug into this.)

@jobh
Contributor

jobh commented Jan 8, 2025

@Zac-HD I think the explanation is simpler: we're just missing handling of FlakyReplay at mark_interesting time, per my earlier comment.

@jakkdl
Contributor

jakkdl commented Jan 8, 2025

Yeah, this doesn't look related to exception groups (and the repro behaves the same pre-#4110). It does indeed appear that FlakyReplay can currently only happen on assert failures.
