Skip to content

invariant tests can panic or return incorrect information due to state change during test execution #9764

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
2 tasks done
nbaztec opened this issue Jan 27, 2025 · 11 comments · Fixed by #9771
Closed
2 tasks done
Labels
A-testing Area: testing Cmd-forge-test Command: forge test T-bug Type: bug

Comments

@nbaztec
Copy link
Contributor

nbaztec commented Jan 27, 2025

Component

Forge

Have you ensured that all of these are up to date?

  • Foundry
  • Foundryup

What version of Foundry are you on?

forge Version: 0.3.1-dev Commit SHA: 28e6ff1

What version of Foundryup are you on?

No response

What command(s) is the bug in?

forge test

Operating System

Linux

Describe the bug

While generating an invariant strategy in fuzz_param_from_state, the values are generated from the dictionary state:

pub fn fuzz_param_from_state(
    param: &DynSolType,
    state: &EvmFuzzState,
) -> BoxedStrategy<DynSolValue> {
    // Value strategy that uses the state.
    let value = || {
        let state = state.clone();
        let param = param.clone();
        // Generate a bias and use it to pick samples or non-persistent values (50 / 50).
        // Use `Index` instead of `Selector` when selecting a value to avoid iterating over the
        // entire dictionary.
        any::<(bool, prop::sample::Index)>().prop_map(move |(bias, index)| {
            let state = state.dictionary_read();
            let values = if bias { state.samples(&param) } else { None }
                .unwrap_or_else(|| state.values())
                .as_slice();
            values[index.index(values.len())]
        })
    };

In case of addresses, we are filter mapping them (correctly) if they are pre-deployed library addresses and we filter them with None:

    // Convert the value based on the parameter type
    match *param {
        DynSolType::Address => {
            let deployed_libs = state.deployed_libs.clone();
            value()
                .prop_filter_map("filter address fuzzed from state", move |value| {
                    let fuzzed_addr = Address::from_word(value);
                    // Do not use addresses of deployed libraries as fuzz input.
                    // See <https://github.com/foundry-rs/foundry/issues/8639>.
                    if !deployed_libs.contains(&fuzzed_addr) {
                        Some(DynSolValue::Address(fuzzed_addr))
                    } else {
                        None
                    }
                })
                .boxed()
        }

During invariant test execution, we update the dictionary state via collect_data - this has a side-effect of invalidating the rng state for the proptest case generator.

The case tree is generated when the run is invoked. The proptest library expects from here on that case.current() be deterministic - which is computed from the dictionary_state established above. However for tests like invariant_fork_handler_block where the tests are meant to fail, the proptest library tries to construct the TestError while using case.current().

This becomes problematic as the state has now been updated, so case.current() will always return a different input than what was used for the test. In an extreme case if the address just happens to be a pre-deployed library, the computation will yield a None and subsequently panic. Note that the values in a filter_map are only rejected when a new tree is constructed, but this call is actually the result of proptest trying to retrieve case.current() - which will return None and subsequently fail here

Solutions

  • If possible we should avoid changing the rng state during test execution.
  • Patch proptest to no longer rely on case.current() after test execution assuming it can be non-deterministic.

Happy to file a PR once we decide on a way forward.

@nbaztec nbaztec added T-bug Type: bug T-needs-triage Type: this issue needs to be labelled labels Jan 27, 2025
@github-project-automation github-project-automation bot moved this to Todo in Foundry Jan 27, 2025
@grandizzy
Copy link
Collaborator

grandizzy commented Jan 27, 2025

@nbaztec thanks for your ticket!

Note that the values in a filter_map are only rejected when a new tree is constructed

the next call in invariant tests is always generated by constructing a new tree, is something that I'm missing here?

// Generates the next call from the run using the recently updated
// dictionary.
current_run.inputs.push(
invariant_strategy
.new_tree(&mut invariant_test.execution_data.borrow_mut().branch_runner)
.map_err(|_| TestCaseError::Fail("Could not generate case".into()))?
.current(),
);

and using same runner

// Proptest runner to query for random values.
// The strategy only comes with the first `input`. We fill the rest of the `inputs`
// until the desired `depth` so we can use the evolving fuzz dictionary
// during the run.
pub branch_runner: TestRunner,

Can you make a test with the failing case you see to better understand and debug the issue? thanks!

@nbaztec
Copy link
Contributor Author

nbaztec commented Jan 27, 2025

the next call in invariant tests is always generated by constructing a new tree, is something that I'm missing here?

That is correct - this call to current() will not fail, but because proptest has internal calls to case.current() - they will either return an invalid (and possibly new) case, or generate a None that can cause a failure.

Can you make a test with the failing case you see to better understand and debug the issue? thanks!

Sure, it's a bit hard with the rng but I can create a case which will panic that can assist in the debug. I had to debug this case extensively ,so feel free to ask any further follow up questions.

@nbaztec
Copy link
Contributor Author

nbaztec commented Jan 27, 2025

I can get the test to panic with the following patch.

The idea is:

  1. Apply the patch.
  2. Execute the test RUST_LOG=warn PROPTEST_VERBOSE=3 cargo nextest run test_invariant_test_panics_on_deployed_lib --retries=0 --success-output=immediate --failure-output=immediate
  3. Assert incorrect behavior via:
    3a. By matching the last input and the output returned from the proptest runner. It will not match.
    3b. Uncomment the code and substitute the last generated address, to return a deployed lib address. Test will panic.

On my machine the last address was 0x0000000000000000000000000000000000002576.

Basically what happens is that proptest runs the test here and then the test fails as it's supposed to. Proptest then
tries to return the error here and in process tries to retrieve the test case, but as the state has been changed it will pull up a different value after executing the strategy, which will almost always be not the case that was executed, and if it ends up being None, it triggers a panic.

diff --git a/crates/evm/evm/src/executors/invariant/mod.rs b/crates/evm/evm/src/executors/invariant/mod.rs
index b3ea7a98..5a8b9a29 100644
--- a/crates/evm/evm/src/executors/invariant/mod.rs
+++ b/crates/evm/evm/src/executors/invariant/mod.rs
@@ -27,7 +27,7 @@
 use parking_lot::RwLock;
 use proptest::{
     strategy::{Strategy, ValueTree},
-    test_runner::{TestCaseError, TestRunner},
+    test_runner::{TestCaseError, TestError, TestRunner},
 };
 use result::{assert_after_invariant, assert_invariants, can_continue};
 use revm::primitives::HashMap;
@@ -35,7 +35,7 @@
 use std::{
     cell::RefCell,
     collections::{btree_map::Entry, HashMap as Map},
-    sync::Arc,
+    sync::{Arc, Mutex},
 };
 
 mod error;
@@ -337,7 +337,8 @@ pub fn invariant_fuzz(
         // Start timer for this invariant test.
         let timer = FuzzTestTimer::new(self.config.timeout);
 
-        let _ = self.runner.run(&invariant_strategy, |first_input| {
+        let last_input: Arc<Mutex<Option<BasicTxDetails>>> = Arc::new(Mutex::new(None));
+        let invariant_run_result = self.runner.run(&invariant_strategy, |first_input| {
             // Create current invariant run data.
             let mut current_run = InvariantTestRun::new(
                 first_input,
@@ -364,6 +365,8 @@ pub fn invariant_fuzz(
                 let tx = current_run.inputs.last().ok_or_else(|| {
                     TestCaseError::fail("No input generated to call fuzzed target.")
                 })?;
+                println!("current run input {tx:?}");
+                *last_input.as_ref().lock().unwrap() = Some(tx.clone());
 
                 // Execute call from the randomly generated sequence without committing state.
                 // State is committed only if call is not a magic assume.
@@ -485,6 +488,19 @@ pub fn invariant_fuzz(
             Ok(())
         });
 
+        println!("test result {invariant_run_result:?}");
+        if let TestError::Fail(_, tx) = invariant_run_result.unwrap_err() {
+            let input_case = (&*last_input.as_ref().lock().unwrap()).clone().unwrap();
+            let output_case = tx;
+            
+            assert_eq!(output_case, input_case);
+            
+        } else {
+            panic!("test must fail for demonstration!");
+        }
+
+        
+
         trace!(?fuzz_fixtures);
         invariant_test.fuzz_state.log_stats();
 
diff --git a/crates/evm/fuzz/src/invariant/mod.rs b/crates/evm/fuzz/src/invariant/mod.rs
index 49f27e9f..8a9adb89 100644
--- a/crates/evm/fuzz/src/invariant/mod.rs
+++ b/crates/evm/fuzz/src/invariant/mod.rs
@@ -211,7 +211,7 @@ pub fn add_selectors(
 }
 
 /// Details of a transaction generated by invariant strategy for fuzzing a target.
-#[derive(Clone, Debug)]
+#[derive(Clone, Debug, PartialEq)]
 pub struct BasicTxDetails {
     // Transaction sender address.
     pub sender: Address,
@@ -220,7 +220,7 @@ pub struct BasicTxDetails {
 }
 
 /// Call details of a transaction generated to fuzz invariant target.
-#[derive(Clone, Debug)]
+#[derive(Clone, Debug, PartialEq)]
 pub struct CallDetails {
     // Address of target contract.
     pub target: Address,
diff --git a/crates/evm/fuzz/src/strategies/param.rs b/crates/evm/fuzz/src/strategies/param.rs
index 643c70bd..f27e9f54 100644
--- a/crates/evm/fuzz/src/strategies/param.rs
+++ b/crates/evm/fuzz/src/strategies/param.rs
@@ -133,12 +133,22 @@ pub fn fuzz_param_from_state(
             let deployed_libs = state.deployed_libs.clone();
             value()
                 .prop_filter_map("filter address fuzzed from state", move |value| {
-                    let fuzzed_addr = Address::from_word(value);
+                    let mut fuzzed_addr = Address::from_word(value);
+                    
+                    // NOTE: Uncomment and replace the erroneously generated address with a deployed lib to make it panic.
+                    // if fuzzed_addr == alloy_primitives::address!("0000000000000000000000000000000068ba353c") {
+                    //     let new_fuzzed_addr = deployed_libs.iter().next().unwrap().clone();
+                    //     println!("substituting {fuzzed_addr:?} for deployed lib addr {new_fuzzed_addr:?}");
+                    //     fuzzed_addr = new_fuzzed_addr
+                    // }
+
                     // Do not use addresses of deployed libraries as fuzz input.
                     // See <https://github.com/foundry-rs/foundry/issues/8639>.
                     if !deployed_libs.contains(&fuzzed_addr) {
+                        println!("generate {fuzzed_addr:?} as <Some>");
                         Some(DynSolValue::Address(fuzzed_addr))
                     } else {
+                        println!("generate {fuzzed_addr:?} as <None>");
                         None
                     }
                 })
diff --git a/crates/forge/tests/it/invariant.rs b/crates/forge/tests/it/invariant.rs
index 6cd84829..ecec00ed 100644
--- a/crates/forge/tests/it/invariant.rs
+++ b/crates/forge/tests/it/invariant.rs
@@ -641,6 +641,15 @@ async fn test_invariant_roll_fork_handler() {
     );
 }
 
+#[tokio::test(flavor = "multi_thread")]
+async fn test_invariant_test_panics_on_deployed_lib() {
+    let filter = Filter::new("invariant_fork_handler_block", ".*", ".*fuzz/invariant/common/InvariantRollFork.t.sol");
+    let mut runner = TEST_DATA_DEFAULT.runner_with(|config| {
+        config.fuzz.seed = Some(U256::from(119u32));
+    });
+    let _results = runner.test_collect(&filter);
+}
+
 #[tokio::test(flavor = "multi_thread")]
 async fn test_invariant_excluded_senders() {
     let filter = Filter::new(".*", ".*", ".*fuzz/invariant/common/InvariantExcludedSenders.t.sol");

@grandizzy grandizzy added T-to-discuss Type: requires discussion and removed T-needs-triage Type: this issue needs to be labelled labels Jan 28, 2025
@grandizzy
Copy link
Collaborator

@nbaztec thank you, we actually just hit the issue during CI run here https://github.com/foundry-rs/foundry/actions/runs/12998235411/job/36250989880?pr=9766#step:12:343

Re solutions

If possible we should avoid changing the rng state during test execution.

I don't think we want this as we want the state to evolve and be able to fuzz from during invariant tests (like contracts can be created during runs and we should be able to use newly created addresses)

Patch proptest to no longer rely on case.current() after test execution assuming it can be non-deterministic.

Historically getting PRs merged in proptest has been a pain (slow review / merge like this PR proptest-rs/proptest#521 pending for quite some time), an alternative would be to fork it and patch it but we tried to avoid such.

Since this issue was introduced with fix for #8639 (that is PR #9527) what if instead prop_filter_map when fuzzing addresses from state here

DynSolType::Address => {
let deployed_libs = state.deployed_libs.clone();
value()
.prop_filter_map("filter address fuzzed from state", move |value| {
let fuzzed_addr = Address::from_word(value);
// Do not use addresses of deployed libraries as fuzz input.
// See <https://github.com/foundry-rs/foundry/issues/8639>.
if !deployed_libs.contains(&fuzzed_addr) {
Some(DynSolValue::Address(fuzzed_addr))
} else {
None
}
})
.boxed()
}

we would just fallback into regular address fuzzing (fuzz_param_inner) in case the address fuzzed from state is one of the deployed libraries?

@nbaztec
Copy link
Contributor Author

nbaztec commented Jan 28, 2025

If we wish to enforce the dynamic nature of state, and proptest won't adapt the code to assume the non-deterministic nature of a test run, guess we have no other choice than to acknowledge (with docs) that we opt-in to this behaviour and that the shrinking would probably not work as well if it relies on shrinking the failing case. Then it's only a matter of avoiding Nones (filter_map) within the strategy generation to avoid the panics.

I can submit a PR if we agree on it.

@grandizzy
Copy link
Collaborator

If we wish to enforce the dynamic nature of state, and proptest won't adapt the code to assume the non-deterministic nature of a test run, guess we have no other choice than to acknowledge (with docs) that we opt-in to this behaviour and that the shrinking would probably not work as well if it relies on shrinking the failing case. Then it's only a matter of avoiding Nones (filter_map) within the strategy generation to avoid the panics.

I can submit a PR if we agree on it.

We already avoid shrinking invariant failures through proptest and use our own process (because strategy only comes with the first call input and we fill the rest) so should not be an issue here, see

/// Shrinks the failure case to its smallest sequence of calls.
///
/// Maximal shrinkage is guaranteed if the shrink_run_limit is not set to a value lower than the
/// length of failed call sequence.
///
/// The shrunk call sequence always respect the order failure is reproduced as it is tested
/// top-down.
pub(crate) fn shrink_sequence(

@grandizzy grandizzy added A-testing Area: testing Cmd-forge-test Command: forge test and removed T-to-discuss Type: requires discussion labels Jan 28, 2025
@nbaztec
Copy link
Contributor Author

nbaztec commented Jan 28, 2025

Opened #9771

@grandizzy
Copy link
Collaborator

grandizzy commented Jan 29, 2025

Opened #9771

@nbaztec unfortunately this approach breaks tests and likely could introduce regressions / undesired behavior (noticed mismatches in fuzz persisted counterexamples which uses proptest support too), therefore I think we should go with patching proptest as you suggested. Since this could be long process and since we already consider to replace proptest post v1.0 I suggest to

thoughts? thank you!

@nbaztec
Copy link
Contributor Author

nbaztec commented Jan 29, 2025

Yes, I tried to fix some and unfortunately returning a strategy is not the way to go. I wanted to try a simpler way, which would be to deterministically increment the address until a valid non lib one is found in prop_map instead.
Though I personally prefer fixing it in proptest itself, the above could be an intermediate fix if the tests pass.

Thoughts?

@grandizzy
Copy link
Collaborator

deterministically increment the address until a valid non lib one is found in prop_map instead

Yeah, that should work too

@nbaztec
Copy link
Contributor Author

nbaztec commented Jan 29, 2025

I went with a better approach to randomize the defaulting address with a deterministic rng until we get a non lib address
5f239d8 in #9771

I believe this shouldn't affect the fuzz performance by much and shouldn't cause any change of functionality. (If it does then the tests are probably not written with proper sample size).

@github-project-automation github-project-automation bot moved this from Todo to Done in Foundry Jan 29, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-testing Area: testing Cmd-forge-test Command: forge test T-bug Type: bug
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants