Verify Bitwise Reproducibility of RBFE Endstate Trajectories #1506

badisa · 2025-02-18T19:26:56Z

Adds a pytest-archive directory to CI so we can retrieve artifacts in case we can't reproduce the CI environment locally (which I can't seem to do for RBFE, though I can for the Buckyball test case).
Adds hashes of the endstate trajectories of RBFE runs so we can identify when simulations change. Decided on hashes since the files are large. This does not detect changes to the dGs.

* Writes RBFE data to the artifact directory
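
For context, hashing a large file in chunks could look roughly like the sketch below. This is not the PR's implementation: the `hash_file` name matches the call in the diff, but the SHA-256 algorithm, the chunk size, and the signature are assumptions.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks.

    Assumption: the real helper may use a different algorithm or chunk size.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large trajectory file is never held
        # in memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest depends only on the file's bytes, so the chunk size affects memory use but not the result.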

mcwitt · 2025-02-20T00:32:15Z

tests/test_examples.py

+            endstate_0_hash = hash_file(leg_dir / "lambda0_traj.npz")
+            endstate_1_hash = hash_file(leg_dir / "lambda1_traj.npz")
+            assert endstate_0_hash == endstate_hashes[leg][0] and endstate_1_hash == endstate_hashes[leg][1], (
+                f"{endstate_0_hash} != {endstate_hashes[leg][0]} and/or {endstate_1_hash} != {endstate_hashes[leg][1]}"


nit: probably don't need to repeat the hashes in the assert message, since pytest should print the diff anyway?

The reason to repeat the hashes is that the first assertion might fail, in which case pytest only prints the first hash. But if the first hash changes, the second likely changes too. Probably an argument for hashing a set of files, rather than the two endstates individually.

Ah, I see, makes sense to avoid the need to run the test twice to update both hashes.

I suppose comparing tuples in the assertion would also print both hashes on failure, as desired?

assert (endstate_0_hash, endstate_1_hash) == endstate_hashes[leg]
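
A minimal sketch of how the tuple form might be wired up, assuming the stored hashes are kept as a dict of tuples keyed by leg. The leg names and hash values here are placeholders, and `verify_leg` is a hypothetical helper, not something from the PR.

```python
# Hypothetical stored hashes, keyed by leg name; the values are placeholders.
endstate_hashes = {
    "vacuum": ("aaa", "bbb"),
    "solvent": ("ccc", "ddd"),
}


def verify_leg(leg: str, endstate_0_hash: str, endstate_1_hash: str) -> None:
    # One comparison covers both endstates, so a failure involves the whole
    # pair rather than stopping at the first mismatching hash.
    assert (endstate_0_hash, endstate_1_hash) == endstate_hashes[leg]
```

One detail to watch: the stored values need to be tuples as well, since a tuple never compares equal to a list in Python, even with identical elements.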

mcwitt · 2025-02-20T00:37:03Z

tests/test_examples.py

-                n_windows=n_windows,
-                # Use simple charges to avoid os-dependent charge differences
-                forcefield="smirnoff_1_1_0_sc.py",
+    # Can generate hashes from CI artifacts


Might be worth documenting the process for updating one of these hashes. E.g. when we intentionally break bitwise reproducibility, can we just copy the new hash printed by the failed CI job here?
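
One possible shape for that update process, sketched under assumptions: download the artifact directory from the CI job, then recompute the hashes locally and paste the printed dict into the test. The directory layout (`output_dir / leg / lambda{0,1}_traj.npz`) follows the diff in this PR; `sha256_of` and `collect_endstate_hashes` are hypothetical names, and the digest function should be swapped for whatever the test suite actually uses so that the values match.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Assumption: SHA-256 over the raw bytes. Substitute the test suite's own
    # hashing helper so the digests agree with what the test computes.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def collect_endstate_hashes(output_dir: Path, legs: list[str]) -> dict[str, tuple[str, str]]:
    """Hash the two endstate trajectories found under each leg directory."""
    return {
        leg: (
            sha256_of(output_dir / leg / "lambda0_traj.npz"),
            sha256_of(output_dir / leg / "lambda1_traj.npz"),
        )
        for leg in legs
    }
```

Calling `print(collect_endstate_hashes(Path("<downloaded artifact dir>"), legs))` with the legs that were run would then give a dict literal that can be copied into the test.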

mcwitt · 2025-02-20T00:38:31Z

tests/common.py

@@ -26,6 +26,8 @@
 from timemachine.utils import path_to_internal_file

 HILBERT_GRID_DIM = 128
+# Directory to write files to that will be stored as artifacts in CI.
+ARTIFACT_DIR_NAME = "pytest-artifacts"


Is there any benefit to this being an env var? (e.g. so it could be read by both the CI configuration and the pytest process?)

Doesn't make a difference, but made it an env variable in 23b23a3
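
For illustration, reading the directory from the environment with a fallback might look like the sketch below. The variable name `PYTEST_ARTIFACT_DIR` is hypothetical (the name used in 23b23a3 is not shown in this thread); the `pytest-artifacts` default comes from the diff above.

```python
import os
from pathlib import Path

# Hypothetical variable name; the one used in 23b23a3 is not shown here.
ARTIFACT_DIR_ENV_VAR = "PYTEST_ARTIFACT_DIR"
# Default taken from the ARTIFACT_DIR_NAME constant in the diff.
DEFAULT_ARTIFACT_DIR = "pytest-artifacts"


def get_artifact_dir() -> Path:
    """Resolve the artifact directory from the environment, with a fallback."""
    return Path(os.environ.get(ARTIFACT_DIR_ENV_VAR, DEFAULT_ARTIFACT_DIR))
```

With this shape, the CI configuration and the pytest process can agree on the path by setting one variable, while local runs fall back to the default.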

mcwitt · 2025-02-20T00:45:04Z

tests/test_examples.py

+            n_windows=n_windows,
+            # Use simple charges to avoid os-dependent charge differences
+            forcefield="smirnoff_1_1_0_sc.py",
+            output_dir=f"{ARTIFACT_DIR_NAME}/rbfe_{mol_a}_{mol_b}_{leg}_{seed}",


Does each CI run get a fresh artifact directory? (not sure how this works on the gitlab side...)

If not, should we add an ID or timestamp to the path to avoid races with concurrent CI jobs?

Yes, each CI run gets a fresh artifact directory. The coverage report is currently handled the same way.

proteneer · 2025-02-20T14:21:27Z

Can you add a comment in the code to document the right way to update the artifacts?
Should the water sampling tests also be updated to use the pytest-artifact mechanism?

proteneer · 2025-02-20T14:25:12Z

tests/test_examples.py

+                assert len(traj_data["boxes"]) == n_frames
+
+        def verify_endstate_hashes(output_dir: Path):
+            leg_dir = output_dir / leg


any reason not to just recursively hash everything in the subdirectory? (vs. the current behavior of only hashing the trajectory npz file)

I think selectively hashing things, based on #1506 (comment), makes sense.

Changed to hash the npz files (don't expect the pickles or plots to be reproducible) in 673dd18

nit: one benefit I see of separate hashes is being able to immediately tell whether it was only the dGs that changed (versus e.g. trajectories and dGs). I think the change would be relatively simple (e.g. assert on a tuple of hashes matching instead of effectively comparing a hash of the whole tuple)
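
The trade-off between separate and combined hashes can be sketched as follows. Both helpers are hypothetical and assume SHA-256 over raw file bytes; they are not the code from 673dd18.

```python
import hashlib
from pathlib import Path


def hash_npz_files(directory: Path) -> dict[str, str]:
    """Map each .npz file (path relative to `directory`) to its SHA-256 digest."""
    return {
        path.relative_to(directory).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.rglob("*.npz"))
    }


def combined_hash(per_file: dict[str, str]) -> str:
    """Collapse per-file digests into one digest, in sorted-name order."""
    digest = hashlib.sha256()
    for name in sorted(per_file):
        digest.update(name.encode())
        digest.update(per_file[name].encode())
    return digest.hexdigest()
```

Asserting on the per-file mapping shows which file changed (for example only the dG results versus the trajectories as well), whereas asserting on `combined_hash` only signals that something changed.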

proteneer · 2025-02-20T14:28:25Z

I think we should add some more documentation about the intended behavior/consistency:

Do we:
0) expect inputs to the trajectories to be bitwise identical? (assumed yes)
1) expect trajectories to be bitwise identical? (assumed yes)
2) expect estimated dGs and dG_errs to be bitwise identical?
3) expect analysis files to be bitwise identical?

badisa · 2025-02-20T16:07:28Z

I think we should add some more documentation about the intended behavior/consistency:

Do we:
0) expect inputs to the trajectories to be bitwise identical? (assumed yes)
1) expect trajectories to be bitwise identical? (assumed yes)
2) expect estimated dGs and dG_errs to be bitwise identical?
3) expect analysis files to be bitwise identical?

1) Yes
2) Not in this PR currently, but could add this
3) I would not expect matplotlib plots to be bitwise identical, nor do I think pickles would be informative to hash.

badisa added 2 commits February 18, 2025 08:44

Adds artifact directory to CI

8aafccb

* Writes RBFE data to the artifact directory

Add hash checks of the npz files

a2762af

badisa changed the title ~~Task/bitwise determinism for rbfe~~ Verify Bitwise Reproducibility of RBFE Endstate Trajectories Feb 18, 2025

Merge branch 'master' into task/bitwise-determinism-for-rbfe

491ff31

badisa marked this pull request as ready for review February 19, 2025 14:49

badisa requested review from mcwitt and proteneer February 19, 2025 14:49

mcwitt reviewed Feb 20, 2025

proteneer reviewed Feb 20, 2025

badisa and others added 3 commits February 20, 2025 09:07

Merge branch 'master' into task/bitwise-determinism-for-rbfe

c338d46

PR Feedback: Use environment variable for output dir

23b23a3

PR Feedback: Hash directory results

673dd18

* Enforces that dG predictions match
* Switch to default forcefield to match standard behavior
