Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Verify Bitwise Reproducibility of RBFE Endstate Trajectories #1506

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

badisa
Copy link
Collaborator

@badisa badisa commented Feb 18, 2025

  • Adds a pytest-archive directory to CI to get out artifacts, in case we can't emulate the CI (which I can't seem to do for RBFE, while I can for the Buckyball test case)
  • Adds hashes of the endstates of RBFE runs so we can identify when simulations changes. Decided on hashes since the file sizes are large. This does not detect changes to the dGs.

@badisa badisa changed the title Task/bitwise determinism for rbfe Verify Bitwise Reproducibility of RBFE Endstate Trajectories Feb 18, 2025
@badisa badisa marked this pull request as ready for review February 19, 2025 14:49
@badisa badisa requested review from mcwitt and proteneer February 19, 2025 14:49
endstate_0_hash = hash_file(leg_dir / "lambda0_traj.npz")
endstate_1_hash = hash_file(leg_dir / "lambda1_traj.npz")
assert endstate_0_hash == endstate_hashes[leg][0] and endstate_1_hash == endstate_hashes[leg][1], (
f"{endstate_0_hash} != {endstate_hashes[leg][0]} and/or {endstate_1_hash} != {endstate_hashes[leg][1]}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: probably don't need to repeat the hashes in the assert message, since pytest should print the diff anyway?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to repeat the hashes is that the first assertion might fail and pytest only prints the first hash. But if the first hash changes, the second likely changes. Probably an argument for hashing a set of files, right than the two endstates individually.

Copy link
Collaborator

@mcwitt mcwitt Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, makes sense to avoid the need to run the test twice to update both hashes.

I suppose comparing tuples in the assertion would also print both hashes on failure, as desired?

assert (endstate_0_hash, endstate_1_hash) == endstate_hashes[leg]

n_windows=n_windows,
# Use simple charges to avoid os-dependent charge differences
forcefield="smirnoff_1_1_0_sc.py",
# Can generate hashes from CI artifacts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth documenting the process for updating one of these hashes. E.g. when we intentionally break bitwise reproducibility, can we just copy the new hash printed by the failed CI job here?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1000

tests/common.py Outdated
@@ -26,6 +26,8 @@
from timemachine.utils import path_to_internal_file

HILBERT_GRID_DIM = 128
# Directory to write files to that will be stored as artifacts in CI.
ARTIFACT_DIR_NAME = "pytest-artifacts"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any benefit to this being an env var? (e.g. so it could be read by both the CI configuration and the pytest process?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't make a difference, but made it an env variable in 23b23a3

n_windows=n_windows,
# Use simple charges to avoid os-dependent charge differences
forcefield="smirnoff_1_1_0_sc.py",
output_dir=f"{ARTIFACT_DIR_NAME}/rbfe_{mol_a}_{mol_b}_{leg}_{seed}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does each CI run get a fresh artifact directory? (not sure how this works on the gitlab side...)

If not, should we add an ID or timestamp to the path to avoid races with concurrent CI jobs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, each CI run gets a fresh artifact. The coverage report is similarly reported currently.

@proteneer
Copy link
Owner

  1. Can you add a comment in the code to document the right way to update the artifacts?
  2. Should the water sampling tests also be updated to use the pytest-artifact mechanism?

assert len(traj_data["boxes"]) == n_frames

def verify_endstate_hashes(output_dir: Path):
leg_dir = output_dir / leg
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to just recursively hash everything in the subdirectory? (vs. the current behavior of only hashing the trajectory npz file)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think selectively hashing things based on #1506 (comment) makes sense

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to hash the npz files (don't expect the pickles or plots to be reproducible) in 673dd18

Copy link
Collaborator

@mcwitt mcwitt Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: one benefit I see of separate hashes is being able to immediately tell whether it was only the dGs that changed (versus e.g. trajectories and dGs). I think the change would be relatively simple (e.g. assert on a tuple of hashes matching instead of effectively comparing a hash of the whole tuple)

@proteneer
Copy link
Owner

I think we should add some more documentation about the intended behavior/consistency:

Do we:
0) expect inputs to the trajectories to be bitwise identical? (assumed yes)

  1. expect trajectories to be bitwise identical? (assumed yes)
  2. expect estimated dGs and dG_errs to be bitwise identical?
  3. expect analysis files to be bitwise identical?

@badisa
Copy link
Collaborator Author

badisa commented Feb 20, 2025

I think we should add some more documentation about the intended behavior/consistency:

Do we: 0) expect inputs to the trajectories to be bitwise identical? (assumed yes)

  1. expect trajectories to be bitwise identical? (assumed yes)
  2. expect estimated dGs and dG_errs to be bitwise identical?
  3. expect analysis files to be bitwise identical?
  1. Yes
  2. Not in this PR currently, but could add this
  3. I would not expect matplotlib plots to bitwise identical, nor do I think pickles would be informative to hash.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants