Verify Bitwise Reproducibility of RBFE Endstate Trajectories #1506
base: master
Conversation
* Writes RBFE data to the artifact directory
tests/test_examples.py
Outdated
endstate_0_hash = hash_file(leg_dir / "lambda0_traj.npz")
endstate_1_hash = hash_file(leg_dir / "lambda1_traj.npz")
assert endstate_0_hash == endstate_hashes[leg][0] and endstate_1_hash == endstate_hashes[leg][1], (
    f"{endstate_0_hash} != {endstate_hashes[leg][0]} and/or {endstate_1_hash} != {endstate_hashes[leg][1]}"
nit: probably don't need to repeat the hashes in the assert message, since pytest should print the diff anyway?
The reason to repeat the hashes is that if the first assertion fails, pytest only prints the first hash. But if the first hash changes, the second has likely changed too. That's probably an argument for hashing the set of files together, rather than hashing the two endstates individually.
Ah, I see, makes sense to avoid the need to run the test twice to update both hashes.
I suppose comparing tuples in the assertion would also print both hashes on failure, as desired?
assert (endstate_0_hash, endstate_1_hash) == endstate_hashes[leg]
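For context, a minimal sketch of what a file-hashing helper like the `hash_file` used above could look like (illustrative only; the PR's actual helper may differ in algorithm or signature):

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large trajectory files don't load into memory at once.
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the two digests as a tuple, as suggested, lets pytest report both mismatches in a single failure.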
tests/test_examples.py
Outdated
n_windows=n_windows,
# Use simple charges to avoid os-dependent charge differences
forcefield="smirnoff_1_1_0_sc.py",
# Can generate hashes from CI artifacts
Might be worth documenting the process for updating one of these hashes. E.g. when we intentionally break bitwise reproducibility, can we just copy the new hash printed by the failed CI job here?
+1000
tests/common.py
Outdated
@@ -26,6 +26,8 @@
from timemachine.utils import path_to_internal_file


HILBERT_GRID_DIM = 128
# Directory to write files to that will be stored as artifacts in CI.
ARTIFACT_DIR_NAME = "pytest-artifacts"
Is there any benefit to this being an env var? (e.g. so it could be read by both the CI configuration and the pytest process?)
It doesn't make a difference in practice, but I made it an env variable in 23b23a3
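A sketch of what reading the artifact directory from the environment might look like. Note the variable name `CI_ARTIFACT_DIR_NAME` here is a hypothetical placeholder, not necessarily what the referenced commit uses:

```python
import os

# Fall back to a local default when the variable isn't set
# (e.g. when running pytest outside of CI). The env var name
# "CI_ARTIFACT_DIR_NAME" is illustrative only.
ARTIFACT_DIR_NAME = os.environ.get("CI_ARTIFACT_DIR_NAME", "pytest-artifacts")
```

The upside of an env var is that the CI configuration (which declares which paths to archive) and the pytest process can both read the same value from one place.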
n_windows=n_windows,
# Use simple charges to avoid os-dependent charge differences
forcefield="smirnoff_1_1_0_sc.py",
output_dir=f"{ARTIFACT_DIR_NAME}/rbfe_{mol_a}_{mol_b}_{leg}_{seed}",
Does each CI run get a fresh artifact directory? (not sure how this works on the gitlab side...)
If not, should we add an ID or timestamp to the path to avoid races with concurrent CI jobs?
Yes, each CI run gets a fresh artifact directory. The coverage report is currently handled the same way.
assert len(traj_data["boxes"]) == n_frames


def verify_endstate_hashes(output_dir: Path):
    leg_dir = output_dir / leg
any reason not to just recursively hash everything in the subdirectory? (vs. the current behavior of only hashing the trajectory npz file)
I think selectively hashing things, per #1506 (comment), makes sense
Changed to hash the npz files (don't expect the pickles or plots to be reproducible) in 673dd18
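A rough sketch of what hashing only the `.npz` files in a leg directory could look like (the helper names and details here are illustrative, not taken from the PR):

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    # SHA-256 hex digest of the file's bytes.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def npz_hashes(leg_dir: Path) -> tuple:
    # Hash only the .npz files, sorted by name for a deterministic order.
    # Pickles and plots are excluded since they aren't expected to be
    # bitwise reproducible.
    return tuple(hash_file(p) for p in sorted(leg_dir.glob("*.npz")))
```

Asserting `npz_hashes(leg_dir) == expected_hashes[leg]` as a tuple comparison would make pytest print every mismatched hash at once, so a single CI failure is enough to update all expected values.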
nit: one benefit I see of separate hashes is being able to immediately tell whether it was only the dGs that changed (versus e.g. trajectories and dGs). I think the change would be relatively simple (e.g. assert on a tuple of hashes matching instead of effectively comparing a hash of the whole tuple)
I think we should add some more documentation about the intended behavior/consistency: Do we:
* Enforces that dG predictions match
* Switch to default forcefield to match standard behavior
* `pytest-archive` directory to CI to get out artifacts, in case we can't emulate the CI (which I can't seem to do for RBFE, while I can for the Buckyball test case)