Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improve performance of Interchange.from_smirnoff on polymers #1122

Merged
merged 5 commits into from
Jan 16, 2025

Conversation

mattwthompson
Copy link
Member

@mattwthompson mattwthompson commented Dec 9, 2024

Description

Cache some parameter lookups, which improves speed of some polymer simulations

Checklist

  • Add tests
  • Lint
  • Update docstrings

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.

Project coverage is 93.45%. Comparing base (9460db5) to head (adf40f3).
Report is 6 commits behind head on main.

Additional details and impacted files

@mattwthompson
Copy link
Member Author

Ultimately I wasn't able to find a major bottleneck (i.e. something $O(n^2)$ accidentally snuck into things) with the amount of time I had to look. Still, I found a few things that were unnecessarily slow and could be fixes quickly.

I'm seeing performance gains across the board in calling ForceField.create_interchange on systems with large (> 100 heavy atoms) molecule(s).

For a polymer-ish compound of increasing length, seeing closer to single-digit differences at larger molecule sizes:

image

Source: https://gist.github.com/mattwthompson/f64b4ba936147492b1b43db5b28f3e55

And on a large protein (run this in Jupyter):

%%timeit
ForceField(
    "ff14sb_off_impropers_0.0.3.offxml",
).create_interchange(
    Topology.from_pdb(
        "../proteinbenchmark/proteinbenchmark/data/pdbs/hewl-1E8L-model-1.pdb",
    ),
)

# fcf439975b2a5283228b4d10c55d63c360820d90: 25.6 s ± 2.23 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 5d34566af0bdd34392915a4bb39a7f0d00cb3c4e: 22.6 s ± 1.21 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

@mattwthompson mattwthompson marked this pull request as ready for review December 10, 2024 15:01
@mattwthompson
Copy link
Member Author

This drops the runtime of sage_ff14sb.create_interchange(top) in the toolkit showcase from ~75 to ~65 seconds

@timbernat
Copy link
Contributor

timbernat commented Dec 13, 2024

@mattwthompson Did my own quick benchmark with a profiler to get some more details on what's causing a bottleneck, code and results linked. For reproducibility, I ran the linked script with interchange v0.4.0 and toolkit v0.16.7. I ran with 1,200 repeat units but changed the repeat unit identity from "CO" to "CCO" (i.e. to PEG, as [CO]n doesn't correspond to any real polymer to the best of my knowledge).

Moving down the call stack, looks like the highest-cost primitive functions are:

Make of that what you will, I haven't dug thru the source code deeply enough to know why those calls in particular dominate runtime, but I suspect this'll at least help direct effort towards the key 20% of bugs.

I also have access to ~1,400 diverse polymer chemistries which I've run thru Interchange as part of an unrelated collaboration; the individual chains are only a few hundred atoms, but are packed into 10k atom melts before porting thru Interchange. If it would be of use, I can get back with Interchange output runtimes for these chemistries sometime next week after making some tweaks to my pipeline (viz incorporating profilers into a Signac-based workflow). Hope I could be of some help!

@mattwthompson
Copy link
Member Author

@timbernat this PR aims to streamline the bottleneck which I believe is the cause of this poor performance. Could you install against this branch (pip install git+https://github.com/openforcefield/openff-interchange.git@cache-parameter-lookups) and compare timings for a few polymers?

@timbernat
Copy link
Contributor

@mattwthompson Apologies for the long turnaround, see here for updated profile times (same code as prior but with PR-version of Interchange). Runtimes are pretty similar, with components from _nonbonded now dominating much of the runtime.

Sorry I couldn't be of more help on this front yet, I'll be consolidating some concurrent projects following the holidays which should hopefully give me some more time on the side. Let me know if I can contribute anything else here!

Copy link
Collaborator

@Yoshanuikabundi Yoshanuikabundi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I'm unscientifically seeing a small speed improvement locally as well.

@mattwthompson
Copy link
Member Author

Thanks @Yoshanuikabundi and @timbernat!

@mattwthompson mattwthompson merged commit b623003 into main Jan 16, 2025
23 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants