-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Improve performance of Interchange.from_smirnoff
on polymers
#1122
Conversation
Ultimately I wasn't able to find a major bottleneck (i.e. something I'm seeing performance gains across the board in calling For a polymer-ish compound of increasing length, seeing closer to single-digit differences at larger molecule sizes: Source: https://gist.github.com/mattwthompson/f64b4ba936147492b1b43db5b28f3e55 And on a large protein (run this in Jupyter): %%timeit
ForceField(
"ff14sb_off_impropers_0.0.3.offxml",
).create_interchange(
Topology.from_pdb(
"../proteinbenchmark/proteinbenchmark/data/pdbs/hewl-1E8L-model-1.pdb",
),
)
# fcf439975b2a5283228b4d10c55d63c360820d90: 25.6 s ± 2.23 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 5d34566af0bdd34392915a4bb39a7f0d00cb3c4e: 22.6 s ± 1.21 s per loop (mean ± std. dev. of 7 runs, 1 loop each) |
This drops the runtime of |
@mattwthompson Did my own quick benchmark with a profiler to get some more details on what's causing a bottleneck, code and results linked. For reproducibility, I ran the linked script with interchange v0.4.0 and toolkit v0.16.7. I ran with 1,200 repeat units but changed the repeat unit identity from "CO" to "CCO" (i.e. to PEG, as [CO]n doesn't correspond to any real polymer to the best of my knowledge). Moving down the call stack, looks like the highest-cost primitive functions are:
Make of that what you will, I haven't dug thru the source code deeply enough to know why those calls in particular dominate runtime, but I suspect this'll at least help direct effort towards the key 20% of bugs. I also have access to ~1,400 diverse polymer chemistries which I've run thru Interchange as part of an unrelated collaboration; the individual chains are only a few hundred atoms, but are packed into 10k atom melts before porting thru Interchange. If it would be of use, I can get back with Interchange output runtimes for these chemistries sometime next week after making some tweaks to my pipeline (viz incorporating profilers into a Signac-based workflow). Hope I could be of some help! |
@timbernat this PR aims to streamline the bottleneck which I believe is the cause of this poor performance. Could you install against this branch ( |
@mattwthompson Apologies for the long turnaround, see here for updated profile times (same code as prior but with PR-version of Interchange). Runtimes are pretty similar, with components from _nonbonded now dominating much of the runtime. Sorry I couldn't be of more help on this front yet, I'll be consolidating some concurrent projects following the holidays which should hopefully give me some more time on the side. Let me know if I can contribute anything else here! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'm unscientifically seeing a small speed improvement locally as well.
Thanks @Yoshanuikabundi and @timbernat! |
Description
Cache some parameter lookups, which improves speed of some polymer simulations
Checklist