Improve performance of `Interchange.from_smirnoff` on polymers #1122

mattwthompson · 2024-12-09T22:59:14Z

Description

Cache some parameter lookups, which speeds up `Interchange.from_smirnoff` on some polymer systems
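For readers unfamiliar with the idea, here is a minimal sketch of what caching a parameter lookup can look like. It is not the code in this PR: the `PARAMETERS` table, `lookup_parameter`, and `assign_bonds` are hypothetical names invented for illustration, and the numbers are placeholders. The point is only that a polymer repeats the same few chemical environments many times, so a memoized lookup does the expensive work once per distinct key.

```python
from functools import lru_cache

# Hypothetical stand-in for a force field's parameter table.
# This is NOT Interchange's data model; values are placeholders.
PARAMETERS = {
    "[#6X4:1]-[#8X2:2]": {"k": 600.0, "length": 1.43},
    "[#6X4:1]-[#6X4:2]": {"k": 500.0, "length": 1.53},
}

# Counts how many times the uncached body actually runs.
call_count = 0


@lru_cache(maxsize=None)
def lookup_parameter(smirks: str) -> tuple:
    """Return (k, length) for a pattern; repeated keys are served from the cache."""
    global call_count
    call_count += 1
    entry = PARAMETERS[smirks]
    return entry["k"], entry["length"]


def assign_bonds(bond_smirks: list) -> list:
    """Look up parameters for every bond; identical patterns hit the cache."""
    return [lookup_parameter(smirks) for smirks in bond_smirks]


# A polymer-like input: two distinct patterns repeated 1000 times each.
bonds = ["[#6X4:1]-[#8X2:2]", "[#6X4:1]-[#6X4:2]"] * 1000
assigned = assign_bonds(bonds)
```

The cache key must be hashable and must capture everything the result depends on; otherwise a cached value could be returned for an input that should have produced something different.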

Checklist

Add tests
Lint
Update docstrings

codecov · 2024-12-10T14:09:08Z

Codecov Report

Attention: Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.

Project coverage is 93.45%. Comparing base (9460db5) to head (adf40f3).
Report is 6 commits behind head on main.

mattwthompson · 2024-12-10T14:56:54Z

Ultimately I wasn't able to find a major bottleneck (i.e. something $O(n^2)$ that accidentally snuck in) with the amount of time I had to look. Still, I found a few things that were unnecessarily slow and could be fixed quickly.

I'm seeing performance gains across the board in calling ForceField.create_interchange on systems with large (> 100 heavy atoms) molecule(s).

For a polymer-ish compound of increasing length, seeing closer to single-digit differences at larger molecule sizes:

Source: https://gist.github.com/mattwthompson/f64b4ba936147492b1b43db5b28f3e55

And on a large protein (run this in Jupyter):

%%timeit
# Assumes `from openff.toolkit import ForceField, Topology` was run in an earlier cell
ForceField(
    "ff14sb_off_impropers_0.0.3.offxml",
).create_interchange(
    Topology.from_pdb(
        "../proteinbenchmark/proteinbenchmark/data/pdbs/hewl-1E8L-model-1.pdb",
    ),
)

# fcf439975b2a5283228b4d10c55d63c360820d90: 25.6 s ± 2.23 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 5d34566af0bdd34392915a4bb39a7f0d00cb3c4e: 22.6 s ± 1.21 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

mattwthompson · 2024-12-10T22:50:24Z

This drops the runtime of sage_ff14sb.create_interchange(top) in the toolkit showcase from ~75 to ~65 seconds

timbernat · 2024-12-13T04:06:04Z

@mattwthompson Did my own quick benchmark with a profiler to get some more details on what's causing a bottleneck, code and results linked. For reproducibility, I ran the linked script with interchange v0.4.0 and toolkit v0.16.7. I ran with 1,200 repeat units but changed the repeat unit identity from "CO" to "CCO" (i.e. to PEG, as [CO]n doesn't correspond to any real polymer to the best of my knowledge).
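The profiler used for this benchmark is not named in the comment. As a rough sketch of how a comparable breakdown could be collected with the standard library's `cProfile` and `pstats`, assuming a hypothetical `build_system` function stands in for the expensive call:

```python
import cProfile
import io
import pstats


def build_system(n_units: int) -> int:
    """Hypothetical stand-in for the expensive call being profiled."""
    return sum(i * i for i in range(n_units))


def profile_call(func, *args, top: int = 10) -> str:
    """Run func(*args) under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    # Write the report into a string buffer instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return buffer.getvalue()


report = profile_call(build_system, 1200)
print(report)
```

Sorting by cumulative time surfaces the high-level calls that contain the cost; sorting by `pstats.SortKey.TIME` instead highlights the individual primitives where the time is actually spent.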

Moving down the call stack, looks like the highest-cost primitive functions are:

In Interchange:
- smirnoff._valence.store_potentials()
- smirnoff._create._propers()
From the Python stdlib:
- builtin list.index() (individually quick, but called >180k times altogether!)
- selectors.EpollSelector.select() (have literally no clue what this does/why it's called)

Make of that what you will, I haven't dug thru the source code deeply enough to know why those calls in particular dominate runtime, but I suspect this'll at least help direct effort towards the key 20% of bugs.
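As a generic illustration of why many `list.index()` calls add up (not a claim about where Interchange makes them), each call scans the list from the start, so the total cost grows with the list length times the number of queries. A common alternative is to build a value-to-position dictionary once. The function names below are hypothetical:

```python
def positions_with_index(items: list, queries: list) -> list:
    """Each list.index call scans the list, so this is O(len(items)) per query."""
    return [items.index(query) for query in queries]


def positions_with_map(items: list, queries: list) -> list:
    """Build a value -> first position map once, then answer each query in O(1) on average."""
    first_position = {}
    for position, item in enumerate(items):
        # setdefault keeps the first occurrence, matching list.index semantics.
        first_position.setdefault(item, position)
    return [first_position[query] for query in queries]


items = ["a", "b", "a", "c"]
queries = ["c", "a", "b", "a"]
slow = positions_with_index(items, queries)
fast = positions_with_map(items, queries)
```

The dictionary version requires hashable items and assumes the list does not change between building the map and answering the queries.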

I also have access to ~1,400 diverse polymer chemistries which I've run thru Interchange as part of an unrelated collaboration; the individual chains are only a few hundred atoms, but are packed into 10k atom melts before porting thru Interchange. If it would be of use, I can get back with Interchange output runtimes for these chemistries sometime next week after making some tweaks to my pipeline (viz incorporating profilers into a Signac-based workflow). Hope I could be of some help!

mattwthompson · 2024-12-13T16:23:14Z

@timbernat this PR aims to streamline the bottleneck which I believe is the cause of this poor performance. Could you install against this branch (pip install git+https://github.com/openforcefield/openff-interchange.git@cache-parameter-lookups) and compare timings for a few polymers?

timbernat · 2024-12-20T00:41:41Z

@mattwthompson Apologies for the long turnaround, see here for updated profile times (same code as prior but with PR-version of Interchange). Runtimes are pretty similar, with components from _nonbonded now dominating much of the runtime.

Sorry I couldn't be of more help on this front yet, I'll be consolidating some concurrent projects following the holidays which should hopefully give me some more time on the side. Let me know if I can contribute anything else here!

Yoshanuikabundi

LGTM! I'm unscientifically seeing a small speed improvement locally as well.

mattwthompson · 2025-01-16T15:04:15Z

Thanks @Yoshanuikabundi and @timbernat!

mattwthompson added 2 commits December 9, 2024 16:55

PERF: Cache some parameter lookups

0899d13

FIX: Fix caching idivf

5d34566

mattwthompson marked this pull request as ready for review December 10, 2024 15:01

mattwthompson requested a review from Yoshanuikabundi December 10, 2024 15:01

PERF: Cache some charge increment calculations

f8e98e4

jameseastwood assigned Yoshanuikabundi Jan 15, 2025

Yoshanuikabundi approved these changes Jan 16, 2025

mattwthompson added 2 commits January 16, 2025 08:33

DOC: Update release history

fcbc6a2

Merge remote-tracking branch 'upstream/main' into cache-parameter-lookups

adf40f3

mattwthompson merged commit b623003 into main Jan 16, 2025
23 checks passed
