Cache some variables during MSA featurization #288

Merged
merged 1 commit into from
Sep 23, 2024

amorehead
Contributor

  • Caches some variables during MSA featurization

  for idx, (chemtype, residue_index) in enumerate(
      zip(chain_chemtype, chain_residue_index)
  ):
      is_polymer = chemtype < ligand_chemtype_index
      is_ligand = not is_polymer

-     chem_residue_constants = get_residue_constants(res_chem_index=chemtype)
+     if chemtype not in chemtype_constants_cache:
+         chemtype_constants_cache[chemtype] = get_residue_constants(res_chem_index=chemtype)
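
A minimal, self-contained sketch of the dictionary-cache idiom in the fragment above. The `ResidueConstants` class, the stand-in `get_residue_constants` body, and the sample token list are hypothetical and exist only to make the sketch runnable. Reading the value back with `chemtype_constants_cache[chemtype]` is also an assumption, since the fragment ends before that step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueConstants:
    """Hypothetical stand-in for the per-chemtype constants object."""
    name: str


call_count = 0  # counts how often the (stand-in) lookup body actually runs


def get_residue_constants(res_chem_index: int) -> ResidueConstants:
    """Hypothetical stand-in for the real lookup function."""
    global call_count
    call_count += 1
    names = ["protein", "rna", "dna", "ligand"]
    return ResidueConstants(name=names[res_chem_index])


def constants_per_token(chain_chemtype):
    """Return one constants object per token, calling the lookup once per distinct chemtype."""
    chemtype_constants_cache = {}
    result = []
    for chemtype in chain_chemtype:
        # Only call the lookup the first time a chemtype is seen.
        if chemtype not in chemtype_constants_cache:
            chemtype_constants_cache[chemtype] = get_residue_constants(res_chem_index=chemtype)
        # Assumed read-back step: reuse the cached object for every later token.
        result.append(chemtype_constants_cache[chemtype])
    return result


tokens = [0, 0, 0, 3, 0, 3]  # six tokens, two distinct chemtypes
constants = constants_per_token(tokens)
```

With six tokens but only two distinct chemtypes, the stand-in lookup runs twice rather than six times.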
Owner

@lucidrains lucidrains Sep 23, 2024

is get_residue_constants an expensive function?

Contributor Author

I need to run some more experiments, but caching its result instead of calling it for every token in a sequence seems to have dropped the runtime of pdb_input_to_atom_input from minutes to seconds. Fingers crossed this pattern holds for other structures.

Contributor Author

@amorehead amorehead Sep 23, 2024

In principle, it should be an O(1) function, since it does a simple lookup based on integers or strings and returns a static Python Module class instance. However, the make_msa_features function as a whole is an O(C * M * S) function, where C is the number of chains in a structure (e.g., 3), M is the number of sequences in an MSA (e.g., 16k), and S is the number of tokens in each MSA sequence (e.g., 200).
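
To put numbers on the comment above: with the example sizes quoted there (reading "16k" as 16,000), the innermost loop body runs 3 * 16,000 * 200 = 9,600,000 times, so even a small constant cost per call adds up. The sketch below uses `functools.lru_cache` from the standard library as an alternative to a hand-written dictionary. The `lookup_constants` and `featurize` functions, the four chemtypes, and the reduced loop sizes are hypothetical and are not the project's actual code.

```python
from functools import lru_cache

# Example sizes quoted in the comment (16k read as 16,000).
C, M, S = 3, 16_000, 200
total_iterations = C * M * S  # 9,600,000 inner-loop iterations


@lru_cache(maxsize=None)
def lookup_constants(chemtype: int) -> tuple:
    """Hypothetical stand-in for an O(1) lookup with some fixed overhead per call."""
    return ("constants", chemtype)


def featurize(num_chains: int, num_seqs: int, num_tokens: int, num_chemtypes: int = 4) -> None:
    """Hypothetical triple loop with the same O(C * M * S) shape as described above."""
    for _chain in range(num_chains):
        for _seq in range(num_seqs):
            for token in range(num_tokens):
                # The wrapper is still invoked every iteration, but the function
                # body only executes on a cache miss.
                lookup_constants(token % num_chemtypes)


# Smaller sizes than the example so the sketch finishes instantly.
featurize(num_chains=2, num_seqs=50, num_tokens=20)
info = lookup_constants.cache_info()
print(total_iterations, info.hits, info.misses)
```

In this sketch the function body runs once per distinct chemtype and every other iteration is served from the cache, which is the effect the manual `chemtype_constants_cache` dictionary achieves in the PR.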

@lucidrains lucidrains merged commit ea41682 into lucidrains:main Sep 23, 2024
11 checks passed
@amorehead amorehead deleted the patch-1 branch September 23, 2024 19:51
2 participants