Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Remove length check in ValenceDict #1183

Open
lilyminium opened this issue Jan 28, 2022 · 10 comments · Fixed by #1185
Open

Remove length check in ValenceDict #1183

lilyminium opened this issue Jan 28, 2022 · 10 comments · Fixed by #1185

Comments

@lilyminium
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

ValenceDict has a number of restrictions that require the length of the key to be no more than 4 atoms. Is there a real reason for this?

Describe the solution you'd like

Remove length check here:

@staticmethod
def key_transform(key):
"""Reverse tuple if first element is larger than last element."""
# Ensure key is a tuple.
key = tuple(key)
assert len(key) > 0 and len(key) < 5, "Valence keys must be at most 4 atoms"
# Reverse the key if the first element is bigger than the last.
if key[0] > key[-1]:
key = tuple(reversed(key))
return key

Refactor this to just take the forward or backward permutation. Also, it's much more idiomatic python to make this a classmethod instead of a staticmethod that uses __class__.

permutations = OrderedDict({refkey: 0, refkey[::-1]:1})

@staticmethod
def index_of(key, possible=None):
"""
Generates a canonical ordering of the equivalent permutations of ``key`` (equivalent rearrangements of indices)
and identifies which of those possible orderings this particular ordering is. This method is useful when
multiple SMARTS patterns might match the same atoms, but local molecular symmetry or the use of
wildcards in the SMARTS could make the matches occur in arbitrary order.
This method can be restricted to a subset of the canonical orderings, by providing
the optional ``possible`` keyword argument. If provided, the index returned by this method will be
the index of the element in ``possible`` after undergoing the same canonical sorting as above.
Parameters
----------
key : iterable of int
A valid key for ValenceDict
possible : iterable of iterable of int, optional. default=``None``
A subset of the possible orderings that this match might take.
Returns
-------
index : int
"""
assert len(key) < 4
refkey = __class__.key_transform(key)
if len(key) == 2:
permutations = OrderedDict(
{(refkey[0], refkey[1]): 0, (refkey[1], refkey[0]): 1}
)
elif len(key) == 3:
permutations = OrderedDict(
{
(refkey[0], refkey[1], refkey[2]): 0,
(refkey[2], refkey[1], refkey[0]): 1,
}
)
else:
# For a proper, only forward/backward makes sense
permutations = OrderedDict(
{
(refkey[0], refkey[1], refkey[2], refkey[3]): 0,
(refkey[3], refkey[1], refkey[2], refkey[0]): 1,
}
)
if possible is not None:
i = 0
# If the possible permutations were provided, ensure that `possible` is a SUBSET of `permutations`
assert all([p in permutations for p in possible]), (
"Possible permutations " + str(possible) + " is impossible!"
)
# TODO: Double-check whether this will generalize. It seems like this would fail if ``key``
# were in ``permutations``, but not ``possible``
for k in permutations:
if all([x == y for x, y in zip(key, k)]):
return i
if k in possible:
i += 1
else:
# If the possible permutations were NOT provided, then return the unique index of this permutation.
return permutations[key]

Describe alternatives you've considered

Additional context

@j-wags
Copy link
Member

j-wags commented Jan 29, 2022

Remove length check here:

Sounds good. I thought about this a bit and there's no need to limit it to 4 atoms. Happy to take a PR on this.

Refactor this to just take the forward or backward permutation.

I need to think more on this, because I forgot all the details of how this method is used - Is this separable from the other aspects of this issue or are they intertwined for your purposes?

Also, it's much more idiomatic python to make this a classmethod instead of a staticmethod that uses class.

Agree. Happy to take a PR on this.

@lilyminium
Copy link
Collaborator Author

lilyminium commented Jan 29, 2022

I need to think more on this, because I forgot all the details of how this method is used - Is this separable from the other aspects of this issue or are they intertwined for your purposes?

They're intertwined because currently the .index_of method manually specifies the following permutations:

  • {(refkey[0], refkey[1]): 0, (refkey[1], refkey[0]): 1} (2 atoms)
  • {(refkey[0], refkey[1], refkey[2]): 0, (refkey[2], refkey[1], refkey[0]): 1, } (3 atoms)
  • {(refkey[0], refkey[1], refkey[2], refkey[3]): 0, (refkey[3], refkey[1], refkey[2], refkey[0]): 1, } (4 atoms)

If you look at the indices you'll notice that they're just the forward (0) and backward (1) permutation, but manually specified so that only index lengths up until 4 are supported. Taking the generic forward/backward would be a general case of the rule.

Edit: no, I'm wrong -- and I'm not certain the dihedrals sorting makes sense. If the bonds go A-B-C-D, does it make sense to accept D-B-C-A? Does it not conflict with the .key_transform method?

@lilyminium
Copy link
Collaborator Author

.index_of appears to only be used in one place: the VirtualSiteHandler. It looks like it calls ImproperDict.index_of, but not certain. @trevorgokey is it right that the ValenceDict should order dihedrals like that?

@trevorgokey
Copy link
Collaborator

trevorgokey commented Jan 30, 2022

My opinion here is that the ValenceDict class is for MM valence forces: bonds, angles, and torsions, and limits to < 5 because these forms have exactly two symmetries and have concrete orderings. Releasing the limit check makes the dictionary 1. not a valence dictionary anymore and 2. changes the behavior of the class. One should use the SortedDict if one wants to sort an arbitrary key length. What is the use case for a ValenceDict without a bounds check?

I'm not certain the dihedrals sorting makes sense.

Yes this is a mistake and the second form should be 3-2-1-0.

Also, it's much more idiomatic python to make this a classmethod instead of a staticmethod that uses class

I do stuff like this because a class method would carry around uneeded object state. There is also no dependence on the cls argument in a class method in the function, but would need to be passed/serialized in certain cases if it were a class method. I personally use the transforms in my code, but not the actual class (for example, making sure the ForceBalance keys will match). Is there a "pythonic" way to achieve a similar outcome? My suggestion would be to take the key_transform out of the class, but this is an API break.

On a broader level, this implementation was written before match=once and match=all_permutations was created when I exposed the ability to pick specific virtual site particles. Since things were reduced to the two options, I am presuming this is why the above bug slid through and index_of isn't heavily used currently.

@lilyminium
Copy link
Collaborator Author

lilyminium commented Jan 30, 2022

Releasing the limit check makes the dictionary 1. not a valence dictionary anymore and 2. changes the behavior of the class. One should use the SortedDict if one wants to sort an arbitrary key length. What is the use case for a ValenceDict without a bounds check?

Sorting isn't quite the same thing as symmetry (forwards/backwards). D-A-B-C-E should match E-C-B-A-D but not A-B-C!-D!-E. As an existing example, what about CMAP torsions? Or anything that is a string of >4 atoms bonded in a chain, where both forwards and backwards matching is ok.

I do stuff like this because a class method would carry around uneeded object state. There is also no dependence on the cls argument in a class method in the function, but would need to be passed/serialized in certain cases if it were a class method. I personally use the transforms in my code, but not the actual class (for example, making sure the ForceBalance keys will match). Is there a "pythonic" way to achieve a similar outcome? My suggestion would be to take the key_transform out of the class, but this is an API break.

Sorry -- I don't follow, especially what you're saying about serialization. Could you clarify? You're accessing the object state (by this I assume you mean the class, which is technically an object in Python and an instance of type) when you access __class__. __class__ is the same thing just using the class name (here, ValenceDict). It's very similar to the cls argument in a classmethod, only more confusing, because it doesn't inherit, so you wind up with this scenario:

In [1]: class Superclass:
   ...:
   ...:     @staticmethod
   ...:     def is_called():
   ...:         print(f"called from superclass")
   ...:
   ...:     @staticmethod
   ...:     def caller():
   ...:         print(f"calling method from {__class__}")
   ...:         __class__.is_called()
   ...:
   ...:
   ...: class Subclass(Superclass):
   ...:     @staticmethod
   ...:     def is_called():
   ...:         print("called from subclass")
   ...:

In [2]: class Subclass2(Subclass):
   ...:     @staticmethod
   ...:     def caller():
   ...:         print(f"calling method from {__class__}")
   ...:         __class__.is_called()
   ...:

In [3]: Subclass.caller()
calling method from <class '__main__.Superclass'>
called from superclass

In [4]: Subclass2.caller()
calling method from <class '__main__.Subclass2'>
called from subclass

The class is necessarily defined ("carried around") whenever you call a method belonging to that class because, well, you're calling it from the class. Making it a classmethod just makes this relationship explicit. Taking .key_transform out of the class doesn't make much sense because it's a method that changes and is re-defined on each inherited subclass of TransformedDict, but is used in the same way in .index_of -- it's a good candidate for object-orientation as it is now.

@lilyminium lilyminium mentioned this issue Jan 30, 2022
5 tasks
@trevorgokey
Copy link
Collaborator

D-A-B-C-E should match E-C-B-A-D but not A-B-C!-D!-E

I don't disagree here but I don't see how it fits into the discussion. Are there 5-body valence terms that we need to support?

what about CMAP torsions

Use a TagSortedDict or implement a CMAP-specific handler.

Or anything that is a string of >4 atoms bonded in a chain, where both forwards and backwards matching is ok

Not a valence set and likely more akin to a chemical environment. Perhaps a new ChainDict or SequenceDict would be appropriate here.

is re-defined on each inherited subclass of TransformedDict

Only __key_transform__ is inherited and TransformedDict has no key_transform method.

I don't follow

In [2]: # %load static_test.py
   ...: #!/usr/bin/env python3
   ...: 
   ...: import pprint
   ...: from concurrent.futures import ProcessPoolExecutor
   ...: 
   ...: 
   ...: class A:
   ...:     class_var = 2
   ...: 
   ...:     def __init__(self):
   ...:         self.member_var = 3
   ...: 
   ...:     @staticmethod
   ...:     def s_meth(x):
   ...:         return pprint.pformat(locals())
   ...: 
   ...:     @classmethod
   ...:     def c_meth(cls, x):
   ...:         return pprint.pformat(locals())
   ...: 
   ...: 
   ...: def main():
   ...:     with ProcessPoolExecutor(1) as pool:
   ...:         print(pool.submit(A.s_meth, 5).result())
   ...:         print(pool.submit(A.c_meth, 5).result())
   ...: 
   ...: 
   ...: if __name__ == "__main__":
   ...:     main()
   ...: 
   ...: 
{'x': 5}
{'cls': <class '__main__.A'>, 'x': 5}

where cls would carry all of the class members and would need to serialized (in this case pickled) and sent to each forked process. Not a big deal here for these light weight sorting dictionaries but there's no need to pass it unnecessarily. I used __class__ simply to resolve the symbol in a general way; ValenceDict.key_transform would have worked as well.

@lilyminium
Copy link
Collaborator Author

lilyminium commented Jan 30, 2022

Are there 5-body valence terms that we need to support?

Not currently; that's why this is a feature request instead of a bug report. It seems unnecessarily restrictive to me, given that nothing in the current toolkit needs this check. I can implement my own dictionary, but when all functionality is already present in ValenceDict and the only thing stopping me is an AssertionError, it seems like it'd save users and computer time to just remove the assertion. Making a ChainDict and having ValenceDict subclass that with the assertion is definitely one solution.

where cls would carry all of the class members and would need to serialized

That's definitely a valid concern for normal staticmethods -- I was speaking of ones that refer to __class__. I'm much less experienced with processes than you are so correct me if I'm wrong, but surely that necessitates serializing the class as well?

In [2]: # %load static_test.py
   ...: #!/usr/bin/env python3
   ...:
   ...: import pprint
   ...: from concurrent.futures import ProcessPoolExecutor
   ...:
   ...: class A:
   ...:     class_var = 2
   ...:
   ...:     @staticmethod
   ...:     def s_meth():
   ...:         __class__.class_var
   ...:         return pprint.pformat(locals())
   ...:
   ...:     @classmethod
   ...:     def c_meth(cls):
   ...:         cls.class_var
   ...:         return pprint.pformat(locals())
   ...:
   ...: if __name__ == "__main__":
   ...:     with ProcessPoolExecutor(1) as pool:
   ...:         print(pool.submit(A.s_meth).result())
   ...:         print(pool.submit(A.c_meth).result())
   ...:
{'__class__': <class '__main__.A'>}
{'cls': <class '__main__.A'>}

@trevorgokey
Copy link
Collaborator

I am by no means an expert on the subject, so I wouldn't know without testing it. However I don't think that particular case applies to the code in this issue.

I see a PR has already been crafted to get this change in, so no need for me to get in the way here. Thanks for catching the bug in the ordering!

@lilyminium
Copy link
Collaborator Author

lilyminium commented Jan 31, 2022

Also, it's much more idiomatic python to make this a classmethod instead of a staticmethod that uses __class__.

ValenceDict and ImproperDict both have staticmethods that use __class__, so I felt my example was more representative. Again, I have nothing against staticmethods for the reasons you've mentioned -- they're a useful construct and exist for a reason. I just think that when you require the class in the method, you should switch to a classmethod to communicate the code more clearly.

refkey = __class__.key_transform(key)

@j-wags
Copy link
Member

j-wags commented Feb 10, 2022

Gotcha, I see some points on both sides but ultimately I agree with @lilyminium on all proposed changes here.

This is sorta related to a legacy of rough edges around vsites, which came to be because, when @trevorgokey and I were working on it, vsites were supposed to take like a month, and then at the 6 month point, when crazy edge cases kept popping up, and we had unilaterally made changes to the SMIRNOFF spec, and @trevorgokey had rewritten everything several times, we finally said "this is good enough", and this got out with some rough edges. So thanks to both of you for the great discussion, I think @lilyminium's correct-er on every point (except I'm giving up on trying to understand the serialization thing, so I have no opinion there)

I've approved the PR that will close this issue.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants