Remove length check in ValenceDict #1183

lilyminium · 2022-01-28T00:57:11Z

Is your feature request related to a problem? Please describe.

ValenceDict has a number of restrictions that require the length of the key to be no more than 4 atoms. Is there a real reason for this?

Describe the solution you'd like

Remove length check here:

openff-toolkit/openff/toolkit/topology/topology.py

Lines 94 to 103 in c32527b

    
           @staticmethod 
        
           def key_transform(key): 
        
               """Reverse tuple if first element is larger than last element.""" 
        
               # Ensure key is a tuple. 
        
               key = tuple(key) 
        
               assert len(key) > 0 and len(key) < 5, "Valence keys must be at most 4 atoms" 
        
               # Reverse the key if the first element is bigger than the last. 
        
               if key[0] > key[-1]: 
        
                   key = tuple(reversed(key)) 
        
               return key

Refactor this to just take the forward or backward permutation. Also, it's much more idiomatic python to make this a classmethod instead of a staticmethod that uses __class__.

permutations = OrderedDict({refkey: 0, refkey[::-1]:1})

openff-toolkit/openff/toolkit/topology/topology.py

Lines 105 to 165 in c32527b

    
               @staticmethod 
        
               def index_of(key, possible=None): 
        
                   """ 
        
                   Generates a canonical ordering of the equivalent permutations of ``key`` (equivalent rearrangements of indices) 
        
                   and identifies which of those possible orderings this particular ordering is. This method is useful when 
        
                   multiple SMARTS patterns might match the same atoms, but local molecular symmetry or the use of 
        
                   wildcards in the SMARTS could make the matches occur in arbitrary order. 
        
                   This method can be restricted to a subset of the canonical orderings, by providing 
        
                   the optional ``possible`` keyword argument. If provided, the index returned by this method will be 
        
                   the index of the element in ``possible`` after undergoing the same canonical sorting as above. 
        
                   Parameters 
        
                   ---------- 
        
                   key : iterable of int 
        
                       A valid key for ValenceDict 
        
                   possible : iterable of iterable of int, optional. default=``None`` 
        
                       A subset of the possible orderings that this match might take. 
        
                   Returns 
        
                   ------- 
        
                   index : int 
        
                   """ 
        
                   assert len(key) < 4 
        
                   refkey = __class__.key_transform(key) 
        
                   if len(key) == 2: 
        
                       permutations = OrderedDict( 
        
                           {(refkey[0], refkey[1]): 0, (refkey[1], refkey[0]): 1} 
        
                       ) 
        
                   elif len(key) == 3: 
        
                       permutations = OrderedDict( 
        
                           { 
        
                               (refkey[0], refkey[1], refkey[2]): 0, 
        
                               (refkey[2], refkey[1], refkey[0]): 1, 
        
                           } 
        
                       ) 
        
                   else: 
        
                       # For a proper, only forward/backward makes sense 
        
                       permutations = OrderedDict( 
        
                           { 
        
                               (refkey[0], refkey[1], refkey[2], refkey[3]): 0, 
        
                               (refkey[3], refkey[1], refkey[2], refkey[0]): 1, 
        
                           } 
        
                       ) 
        
                   if possible is not None: 
        
                       i = 0 
        
                       # If the possible permutations were provided, ensure that `possible` is a SUBSET of `permutations` 
        
                       assert all([p in permutations for p in possible]), ( 
        
                           "Possible permutations " + str(possible) + " is impossible!" 
        
                       ) 
        
                       # TODO: Double-check whether this will generalize. It seems like this would fail if ``key`` 
        
                       #       were in ``permutations``, but not ``possible`` 
        
                       for k in permutations: 
        
                           if all([x == y for x, y in zip(key, k)]): 
        
                               return i 
        
                           if k in possible: 
        
                               i += 1 
        
                   else: 
        
                       # If the possible permutations were NOT provided, then return the unique index of this permutation. 
        
                       return permutations[key]

Describe alternatives you've considered

Additional context

The text was updated successfully, but these errors were encountered:

j-wags · 2022-01-29T20:26:12Z

Remove length check here:

Sounds good. I thought about this a bit and there's no need to limit it to 4 atoms. Happy to take a PR on this.

Refactor this to just take the forward or backward permutation.

I need to think more on this, because I forgot all the details of how this method is used - Is this separable from the other aspects of this issue or are they intertwined for your purposes?

Also, it's much more idiomatic python to make this a classmethod instead of a staticmethod that uses class.

Agree. Happy to take a PR on this.

lilyminium · 2022-01-29T21:54:39Z

I need to think more on this, because I forgot all the details of how this method is used - Is this separable from the other aspects of this issue or are they intertwined for your purposes?

They're intertwined because currently the .index_of method manually specifies the following permutations:

{(refkey[0], refkey[1]): 0, (refkey[1], refkey[0]): 1} (2 atoms)
{(refkey[0], refkey[1], refkey[2]): 0, (refkey[2], refkey[1], refkey[0]): 1, } (3 atoms)
{(refkey[0], refkey[1], refkey[2], refkey[3]): 0, (refkey[3], refkey[1], refkey[2], refkey[0]): 1, } (4 atoms)

If you look at the indices you'll notice that they're just the forward (0) and backward (1) permutation, but manually specified so that only index lengths up until 4 are supported. Taking the generic forward/backward would be a general case of the rule.

Edit: no, I'm wrong -- and I'm not certain the dihedrals sorting makes sense. If the bonds go A-B-C-D, does it make sense to accept D-B-C-A? Does it not conflict with the .key_transform method?

lilyminium · 2022-01-29T22:05:27Z

.index_of appears to only be used in one place: the VirtualSiteHandler. It looks like it calls ImproperDict.index_of, but not certain. @trevorgokey is it right that the ValenceDict should order dihedrals like that?

trevorgokey · 2022-01-30T00:14:50Z

My opinion here is that the ValenceDict class is for MM valence forces: bonds, angles, and torsions, and limits to < 5 because these forms have exactly two symmetries and have concrete orderings. Releasing the limit check makes the dictionary 1. not a valence dictionary anymore and 2. changes the behavior of the class. One should use the SortedDict if one wants to sort an arbitrary key length. What is the use case for a ValenceDict without a bounds check?

I'm not certain the dihedrals sorting makes sense.

Yes this is a mistake and the second form should be 3-2-1-0.

Also, it's much more idiomatic python to make this a classmethod instead of a staticmethod that uses class

I do stuff like this because a class method would carry around uneeded object state. There is also no dependence on the cls argument in a class method in the function, but would need to be passed/serialized in certain cases if it were a class method. I personally use the transforms in my code, but not the actual class (for example, making sure the ForceBalance keys will match). Is there a "pythonic" way to achieve a similar outcome? My suggestion would be to take the key_transform out of the class, but this is an API break.

On a broader level, this implementation was written before match=once and match=all_permutations was created when I exposed the ability to pick specific virtual site particles. Since things were reduced to the two options, I am presuming this is why the above bug slid through and index_of isn't heavily used currently.

lilyminium · 2022-01-30T01:31:57Z

Releasing the limit check makes the dictionary 1. not a valence dictionary anymore and 2. changes the behavior of the class. One should use the SortedDict if one wants to sort an arbitrary key length. What is the use case for a ValenceDict without a bounds check?

Sorting isn't quite the same thing as symmetry (forwards/backwards). D-A-B-C-E should match E-C-B-A-D but not A-B-C!-D!-E. As an existing example, what about CMAP torsions? Or anything that is a string of >4 atoms bonded in a chain, where both forwards and backwards matching is ok.

I do stuff like this because a class method would carry around uneeded object state. There is also no dependence on the cls argument in a class method in the function, but would need to be passed/serialized in certain cases if it were a class method. I personally use the transforms in my code, but not the actual class (for example, making sure the ForceBalance keys will match). Is there a "pythonic" way to achieve a similar outcome? My suggestion would be to take the key_transform out of the class, but this is an API break.

Sorry -- I don't follow, especially what you're saying about serialization. Could you clarify? You're accessing the object state (by this I assume you mean the class, which is technically an object in Python and an instance of type) when you access __class__. __class__ is the same thing just using the class name (here, ValenceDict). It's very similar to the cls argument in a classmethod, only more confusing, because it doesn't inherit, so you wind up with this scenario:

In [1]: class Superclass:
   ...:
   ...:     @staticmethod
   ...:     def is_called():
   ...:         print(f"called from superclass")
   ...:
   ...:     @staticmethod
   ...:     def caller():
   ...:         print(f"calling method from {__class__}")
   ...:         __class__.is_called()
   ...:
   ...:
   ...: class Subclass(Superclass):
   ...:     @staticmethod
   ...:     def is_called():
   ...:         print("called from subclass")
   ...:

In [2]: class Subclass2(Subclass):
   ...:     @staticmethod
   ...:     def caller():
   ...:         print(f"calling method from {__class__}")
   ...:         __class__.is_called()
   ...:

In [3]: Subclass.caller()
calling method from <class '__main__.Superclass'>
called from superclass

In [4]: Subclass2.caller()
calling method from <class '__main__.Subclass2'>
called from subclass

The class is necessarily defined ("carried around") whenever you call a method belonging to that class because, well, you're calling it from the class. Making it a classmethod just makes this relationship explicit. Taking .key_transform out of the class doesn't make much sense because it's a method that changes and is re-defined on each inherited subclass of TransformedDict, but is used in the same way in .index_of -- it's a good candidate for object-orientation as it is now.

trevorgokey · 2022-01-30T02:53:48Z

D-A-B-C-E should match E-C-B-A-D but not A-B-C!-D!-E

I don't disagree here but I don't see how it fits into the discussion. Are there 5-body valence terms that we need to support?

what about CMAP torsions

Use a TagSortedDict or implement a CMAP-specific handler.

Or anything that is a string of >4 atoms bonded in a chain, where both forwards and backwards matching is ok

Not a valence set and likely more akin to a chemical environment. Perhaps a new ChainDict or SequenceDict would be appropriate here.

is re-defined on each inherited subclass of TransformedDict

Only __key_transform__ is inherited and TransformedDict has no key_transform method.

I don't follow

In [2]: # %load static_test.py
   ...: #!/usr/bin/env python3
   ...: 
   ...: import pprint
   ...: from concurrent.futures import ProcessPoolExecutor
   ...: 
   ...: 
   ...: class A:
   ...:     class_var = 2
   ...: 
   ...:     def __init__(self):
   ...:         self.member_var = 3
   ...: 
   ...:     @staticmethod
   ...:     def s_meth(x):
   ...:         return pprint.pformat(locals())
   ...: 
   ...:     @classmethod
   ...:     def c_meth(cls, x):
   ...:         return pprint.pformat(locals())
   ...: 
   ...: 
   ...: def main():
   ...:     with ProcessPoolExecutor(1) as pool:
   ...:         print(pool.submit(A.s_meth, 5).result())
   ...:         print(pool.submit(A.c_meth, 5).result())
   ...: 
   ...: 
   ...: if __name__ == "__main__":
   ...:     main()
   ...: 
   ...: 
{'x': 5}
{'cls': <class '__main__.A'>, 'x': 5}

where cls would carry all of the class members and would need to serialized (in this case pickled) and sent to each forked process. Not a big deal here for these light weight sorting dictionaries but there's no need to pass it unnecessarily. I used __class__ simply to resolve the symbol in a general way; ValenceDict.key_transform would have worked as well.

lilyminium · 2022-01-30T03:51:25Z

Are there 5-body valence terms that we need to support?

Not currently; that's why this is a feature request instead of a bug report. It seems unnecessarily restrictive to me, given that nothing in the current toolkit needs this check. I can implement my own dictionary, but when all functionality is already present in ValenceDict and the only thing stopping me is an AssertionError, it seems like it'd save users and computer time to just remove the assertion. Making a ChainDict and having ValenceDict subclass that with the assertion is definitely one solution.

where cls would carry all of the class members and would need to serialized

That's definitely a valid concern for normal staticmethods -- I was speaking of ones that refer to __class__. I'm much less experienced with processes than you are so correct me if I'm wrong, but surely that necessitates serializing the class as well?

In [2]: # %load static_test.py
   ...: #!/usr/bin/env python3
   ...:
   ...: import pprint
   ...: from concurrent.futures import ProcessPoolExecutor
   ...:
   ...: class A:
   ...:     class_var = 2
   ...:
   ...:     @staticmethod
   ...:     def s_meth():
   ...:         __class__.class_var
   ...:         return pprint.pformat(locals())
   ...:
   ...:     @classmethod
   ...:     def c_meth(cls):
   ...:         cls.class_var
   ...:         return pprint.pformat(locals())
   ...:
   ...: if __name__ == "__main__":
   ...:     with ProcessPoolExecutor(1) as pool:
   ...:         print(pool.submit(A.s_meth).result())
   ...:         print(pool.submit(A.c_meth).result())
   ...:
{'__class__': <class '__main__.A'>}
{'cls': <class '__main__.A'>}

trevorgokey · 2022-01-31T17:30:13Z

I am by no means an expert on the subject, so I wouldn't know without testing it. However I don't think that particular case applies to the code in this issue.

I see a PR has already been crafted to get this change in, so no need for me to get in the way here. Thanks for catching the bug in the ordering!

lilyminium · 2022-01-31T22:00:59Z

Also, it's much more idiomatic python to make this a classmethod instead of a staticmethod that uses __class__.

ValenceDict and ImproperDict both have staticmethods that use __class__, so I felt my example was more representative. Again, I have nothing against staticmethods for the reasons you've mentioned -- they're a useful construct and exist for a reason. I just think that when you require the class in the method, you should switch to a classmethod to communicate the code more clearly.

openff-toolkit/openff/toolkit/topology/topology.py

Line 129 in c32527b

refkey = __class__.key_transform(key)

j-wags · 2022-02-10T23:31:11Z

Gotcha, I see some points on both sides but ultimately I agree with @lilyminium on all proposed changes here.

This is sorta related to a legacy of rough edges around vsites, which came to be because, when @trevorgokey and I were working on it, vsites were supposed to take like a month, and then at the 6 month point, when crazy edge cases kept popping up, and we had unilaterally made changes to the SMIRNOFF spec, and @trevorgokey had rewritten everything several times, we finally said "this is good enough", and this got out with some rough edges. So thanks to both of you for the great discussion, I think @lilyminium's correct-er on every point (except I'm giving up on trying to understand the serialization thing, so I have no opinion there)

I've approved the PR that will close this issue.

lilyminium mentioned this issue Jan 30, 2022

Update ValenceDict #1185

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove length check in ValenceDict #1183

Remove length check in ValenceDict #1183

lilyminium commented Jan 28, 2022

j-wags commented Jan 29, 2022

lilyminium commented Jan 29, 2022 •

edited

Loading

lilyminium commented Jan 29, 2022

trevorgokey commented Jan 30, 2022 •

edited

Loading

lilyminium commented Jan 30, 2022 •

edited

Loading

trevorgokey commented Jan 30, 2022

lilyminium commented Jan 30, 2022 •

edited

Loading

trevorgokey commented Jan 31, 2022

lilyminium commented Jan 31, 2022 •

edited

Loading

j-wags commented Feb 10, 2022

Remove length check in ValenceDict #1183

Remove length check in ValenceDict #1183

Comments

lilyminium commented Jan 28, 2022

j-wags commented Jan 29, 2022

lilyminium commented Jan 29, 2022 • edited Loading

lilyminium commented Jan 29, 2022

trevorgokey commented Jan 30, 2022 • edited Loading

lilyminium commented Jan 30, 2022 • edited Loading

trevorgokey commented Jan 30, 2022

lilyminium commented Jan 30, 2022 • edited Loading

trevorgokey commented Jan 31, 2022

lilyminium commented Jan 31, 2022 • edited Loading

j-wags commented Feb 10, 2022

lilyminium commented Jan 29, 2022 •

edited

Loading

trevorgokey commented Jan 30, 2022 •

edited

Loading

lilyminium commented Jan 30, 2022 •

edited

Loading

lilyminium commented Jan 30, 2022 •

edited

Loading

lilyminium commented Jan 31, 2022 •

edited

Loading