-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Clean up identify_variables
/ identify_mutable_parameters
; deprecate SimpleExpressionVisitor
#3436
base: main
Are you sure you want to change the base?
Conversation
identify_variables
/ identify_mutable_parameters
; deprecate SImpleExpressionVisitor
identify_variables
/ identify_mutable_parameters
; deprecate SimpleExpressionVisitor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiniest of changes
Co-authored-by: Miranda Mundt <55767766+mrmundt@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance comparison on my motivating example for the previous identify_variables
rewrite:
Main
Identifier ncalls cumtime percall %
----------------------------------------------------------------------------------
root 1 2.905 2.905 100.0
-----------------------------------------------------------------------------
full model post-solve 1 0.484 0.484 16.7
solve-scc 1 2.421 2.421 83.3
--------------------------------------------------------
igraph 1 0.359 0.359 14.8
scc-subsolver 546 1.610 0.003 66.5
vars-from-components 546 0.187 0.000 7.7
other n/a 0.264 n/a 10.9
========================================================
other n/a 0.000 n/a 0.0
=============================================================================
==================================================================================
This branch
Identifier ncalls cumtime percall %
----------------------------------------------------------------------------------
root 1 2.960 2.960 100.0
-----------------------------------------------------------------------------
full model post-solve 1 0.482 0.482 16.3
solve-scc 1 2.478 2.478 83.7
--------------------------------------------------------
igraph 1 0.358 0.358 14.4
scc-subsolver 546 1.536 0.003 62.0
vars-from-components 546 0.326 0.001 13.1
other n/a 0.259 n/a 10.4
========================================================
other n/a 0.000 n/a 0.0
=============================================================================
==================================================================================
Less than a factor of 2 overhead (see the vars-from-components
category) and still much better than the 2s this was taking before exploiting named expressions. Thanks for fixing this.
# The following attributes will be added by initializeWalker: | ||
# self._objs: the list of found objects | ||
# self._seen: set(self._objs) | ||
# self._exprs: list of (e, e.expr) for any (nested) named expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we need to store e.expr
in addition to e
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we need to store the immutable expression object in case e
changes what expression it contains.
if self._cache is None: | ||
return True, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To check my understanding: This means that if named_expression_cache
was not provided in __init__
, we won't exploit repeated named expressions within this expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. My assumption is that the same named expression rarely appears twice in a single expression. So, if you don't provide a cache, then there isn't a big win for defining a cache -- there probably won't be a cache hit (and there is the overhead of creating the cache and throwing it away).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably a good assumption in general, but I know that for some IDAES models (at least the autothermal reformer), there is a significant benefit to exploiting named expressions within a single constraint. (Of course the user can always do this by explicitly passing the named expression cache.)
v = identify_variables.visitor | ||
save = v._include_fixed, v._cache | ||
try: | ||
v._include_fixed = include_fixed | ||
v._cache = named_expression_cache | ||
yield from v.walk_expression(expr) | ||
finally: | ||
v._include_fixed, v._cache = save | ||
|
||
|
||
identify_variables.visitor = IdentifyVariableVisitor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use a global visitor instead of a new one every time identify_variables
is called? Is there a significant overhead to __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of overhead in object creation and disposal. In other cases, I have seen performance benefits by keeping the visitor around between calls. I did not profile the impact here, so maybe this is a red herring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen this in AMPLRepnVisitor
, so I'm not too surprised, but it hasn't shown up in any of my profiles.
@Robbybp: I am surprised that the overhead went up. can you share your test with me (off-line is fine)? I wonder if we are measuring something else (like the GC)? |
@jsiirola Run this script: https://github.com/Robbybp/surrogate-vs-implicit/blob/main/svi/auto_thermal_reformer/fullspace_flowsheet.py I insert the timing calls into |
We are waiting to merge this until we double check the performance degradation @Robbybp noted |
FWIW, the evidence I presented for a performance degradation is pretty flimsy (~0.15s diff), so I wouldn't be opposed to just merging this as-is, given the bug fixes. |
@Robbybp I finally got a chance to look at this, and there was an issue with the implementation. The original approach to validating the named expression cache involved storing a list of all expressions seen in the current expression or any sub-expression. Because of how interconnected named expressions are in IDAES models, this would occasionally get to be a very long list (>1k!). The implementation has been updated to store this as a dict, which prunes redundant instances of the same subexpression (so the list stays under 20 long). This PR's implementation is now somewhere between equivalent to the old implementation to possibly slightly faster (the noise in timing in Python is now more than the difference between the two implementations) for your case. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3436 +/- ##
==========================================
+ Coverage 88.61% 88.62% +0.01%
==========================================
Files 880 880
Lines 100639 100646 +7
==========================================
+ Hits 89181 89198 +17
+ Misses 11458 11448 -10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recent commits look good to me, thanks for looking into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a minor issue in the docs that needs to be fixed but otherwise this looks fine
Fixes # .
Summary/Motivation:
This cleans up the implementation of
identify_variables
to simplify the implementation, improve efficiency, and improve robustness of the named expression cache:In addition:
identify_mutables_parameters
is moved to build onidentify_variables
(by deriving from the IdentifyVariablesVisitor`)identify_components
to use theStreamBasedExpressionVisitor
This allows us to (finally) deprecate the
SimpleExpressionVisitor
and remove it from the documentationChanges proposed in this PR:
Legal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: