-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
implement to_observe method to Observations #453
implement to_observe method to Observations #453
Conversation
df = observation.results_gatherer(filtered_pop) | ||
yield df, observation.name, observation.results_updater | ||
if not observation.to_observe(event): | ||
yield None, None, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This always confuses me. Why do we yield instead of return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
computational efficiency. We could combine these datasets into a mega-list and return it all at once and then loop through as needed later. But generators allow for lazy evaluation so we don't need to store all that at once in memory
"The default is True which results in results being gathered at every time event/time every time step of every lifecycle event. I think we overuse the term "event", but basically, e.g. every time step for an observation registered to a specific lifecycle phase (e.g. "on_collect_metrics") |
@@ -109,7 +109,9 @@ def __call__(self, interpolants: pd.DataFrame) -> pd.DataFrame: | |||
) | |||
|
|||
if self.categorical_parameters: | |||
sub_tables = interpolants.groupby(list(self.categorical_parameters)) | |||
sub_tables = interpolants.groupby( | |||
list(self.categorical_parameters), observed=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just got rid of a bunch of FutureWarnings (observed
is changing from default False to default True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the keyword argument observed
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
observed: bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for
categorical groupers. If False: show all values for categorical groupers.
Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
@@ -181,13 +182,21 @@ def gather_results( | |||
else: | |||
if stratifications is None: | |||
for observation in observations: | |||
df = observation.results_gatherer(filtered_pop) | |||
yield df, observation.name, observation.results_updater | |||
if not observation.to_observe(event): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my slight preference would be to put more of this in observation
if possible, since we are only calling methods from observation
. Then one could do e.g.
for observation in observations:
yield observation.get_this_tuple_of_results_if_to_observe(event, filtered_pop)
(name not serious ofc)
My reasoning here would be that we can avoid deeply nesting conditions and encapsulate things from observation more (e.g. maybe you don't need to_observe in the interface if you do this but need a different method instead. Doesn't matter much either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to defer to @rmudambi on this suggestion
@@ -30,6 +31,12 @@ def _aggregate_state_person_time(x: pd.DataFrame) -> float: | |||
return len(x) * (28 / 365.25) | |||
|
|||
|
|||
@pytest.fixture | |||
def mocked_event(mocker) -> Event: | |||
event: Event = mocker.Mock(spec=Event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what made you do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it!
@@ -109,7 +109,9 @@ def __call__(self, interpolants: pd.DataFrame) -> pd.DataFrame: | |||
) | |||
|
|||
if self.categorical_parameters: | |||
sub_tables = interpolants.groupby(list(self.categorical_parameters)) | |||
sub_tables = interpolants.groupby( | |||
list(self.categorical_parameters), observed=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the keyword argument observed
do?
@@ -155,7 +156,7 @@ def register_observation( | |||
].append(observation) | |||
|
|||
def gather_results( | |||
self, population: pd.DataFrame, event_name: str | |||
self, population: pd.DataFrame, event_name: str, event: Event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope, but shouldn't event_name
be an attribute of Event
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I just chose a poor name for "event" name. Really, it's lifecycle phase (e.g. "collect_metrics")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the name of this variable to "lifecycle_phase"
@@ -202,6 +203,8 @@ def register_stratified_observation( | |||
A list of population view columns to be used in the aggregator. | |||
aggregator | |||
A function that computes the quantity for the observation. | |||
to_observe | |||
A function that determines whether to observe the event at the current time step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a little confusing, due to the overloading of the word "event". Maybe better would be:
A function that determines whether to perform an observation on this Event.
@@ -287,6 +292,8 @@ def register_unstratified_observation( | |||
A list of population view columns to be used in the aggregator. | |||
aggregator | |||
A function that computes the quantity for the observation. | |||
to_observe | |||
A function that determines whether to observe the event at the current time step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See earlier comment.
@@ -403,6 +416,8 @@ def register_concatenating_observation( | |||
A list of the value pipelines that are required by either the pop_filter or the aggregator. | |||
results_formatter | |||
A function that formats the observation results. | |||
to_observe | |||
A function that determines whether to observe the event at the current time step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here
@@ -35,6 +37,21 @@ class BaseObservation(ABC): | |||
results_updater: Callable[[pd.DataFrame, pd.DataFrame], pd.DataFrame] | |||
results_formatter: Callable[[str, pd.DataFrame], pd.DataFrame] | |||
stratifications: Optional[Tuple[str]] | |||
to_observe: Callable[[Event], bool] | |||
|
|||
def gather_results( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having methods called gather_results
and results_gatherer
is very confusing. We need better names for these methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - renamed to this to simply "observe", i.e.
for observation in observations:
yield observation.observe(event, pop, stratificatons) ...
...
Implement to_observe method to observations
Description
Changes and notes
This adds a `to_observe` method to all observations. The defaultis True which results in results being gathered at every time event/time
step. You can pass in an Event to the method to help determine whether
it should return False which results in no observation being gathered.
Testing
Added a test; all pass