Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

implement to_observe method to Observations #453

Conversation

stevebachmeier
Copy link
Contributor

Implement to_observe method to observations

Description

  • Category: feature
  • JIRA issue: na

Changes and notes

This adds a `to_observe` method to all observations. The default

is True which results in results being gathered at every time event/time
step. You can pass in an Event to the method to help determine whether
it should return False which results in no observation being gathered.

Testing

Added a test; all pass

df = observation.results_gatherer(filtered_pop)
yield df, observation.name, observation.results_updater
if not observation.to_observe(event):
yield None, None, None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This always confuses me. Why do we yield instead of return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computational efficiency. We could combine these datasets into a mega-list and return it all at once and then loop through as needed later. But generators allow for lazy evaluation so we don't need to store all that at once in memory

@albrja
Copy link
Contributor

albrja commented Jul 23, 2024

"The default is True which results in results being gathered at every time event/time
step" - for clarification, does this mean every lifecycle event or no?

every time step of every lifecycle event. I think we overuse the term "event", but basically, e.g. every time step for an observation registered to a specific lifecycle phase (e.g. "on_collect_metrics")

@@ -109,7 +109,9 @@ def __call__(self, interpolants: pd.DataFrame) -> pd.DataFrame:
)

if self.categorical_parameters:
sub_tables = interpolants.groupby(list(self.categorical_parameters))
sub_tables = interpolants.groupby(
list(self.categorical_parameters), observed=False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just got rid of a bunch of FutureWarnings (observed is changing from default False to default True)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the keyword argument observed do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observed: bool, default False
    This only applies if any of the groupers are Categoricals. If True: only show observed values for
    categorical groupers. If False: show all values for categorical groupers.

    Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.

@@ -181,13 +182,21 @@ def gather_results(
else:
if stratifications is None:
for observation in observations:
df = observation.results_gatherer(filtered_pop)
yield df, observation.name, observation.results_updater
if not observation.to_observe(event):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my slight preference would be to put more of this in observation if possible, since we are only calling methods from observation. Then one could do e.g.

for observation in observations:
    yield observation.get_this_tuple_of_results_if_to_observe(event, filtered_pop)

(name not serious ofc)
My reasoning here would be that we can avoid deeply nesting conditions and encapsulate things from observation more (e.g. maybe you don't need to_observe in the interface if you do this but need a different method instead. Doesn't matter much either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to defer to @rmudambi on this suggestion

@@ -30,6 +31,12 @@ def _aggregate_state_person_time(x: pd.DataFrame) -> float:
return len(x) * (28 / 365.25)


@pytest.fixture
def mocked_event(mocker) -> Event:
event: Event = mocker.Mock(spec=Event)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what made you do this

Copy link
Contributor

@patricktnast patricktnast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

@@ -109,7 +109,9 @@ def __call__(self, interpolants: pd.DataFrame) -> pd.DataFrame:
)

if self.categorical_parameters:
sub_tables = interpolants.groupby(list(self.categorical_parameters))
sub_tables = interpolants.groupby(
list(self.categorical_parameters), observed=False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the keyword argument observed do?

@@ -155,7 +156,7 @@ def register_observation(
].append(observation)

def gather_results(
self, population: pd.DataFrame, event_name: str
self, population: pd.DataFrame, event_name: str, event: Event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope, but shouldn't event_name be an attribute of Event?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I just chose a poor name for "event" name. Really, it's lifecycle phase (e.g. "collect_metrics")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the name of this variable to "lifecycle_phase"

@@ -202,6 +203,8 @@ def register_stratified_observation(
A list of population view columns to be used in the aggregator.
aggregator
A function that computes the quantity for the observation.
to_observe
A function that determines whether to observe the event at the current time step.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a little confusing, due to the overloading of the word "event". Maybe better would be:

A function that determines whether to perform an observation on this Event.

@@ -287,6 +292,8 @@ def register_unstratified_observation(
A list of population view columns to be used in the aggregator.
aggregator
A function that computes the quantity for the observation.
to_observe
A function that determines whether to observe the event at the current time step.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier comment.

@@ -403,6 +416,8 @@ def register_concatenating_observation(
A list of the value pipelines that are required by either the pop_filter or the aggregator.
results_formatter
A function that formats the observation results.
to_observe
A function that determines whether to observe the event at the current time step.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

@@ -35,6 +37,21 @@ class BaseObservation(ABC):
results_updater: Callable[[pd.DataFrame, pd.DataFrame], pd.DataFrame]
results_formatter: Callable[[str, pd.DataFrame], pd.DataFrame]
stratifications: Optional[Tuple[str]]
to_observe: Callable[[Event], bool]

def gather_results(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having methods called gather_results and results_gatherer is very confusing. We need better names for these methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - renamed to this to simply "observe", i.e.

for observation in observations:
    yield observation.observe(event, pop, stratificatons) ...
    ...

@stevebachmeier stevebachmeier merged commit 9222860 into release-candidate-spring Jul 25, 2024
9 checks passed
@stevebachmeier stevebachmeier deleted the feature/sbachmei/add-to_observe-method-to-observer branch July 25, 2024 19:08
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants