Feature/sbachmei/mic 5163 exclude unwanted results #460

stevebachmeier · 2024-08-07T21:16:19Z

Implement excluded results via model spec

Description

Category: feature
JIRA issue: https://jira.ihme.washington.edu/browse/MIC-5163

Changes and notes

This implements the ability to exclude results via the model spec with a new stratification: excluded_categories key.

The work got a bit bigger than expected b/c we had to handle (1) typical
results that are actual Stratification objects, (2) non-typical results that
are not Stratification objects but get handled on a case-by-case basis
(e.g. ylds, which you cannot stratify by cause b/c you can have multiple per
person), and (3) results that have an extra unique category that is
not explicitly requested but comes "for free" (e.g. all_causes for deaths
and other_causes for disability).

At a high level, what we're doing is modifying what happens during
ResultsContext.gather_results():

We extract any excluded categories for a given result measure and
drop them from the column mapping during each Stratification.__call__.
Then, for each (pop_filter, stratifications) tuple that gets registered
from observations, we _filter_population to both query the defined
pop_filter and also remove any rows that have NaNs in the
columns we are stratifying by.
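The two steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `map_with_exclusions` and `filter_population` are hypothetical helpers, and the column names and category values are made up for the example.

```python
from typing import List, Tuple

import pandas as pd


def map_with_exclusions(values: pd.Series, excluded: List[str]) -> pd.Series:
    """Hypothetical stand-in for the mapping step: excluded categories become NaN."""
    return values.where(~values.isin(excluded))


def filter_population(
    population: pd.DataFrame, pop_filter: str, stratification_names: Tuple[str, ...]
) -> pd.DataFrame:
    """Apply the population filter, then drop rows that were mapped to NaN."""
    pop = population.query(pop_filter) if pop_filter else population
    if stratification_names:
        # A NaN here means the row's value was one of the excluded categories.
        pop = pop.dropna(subset=list(stratification_names))
    return pop


# Made-up population: three dead simulants and one living one.
population = pd.DataFrame(
    {
        "is_alive": [False, False, False, True],
        "cause_of_death": ["stillborn", "measles", "diarrhea", "not_dead"],
    }
)
population["cause_of_death"] = map_with_exclusions(
    population["cause_of_death"], excluded=["stillborn"]
)
filtered = filter_population(population, "is_alive == False", ("cause_of_death",))
```

Only the rows that pass the filter and still carry a non-NaN stratification value reach the grouping step.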

Testing

tests pass and I ran some small simulations against the maternal
and child models.

albrja · 2024-08-07T22:29:54Z

src/vivarium/framework/results/context.py

@@ -57,6 +61,7 @@ def add_stratification(
        name: str,
        sources: List[str],
        categories: List[str],
+        excluded_categories: Optional[List[str]],


Why is this a list vs a dict now?

Not sure what you mean. This is a new thing and was never a dict

Oh you're referring to the zoom discussion about the broken doctest?

At the highest / model spec level, excluded_categories is a dict because you need to specify which exclusions for which set of results.

    # model spec
    ...
    configuration:
        stratification:
            excluded_categories:
                cause_of_death:
                    - 'stillborn'
                disability:
                    - 'all_causes'
                    - 'mild_child_wasting'

But this attribute is attached to a very specific stratification (e.g. 'cause_of_death') and so is just the list to exclude.
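To illustrate the dict-versus-list distinction, here is a small sketch using a plain dict in place of the real configuration object. The helper name `get_excluded_categories` is hypothetical and not part of vivarium.

```python
from typing import Dict, List

# Plain-dict stand-in for the `stratification.excluded_categories` block
# of a model spec (the real object is a config tree, not a dict).
excluded_categories_config: Dict[str, List[str]] = {
    "cause_of_death": ["stillborn"],
    "disability": ["all_causes"],
}


def get_excluded_categories(
    config: Dict[str, List[str]], stratification_name: str
) -> List[str]:
    """Return the list of exclusions for one stratification (empty if none)."""
    return list(config.get(stratification_name, []))


cause_of_death_exclusions = get_excluded_categories(
    excluded_categories_config, "cause_of_death"
)
age_group_exclusions = get_excluded_categories(excluded_categories_config, "age_group")
```

The dict is keyed by stratification name, so anything attached to a single stratification only ever sees its own list.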

albrja · 2024-08-07T22:32:57Z

src/vivarium/framework/results/interface.py

@@ -73,6 +73,7 @@ def register_stratification(
        self,
        name: str,
        categories: List[str],
+        excluded_categories: Optional[List[str]] = None,


Per comment above, do we just convert the dict from the config into a list so now these will always be a list?

This is at the stratification level and so is always a list. The model spec also provides a list at the stratification level; it's just that each stratification is stored as a key in a dictionary.

stevebachmeier · 2024-08-07T22:43:19Z

docs/source/tutorials/exploration.rst

@@ -95,6 +95,7 @@ configuration by simply printing it.
   sim = get_disease_model_simulation()

   del sim.configuration['input_data']
+   del sim.configuration['stratification']['excluded_categories']


This key was throwing errors in the expected print(sim.configuration) test below. The actual type of sim.configuration.stratification.excluded_categories is an empty LayeredConfigTree which I just couldn't get to work (I tried None, an empty string, literally nothing, and LayeredConfigTree() and none of them worked).

I actually was able to get this test to pass if I set the default to None (instead of an empty dict which then gets converted to a LayeredConfigTree) but that broke a bunch of tests in vph when trying to .update the value b/c you cannot update a ConfigNode.

rmudambi · 2024-08-07T23:02:00Z

src/vivarium/framework/results/interface.py

@@ -86,6 +87,9 @@ def register_stratification(
            Name of the of the column created by the stratification.
        categories
            List of string values that the mapper is allowed to output.
+        excluded_categories
+            List of mapped string values to be excluded from results processing.
+            If `None` (the default), will use exclusions as defined in th model spec.


Typo:

Suggested change

If `None` (the default), will use exclusions as defined in th model spec.

If `None` (the default), will use exclusions as defined in the configuration.

rmudambi · 2024-08-07T23:03:38Z

src/vivarium/framework/results/interface.py

@@ -145,6 +150,7 @@ def register_binned_stratification(
        ------
        None
        """
+        # TODO: implement excluded_categories like in `register_stratification`


Will this be dealt with in a subsequent PR?

@rmudambi I wasn't sure how much of a priority this even is. It wasn't immediately clear to me how to handle excluded categories but then I'm also aware that we don't actually have an example of using register_binned_stratification.

I added this to the tests

rmudambi · 2024-08-07T23:07:48Z

src/vivarium/framework/results/manager.py

@@ -171,6 +177,9 @@ def register_stratification(
            Name of the of the column created by the stratification.
        categories
            List of string values that the mapper is allowed to output.
+        excluded_categories
+            List of mapped string values to be excluded from results processing.
+            If `None` (the default), will use exclusions as defined in th model spec.


Suggested change

If `None` (the default), will use exclusions as defined in th model spec.

If `None` (the default), will use exclusions as defined in the configuration

rmudambi · 2024-08-07T23:08:47Z

src/vivarium/framework/results/context.py

@@ -71,6 +76,9 @@ def add_stratification(
            categorization.
        categories
            List of string values that the mapper is allowed to output.
+        excluded_categories
+            List of mapped string values to be excluded from results processing.
+            If `None` (the default), will use exclusions as defined in th model spec.


Suggested change

If `None` (the default), will use exclusions as defined in th model spec.

If `None` (the default), will use exclusions as defined in the configuration.

rmudambi · 2024-08-07T23:12:48Z

src/vivarium/framework/results/manager.py

@@ -71,6 +72,8 @@ def get_results(self) -> Dict[str, pd.DataFrame]:

    # noinspection PyAttributeOutsideInit
    def setup(self, builder: "Builder") -> None:
+        self._results_context.setup(builder)


Was this previously not being called at all?

It wasn't, though there previously wasn't anything interesting there (only the logger). But now there's the excluded categories

rmudambi · 2024-08-07T23:15:30Z

src/vivarium/framework/results/manager.py

@@ -138,6 +141,8 @@ def gather_results(self, lifecycle_phase: str, event: Event) -> None:
        with 0.0.
        """
        population = self._prepare_population(event)
+        if population.empty:
+            return


Why is this necessary?

I guess it's not strictly necessary but seemed silly to pass an empty population through gather_results

rmudambi · 2024-08-07T23:19:58Z

src/vivarium/framework/results/stratification.py

-            CategoricalDtype(categories=self.categories, ordered=True)
+            mapped_column = population[self.sources].apply(self.mapper, axis=1)
+
+        if mapped_column.isnull().any():


Why is this check needed? Shouldn't it be sufficient to do the check below?

I was trying to handle the fact that NaNs are not equal and so the set of NaNs would blow up the error message. I'll try and think of a way to handle that though b/c it would be nice to check against other unknown categories as well before raising.

rmudambi · 2024-08-08T18:30:43Z

src/vivarium/framework/components/manager.py

@@ -348,7 +348,7 @@ def get_component(self, name: str) -> Component:
        return self._manager.get_component(name)

    def get_components_by_type(
-        self, component_type: Union[type, Tuple[type, ...]]
+        self, component_type: Union[type, Tuple[type, ...], list[type]]


Why not make this Sequence[type]?

rmudambi · 2024-08-08T18:33:02Z

src/vivarium/framework/components/manager.py

@@ -364,7 +364,7 @@ def get_components_by_type(
            A list of components of type ``component_type``.

        """
-        return self._manager.get_components_by_type(component_type)
+        return self._manager.get_components_by_type(tuple(component_type))


Update the typing in the manager rather than converting it to a tuple here.

It does actually have to be a tuple though for isinstance. I do think it's cleaner to convert it to a tuple at the very end so I've done that
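A quick standalone sketch of why the conversion is needed: the built-in `isinstance` accepts a type or a tuple of types but raises `TypeError` for a list. The function below is a simplified, hypothetical version of the lookup and not the manager's actual implementation.

```python
from typing import List, Sequence, Type, Union


def get_components_by_type(
    components: Sequence[object],
    component_type: Union[Type, Sequence[Type]],
) -> List[object]:
    """Return the components that are instances of any of the given types."""
    if isinstance(component_type, type):
        component_type = (component_type,)
    # Convert at the very end: isinstance needs a tuple, not a list.
    return [c for c in components if isinstance(c, tuple(component_type))]


# Passing a list straight to isinstance fails.
try:
    isinstance(1, [int, str])
    list_rejected = False
except TypeError:
    list_rejected = True

components = [1, "a", 2.0]
ints_and_strs = get_components_by_type(components, [int, str])
floats = get_components_by_type(components, float)
```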

rmudambi · 2024-08-08T19:18:36Z

src/vivarium/framework/results/context.py

@@ -61,7 +61,7 @@ def add_stratification(
        name: str,
        sources: List[str],
        categories: List[str],
-        excluded_categories: Optional[List[str]],
+        excluded_categories: List[str],


Why did you change this from None to an empty list?

It just seemed more consistent w/ the other list arguments defaulting to [] instead of None.

Put it back to None b/c it conveys different information than an empty list.
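One way to read the difference, sketched with a hypothetical `resolve_excluded_categories` helper: `None` means "fall back to whatever the configuration says" (as the docstring states), while an explicit list, including an empty one, is taken at face value. Treating `[]` as "exclude nothing" is an assumption of this sketch and is not confirmed by the PR.

```python
from typing import Dict, List, Optional


def resolve_excluded_categories(
    name: str,
    excluded_categories: Optional[List[str]],
    configured: Dict[str, List[str]],
) -> List[str]:
    """Pick the exclusions to apply for the stratification called `name`."""
    if excluded_categories is None:
        # Nothing passed explicitly: use the configured exclusions.
        return list(configured.get(name, []))
    # An explicit list (even an empty one) overrides the configuration.
    return list(excluded_categories)


configured = {"cause_of_death": ["stillborn"]}
from_config = resolve_excluded_categories("cause_of_death", None, configured)
explicit_empty = resolve_excluded_categories("cause_of_death", [], configured)
```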

rmudambi · 2024-08-08T19:29:41Z

src/vivarium/framework/results/stratification.py

+            # Reduce all nans to a single one
+            single_nan_list = (
+                [mapped_column[mapped_column.isna()].iat[0]]
+                if mapped_column.isna().any()
+                else []
+            )
+            unknown_categories = single_nan_list + [
+                cat for cat in unknown_categories if not pd.isna(cat)
+            ]


Why can't this be:

    unknown_categories = [
        cat for cat in unknown_categories if not pd.isna(cat)
    ]
    if mapped_column.isna().any():
        unknown_categories.append(np.nan)

I feel like pd.isna wasn't working as expected but now I'm not so sure. I'll try again

Ok, yeah that works and is way better. I did stick w/ appending the type of NaN since that could potentially shed light on the issue but same basic logic
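The underlying problem can be reproduced with the standard library alone: distinct NaN objects never compare equal, so a set does not deduplicate them. The sketch below uses `math.isnan` on floats rather than `pd.isna`, and `collapse_nans` is a hypothetical helper that keeps the first NaN it finds, in the spirit of appending the actual NaN value.

```python
import math
from typing import Any, List


def _is_nan(value: Any) -> bool:
    """True for float NaN only (pd.isna would also catch None, NaT, etc.)."""
    return isinstance(value, float) and math.isnan(value)


def collapse_nans(categories: List[Any]) -> List[Any]:
    """Keep non-NaN categories and append a single NaN if any were present."""
    collapsed = [cat for cat in categories if not _is_nan(cat)]
    nans = [cat for cat in categories if _is_nan(cat)]
    if nans:
        collapsed.append(nans[0])
    return collapsed


raw = [float("nan"), "foo", float("nan"), float("nan")]
distinct_in_set = len(set(raw))  # each NaN object counts separately
collapsed = collapse_nans(raw)
```

With one NaN per row of a large population, the uncollapsed set is what would blow up the error message.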

stevebachmeier · 2024-08-12T17:48:59Z

src/vivarium/framework/results/context.py


+        # Optimization: We store all the producers by pop_filter and stratifications
+        # so that we only have to apply them once each time we compute results.
        for (pop_filter, stratifications), observations in self.observations[


rename to stratification_names or *_columns or something

rmudambi · 2024-08-12T18:38:22Z

src/vivarium/framework/results/context.py

+        self,
+        population: pd.DataFrame,
+        pop_filter: str,
+        stratifications: Optional[tuple[str, ...]],


Nit: rename this to stratification_names as well.

rmudambi · 2024-08-12T18:41:31Z

src/vivarium/framework/results/context.py

        if stratifications:
-            pop = pop.dropna(subset=list(stratifications))
+            # Drop all rows in the mapped_stratification columns that have NaN values


Nit: maybe explain that we are dropping these rows because being NaN means that the value was previously one of the excluded categories.

rmudambi · 2024-08-12T18:42:32Z

src/vivarium/framework/results/context.py

    def _get_groups(
-        stratifications: Tuple[str, ...], filtered_pop: pd.DataFrame
+        self, stratifications: Tuple[str, ...], filtered_pop: pd.DataFrame


Nit: this can still be static? I don't see self being used.

rmudambi · 2024-08-12T18:51:44Z

tests/framework/results/test_interface.py

+
+def test_register_stratification(mocker):
+    def _silly_mapper():
+        return "foo"


Shouldn't the mapper return values in {"some-category", "some-other-category", "some-unwanted-category"}? I know we're not testing that in this test, but it is a bit confusing.

patricktnast · 2024-08-12T19:44:24Z

src/vivarium/framework/results/stratification.py

-    `categories` is a set of values that the mapper is allowed to output. The
-    `mapper` is the method that transforms the source to the name column.
-    The method produces an output column by calling the mapper on the source
+    The `Stratification` class has six fields: `name`, `sources`, `mapper`,


I guess this PR only adds to what existed already, but if we're going to go through each attribute one by one, we should probably use a systematic docstring format i.e. what we use for say, Builder
https://github.com/ihmeuw/vivarium/blob/main/src/vivarium/framework/engine.py
I'd say "X class has # fields" is also unnecessary.

Agree, and I'll do all that doc stuff in the final PRs for this epic. For now I only modified docs if they were categorically incorrect

Co-authored-by: patricktnast <130876799+patricktnast@users.noreply.github.com>

* implement exclusions
* handle name collisions when stratifying
* allow component_type to be any sequence
* Add tests for stratification registration through interface

stevebachmeier requested review from albrja, collijk, hussain-jafari, patricktnast and rmudambi as code owners August 7, 2024 21:16

stevebachmeier force-pushed the feature/sbachmei/MIC-5163-exclude-unwanted-results branch from aa74ad9 to a211a84 Compare August 7, 2024 22:15

albrja reviewed Aug 7, 2024

albrja approved these changes Aug 7, 2024

stevebachmeier commented Aug 7, 2024

rmudambi approved these changes Aug 7, 2024

rmudambi reviewed Aug 8, 2024

stevebachmeier force-pushed the feature/sbachmei/MIC-5163-exclude-unwanted-results branch from b910ae9 to c394aa7 Compare August 8, 2024 23:24

stevebachmeier requested review from rmudambi and albrja August 9, 2024 17:37

stevebachmeier commented Aug 12, 2024

rmudambi approved these changes Aug 12, 2024

patricktnast approved these changes Aug 12, 2024

stevebachmeier added 12 commits August 12, 2024 13:06

implement for Stratification objects

c4b90d6

clean up Stratification.__call__ logic

dbd2ed7

implement excluded categories at stratification registration; fix tests

7abc8ee

Add to testing; fix broken tests

f3983ae

remove excluded_categories from doctest configuration

2489351

typos

86c0d0e

better handle logging when mapped to nan

9c2be2d

add excluded_categories to register_binned_stratification; change default to []

312a10c

allow for lists passed into 'get_components_by_type'; log dropped categories at debug level

39d10af

handle name collisions when stratifying

2815e9c

better handle when to convert back to original stratification col names

215d1d7

encapsulate mapped col name functions

814aa3d

stevebachmeier and others added 8 commits August 12, 2024 13:06

minor typing change; move when convert component_type to tuple

40f3246

handle NaNs in excluded categories better

b7aa6ec

update docstrings and typing for excluded_categories args in register_binned_stratification

4a61d25

Add tests stratification registration through interface (#461)

d88683b

minor name changes

1fa8c49

change default excluded_categories back to None

84e676c

review nits

b6ce743

comment typo

8c71ae0

stevebachmeier force-pushed the feature/sbachmei/MIC-5163-exclude-unwanted-results branch from fa615d2 to 8c71ae0 Compare August 12, 2024 20:06

stevebachmeier merged commit 2dfe6c8 into release-candidate-spring Aug 12, 2024
6 checks passed

stevebachmeier deleted the feature/sbachmei/MIC-5163-exclude-unwanted-results branch August 12, 2024 20:10
