pull non loglinear rrs #352

hussain-jafari · 2024-07-10T23:17:07Z

pull non loglinear RRs

Description

Category: feature
JIRA issue: https://jira.ihme.washington.edu/browse/MIC-5129

Changes and notes

Convert exposure column to parameter column and keep exposures as NaNs.
Update allowed upper limit of RR values.
Add RR extraction tests back in to test suite.

Testing

Ran test suites (except for years='all' which was run in a notebook). Ran pulled data through simulation validations.

albrja · 2024-07-11T01:21:58Z

src/vivarium_inputs/extract.py

-            "Relative risk data in new format with 1000 exposure values. Our processing is not "
-            "currently able to process data in this format."
-        )
+    # TODO: new validations?


Remove todo

This is a reminder that we should discuss any possible new validations that we might want to perform as a result of this new data structure.

albrja · 2024-07-11T01:23:19Z

src/vivarium_inputs/extract.py

-            "currently able to process data in this format."
-        )
+    # TODO: new validations?
+    if not data["exposure"].isna().any() and data["parameter"].isna().all():


Nit: I sometimes find not combined with and confusing in boolean logic, so you could use notna().all() instead of not isna().any(), but it doesn't matter.

Also, do you need to worry about the case where some, but not all, of parameter is empty?

I agree w/ Jim that if data["exposure"].notna().all() and data["parameter"].isna().all(): is clearer

Ah, I didn't know about notna; I like that more as well. And Jim, I'm not sure, since I'm not that familiar with this type of RR data from GBD, but you're right in thinking that it's more of a check of whether there's anything at all in the parameter column.
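As a quick illustration of the point discussed in this thread, the sketch below checks on toy data that not isna().any() and notna().all() agree, and shows what happens when parameter is only partly empty. The helper name exposure_only and the sample frames are made up for this example; they are not part of vivarium_inputs.

```python
import numpy as np
import pandas as pd


def exposure_only(data: pd.DataFrame) -> bool:
    """True when every exposure value is present and every parameter value is missing.

    Hypothetical helper for illustration; it restates the condition from the diff
    using notna().all() instead of `not ... isna().any()`.
    """
    return bool(data["exposure"].notna().all() and data["parameter"].isna().all())


# Toy frames (assumed shapes, not real GBD data).
full = pd.DataFrame({"exposure": [0.5, 1.0], "parameter": [np.nan, np.nan]})
partial = pd.DataFrame({"exposure": [0.5, 1.0], "parameter": ["cat1", np.nan]})

for frame in (full, partial):
    # Condition exactly as written in the diff.
    original = not frame["exposure"].isna().any() and frame["parameter"].isna().all()
    # Both spellings give the same answer on each frame.
    assert bool(original) == exposure_only(frame)

print(exposure_only(full))
print(exposure_only(partial))
```

Note that a frame whose parameter column is only partly empty does not satisfy the condition, so it would be left unchanged by this branch. Also note that keeping a leading not in front of notna().all() would invert the exposure check.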

stevebachmeier · 2024-07-11T16:46:47Z

src/vivarium_inputs/extract.py

+    # TODO: new validations?
+    if not data["exposure"].isna().any() and data["parameter"].isna().all():
+        data["parameter"] = data["exposure"]
+        data["exposure"] = np.nan


This seems like a big change that applies to things not specific to pulling non-loglinear rrs?

Non-loglinear RRs are the only type of data we've seen with values in the exposure column, so this should only apply to that sort of data.
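To make the scope of the change concrete, here is a minimal sketch of the column move on toy data. It assumes only what the diff shows (the exposure and parameter columns and the condition guarding the move); the function name move_exposure_to_parameter, the value column, and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def move_exposure_to_parameter(data: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the branch in the diff, applied to a copy of the input.

    Only frames whose exposure column is fully populated and whose parameter
    column is entirely missing are changed.
    """
    data = data.copy()
    if data["exposure"].notna().all() and data["parameter"].isna().all():
        data["parameter"] = data["exposure"]
        data["exposure"] = np.nan
    return data


# Assumed shape of exposure-parametrized RR data: values live in `exposure`.
non_loglinear = pd.DataFrame(
    {"exposure": [10.0, 20.0], "parameter": [np.nan, np.nan], "value": [1.1, 1.3]}
)
# Assumed shape of categorical RR data: values live in `parameter`.
categorical = pd.DataFrame(
    {"exposure": [np.nan, np.nan], "parameter": ["cat1", "cat2"], "value": [2.0, 1.0]}
)

converted = move_exposure_to_parameter(non_loglinear)
untouched = move_exposure_to_parameter(categorical)
print(converted)
print(untouched)
```

In this sketch the categorical frame passes through unchanged, which matches the expectation stated above that only data with a populated exposure column is affected.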

stevebachmeier · 2024-07-11T20:25:22Z

tests/extract/test_core.py

@@ -194,6 +193,9 @@ def test_core_risklike(entity, measure, location):
 def test_year_id_risklike(entity, measure, location, years):
    entity_name, entity_expected_measure_ids = entity
    measure_name, measure_id = measure
+    # exposure-parametrized RRs for all years requires a lot of time and memory to process


But don't we still need to test it? Could we mark that specific combination of parameters as slow and have it run automatically overnight or something like we do elsewhere?

Yeah, I would definitely prefer to do that. Do you have an example of where we do that? I believe pseudopeople?

aflaxman · 2024-07-16T15:58:33Z

This works! But it generates a gazillion warnings for me when I run it with this command:

import vivarium_inputs as vii, gbd_mapping
df = vii.get_measure(gbd_mapping.risk_factors.diet_high_in_sodium, 'relative_risk', 'Washington')

I get a lot of stuff like the following:

2024-07-16 08:52:26.812 | WARNING  | vivarium_inputs.validation.raw:check_age_restrictions:2239 - Data was expected to contain all age groups between ids 8 and 235 but was missing the following: {8, 9}.
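When triaging output like this, it can help to collapse repeated warnings into counts per source and message. The following is a rough sketch using only the standard library; it assumes the log layout of the single line quoted above (timestamp | LEVEL | module:function:line - message), and summarize_warnings is a made-up helper, not something provided by vivarium_inputs.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted above (an assumption about the format).
LOG_LINE = re.compile(
    r"^(?P<time>\S+ \S+) \| (?P<level>\w+)\s*\| (?P<source>\S+) - (?P<message>.*)$"
)


def summarize_warnings(log_text: str) -> Counter:
    """Count WARNING lines, keyed by (source, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match["level"] == "WARNING":
            counts[(match["source"], match["message"])] += 1
    return counts


quoted = (
    "2024-07-16 08:52:26.812 | WARNING  | "
    "vivarium_inputs.validation.raw:check_age_restrictions:2239 - "
    "Data was expected to contain all age groups between ids 8 and 235 "
    "but was missing the following: {8, 9}."
)
# Repeat the quoted line to mimic many identical warnings.
sample = "\n".join([quoted] * 3)

for (source, message), n in summarize_warnings(sample).items():
    print(n, source, message)
```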

stevebachmeier · 2024-07-16T17:17:41Z

tests/extract/test_core.py

+        and measure[0] == "relative_risk"
+        and years == "all"
+    ):
+        pytest.skip(reason="need --runslow option to run")


Ok, I misunderstood your Slack message. I thought you were trying to mark as slow here rather than directly mark as skip.

Hussain Jafari added 3 commits July 8, 2024 13:42

process exposure-parametrized RRs

0c16d3d

intermediate test commit

a4abe26

change upper RR limit and add all tests

aa4fce0

hussain-jafari requested review from albrja, collijk, patricktnast, rmudambi and stevebachmeier as code owners July 10, 2024 23:17

albrja reviewed Jul 11, 2024

stevebachmeier reviewed Jul 11, 2024

albrja self-requested a review July 11, 2024 19:25

albrja approved these changes Jul 11, 2024

don't test sbp RRs for all years

f66bc13

stevebachmeier reviewed Jul 11, 2024

lint

2e315e9

github-actions bot mentioned this pull request Jul 12, 2024

Post-Sprint Open Pull Request Metrics Report (dev) ihmeuw/vivarium_testing_utils#5

Open

Hussain Jafari and others added 8 commits July 15, 2024 13:22

intermediate

c84517b

Merge branch 'feature/MIC-5129_pull_non_loglinear_rrs' of github.com:…

36c8ca7

…ihmeuw/vivarium_inputs into feature/MIC-5129_pull_non_loglinear_rrs

skip long test

f04f696

separate out test

c711117

lint

fcbd12d

lint everything

4f60a0b

update comment

f6071c4

lint comment

9f5b524

stevebachmeier reviewed Jul 16, 2024

stevebachmeier self-requested a review July 16, 2024 17:18

stevebachmeier approved these changes Jul 16, 2024

hussain-jafari merged commit 52a15ce into epic/non_loglinear_rrs Jul 16, 2024
6 checks passed

hussain-jafari deleted the feature/MIC-5129_pull_non_loglinear_rrs branch July 16, 2024 18:35
