
pull non loglinear rrs #352

Merged


hussain-jafari
Contributor

pull non loglinear RRs

Description

Changes and notes

Convert exposure column to parameter column and keep exposures as NaNs.
Update allowed upper limit of RR values.
Add RR extraction tests back into the test suite.
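
A minimal sketch of the first change, assuming a long-format pandas DataFrame with "exposure" and "parameter" columns as in the reviewed diff. The helper name move_exposure_to_parameter and the sample rows are hypothetical, not taken from vivarium_inputs, and the guard uses the notna() form suggested during review.

```python
import numpy as np
import pandas as pd


def move_exposure_to_parameter(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy exposure values into the parameter column.

    Only acts when every exposure value is present and the parameter
    column is entirely empty; otherwise the data are returned unchanged.
    """
    data = data.copy()  # work on a copy so the caller's frame is untouched
    if data["exposure"].notna().all() and data["parameter"].isna().all():
        data["parameter"] = data["exposure"]
        data["exposure"] = np.nan  # keep exposures as NaNs
    return data


# Illustrative non-log-linear RR rows: one RR value per exposure level.
raw = pd.DataFrame(
    {
        "exposure": [1.0, 2.0, 3.0],
        "parameter": [np.nan, np.nan, np.nan],
        "value": [1.00, 1.05, 1.10],
    }
)
converted = move_exposure_to_parameter(raw)
print(converted)
```

Unlike the diff, which assigns in place, this sketch copies the frame so the example has no side effects.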

Testing

Ran the test suites (except for years='all', which was run in a notebook). Ran the pulled data through simulation validations.

"Relative risk data in new format with 1000 exposure values. Our processing is not "
"currently able to process data in this format."
)
# TODO: new validations?
Contributor

Remove TODO

Contributor Author

This is a reminder that we should discuss any possible new validations that we might want to perform as a result of this new data structure.

if not data["exposure"].isna().any() and data["parameter"].isna().all():
Contributor

@albrja Jul 11, 2024

Nit: I sometimes find boolean logic that mixes "not" with "and" confusing, so you could use notna().all() instead of not isna().any(), but it doesn't matter.
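
The two spellings are equivalent by De Morgan's law: "no value is missing" means the same as "every value is present". A quick check on three small pandas Series (the values are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative columns: fully populated, partly missing, fully missing.
cases = {
    "full": pd.Series([1.0, 2.0, 3.0]),
    "partial": pd.Series([1.0, np.nan, 3.0]),
    "empty": pd.Series([np.nan, np.nan, np.nan]),
}

results = {}
for name, s in cases.items():
    original_form = not s.isna().any()     # spelling used in the diff
    suggested_form = bool(s.notna().all())  # spelling suggested in review
    assert original_form == suggested_form
    results[name] = suggested_form

print(results)  # {'full': True, 'partial': False, 'empty': False}
```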

Contributor

Also, do you need to worry about the case where some of the parameter column is empty, but not all of it?

Contributor

I agree with Jim that if data["exposure"].notna().all() and data["parameter"].isna().all(): is clearer.

Contributor Author

Ah, I didn't know about notna; I like that more as well. And Jim, I'm not sure, since I'm not that familiar with this type of RR data from GBD, but you're right that it's really a check of whether there's anything at all in the parameter column.
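
To make Jim's question concrete: because the guard requires the parameter column to be entirely empty, a partially filled column fails the check and the data pass through with their values still in exposure. A sketch with two hypothetical helpers (parameter_state, should_convert) and illustrative values; whether the partial case should raise instead is the kind of thing the "new validations?" TODO leaves open.

```python
import numpy as np
import pandas as pd


def parameter_state(data: pd.DataFrame) -> str:
    """Hypothetical helper: classify how populated the parameter column is."""
    missing = data["parameter"].isna()
    if missing.all():
        return "empty"
    if missing.any():
        return "partial"
    return "full"


def should_convert(data: pd.DataFrame) -> bool:
    """The guard from this thread, written in its notna() form."""
    return bool(data["exposure"].notna().all() and data["parameter"].isna().all())


exposure = [1.0, 2.0, 3.0]
frames = {
    "empty": pd.DataFrame({"exposure": exposure, "parameter": [np.nan, np.nan, np.nan]}),
    "partial": pd.DataFrame({"exposure": exposure, "parameter": [np.nan, 0.5, np.nan]}),
    "full": pd.DataFrame({"exposure": exposure, "parameter": [0.1, 0.5, 0.9]}),
}

# Map each case to (state of parameter column, whether the guard fires).
results = {name: (parameter_state(df), should_convert(df)) for name, df in frames.items()}
print(results)  # {'empty': ('empty', True), 'partial': ('partial', False), 'full': ('full', False)}
```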

if not data["exposure"].isna().any() and data["parameter"].isna().all():
    data["parameter"] = data["exposure"]
    data["exposure"] = np.nan
Contributor

This seems like a big change that would also apply to data other than non-log-linear RRs?

Contributor Author

Non-log-linear RRs are the only type of data we've seen that has values in the exposure column, so this should only apply to that sort of data.

@albrja self-requested a review July 11, 2024 19:25
@@ -194,6 +193,9 @@ def test_core_risklike(entity, measure, location):
def test_year_id_risklike(entity, measure, location, years):
    entity_name, entity_expected_measure_ids = entity
    measure_name, measure_id = measure
    # exposure-parametrized RRs for all years requires a lot of time and memory to process
Contributor

But don't we still need to test it? Could we mark that specific combination of parameters as slow and have it run automatically overnight or something like we do elsewhere?

Contributor Author

Yeah, I would definitely prefer to do that. Do you have an example of where we do that? I believe it's in pseudopeople?
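
One way to do what the reviewer suggests is the --runslow recipe from the pytest documentation: register a slow marker, skip slow tests unless --runslow is passed, and attach the marker to only the expensive combination with pytest.param. This is a sketch under assumptions, not this project's actual setup: the parameter values are illustrative, the test body is a placeholder, and the hooks shown together here for brevity would normally live in conftest.py.

```python
import pytest

# --- conftest.py (pytest only picks up these hooks from conftest.py or plugins) ---


def pytest_addoption(parser):
    parser.addoption(
        "--runslow", action="store_true", default=False, help="run slow tests"
    )


def pytest_configure(config):
    config.addinivalue_line("markers", "slow: mark test as slow to run")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # --runslow given: run everything, including slow tests
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)


# --- test module: mark only the expensive combination as slow ---

YEAR_CASES = [
    ("relative_risk", 2019),  # illustrative single-year case
    pytest.param("relative_risk", "all", marks=pytest.mark.slow),
]


@pytest.mark.parametrize("measure_name, years", YEAR_CASES)
def test_year_id_risklike(measure_name, years):
    ...  # placeholder: pull and validate data as in the existing test
```

A nightly job could then run pytest --runslow so the all-years relative-risk case still gets exercised regularly.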

@aflaxman
Member

This works! But it generates a gazillion warnings for me; when I run it with this command,

import vivarium_inputs as vii, gbd_mapping
df = vii.get_measure(gbd_mapping.risk_factors.diet_high_in_sodium, 'relative_risk', 'Washington')

I get a lot of output like the following:

2024-07-16 08:52:26.812 | WARNING  | vivarium_inputs.validation.raw:check_age_restrictions:2239 - Data was expected to contain all age groups between ids 8 and 235 but was missing the following: {8, 9}.

    and measure[0] == "relative_risk"
    and years == "all"
):
    pytest.skip(reason="need --runslow option to run")
Contributor

OK, I misunderstood your Slack message. I thought you were trying to mark it as slow here rather than directly marking it as skip.

@stevebachmeier self-requested a review July 16, 2024 17:18
@hussain-jafari merged commit 52a15ce into epic/non_loglinear_rrs Jul 16, 2024
6 checks passed
@hussain-jafari deleted the feature/MIC-5129_pull_non_loglinear_rrs branch July 16, 2024 18:35
4 participants