-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
pull non loglinear rrs #352
pull non loglinear rrs #352
Conversation
"Relative risk data in new format with 1000 exposure values. Our processing is not " | ||
"currently able to process data in this format." | ||
) | ||
# TODO: new validations? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove todo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reminder that we should discuss any possible new validations that we might want to perform as a result of this new data structure.
"currently able to process data in this format." | ||
) | ||
# TODO: new validations? | ||
if not data["exposure"].isna().any() and data["parameter"].isna().all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I find the not with and boolean logic confusing sometimes so you could do notna().all() instead of not isna but it doesnt matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also do you need to worry about if some of parameter is empty but not all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree w/ Jim that if not data["exposure"].notna().all() and data["parameter"].isna().all():
is clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I didn't know about notna I like that more as well. And Jim - I'm not sure since I'm not that familiar with this type of RR data from GBD but you're right in thinking that it's more of a check that there's anything at all in parameter column.
# TODO: new validations? | ||
if not data["exposure"].isna().any() and data["parameter"].isna().all(): | ||
data["parameter"] = data["exposure"] | ||
data["exposure"] = np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a big change that applies to things not specific to pulling non-loglinear rrs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non log linear RRs are the only type of data where we have data in the exposure column that we've seen so this should only apply to those sort of data.
@@ -194,6 +193,9 @@ def test_core_risklike(entity, measure, location): | |||
def test_year_id_risklike(entity, measure, location, years): | |||
entity_name, entity_expected_measure_ids = entity | |||
measure_name, measure_id = measure | |||
# exposure-parametrized RRs for all years requires a lot of time and memory to process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But don't we still need to test it? Could we mark that specific combination of parameters as slow and have it run automatically overnight or something like we do elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I would definitely prefer to do that. Do you have an example of where we do that - I believe psuedopeople?
…ihmeuw/vivarium_inputs into feature/MIC-5129_pull_non_loglinear_rrs
This works! But it generates a gazillion warnings for me; when I run it with this command,
I get a lot of stuff like the following
|
and measure[0] == "relative_risk" | ||
and years == "all" | ||
): | ||
pytest.skip(reason="need --runslow option to run") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I misunderstood your slack message. I thought you were trying to mark as slow here rather than directly mark as skip.
pull non loglinear RRs
Description
Changes and notes
Convert exposure column to parameter column and keep exposures as NaNs.
Update allowed upper limit of RR values.
Add RR extraction tests back in to test suite.
Testing
Ran test suites (except for years='all' which was run in a notebook). Ran pulled data through simulation validations.