Document a Bayesian approach to automated V&V #1382
  one for each of the values we want to check in the simulation.
- In these hypothesis tests, the null hypothesis is that the simulation value matches the V&V target;
+ In these hypothesis tests, the null hypothesis is that the simulation value comes from our V&V target distribution
  and the alternative hypothesis is that it comes from a prior distribution of bugs/errors;
There is something philosophically interesting here... The alternative hypothesis is that the prior has bugs/errors and they matter. It is possible that there is a bug but it is not caught by this test. But then is it really a bug?
Hmm, I would say that a bug is still a bug because it's something we don't want in the code, even if it doesn't impact the results. For example, if I used a GBD 2018 value instead of 2019, it's wrong but it might not appear wrong in the outputs. But then yes, we're only testing for bugs that matter.
Abie, that might be (arguably) the alternative hypothesis we want, but here I am describing what alternative hypothesis we are actually testing. With how I have currently done this, there is a distribution of rates if there is no bug (specified by the V&V target) and a distribution of rates if there is a bug (currently this prior is always the same). The latter can have mass around or at the correct values, which represents the situation you are describing -- a bug that is accidentally right. We still include that as part of the alternative hypothesis.
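To make the comparison above concrete, here is a minimal sketch of a Bayes factor between a "bug" distribution and a "no bug" (V&V target) distribution for a simulated rate. It is not the code from the linked PR. The Beta-binomial model, the function names, the `Beta(100, 900)` target, and the uniform `Beta(1, 1)` bug prior are all assumptions chosen for illustration.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_beta_binomial_pmf(k, n, a, b):
    """Log marginal likelihood of k events in n trials when the
    underlying rate is drawn from Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def log_bayes_factor(k, n, target, bug_prior=(1.0, 1.0)):
    """Log of P(data | bug) / P(data | no bug).

    `target` and `bug_prior` are (a, b) Beta parameters. Both are
    hypothetical stand-ins for the V&V target distribution and the
    prior over rates when there is a bug.
    """
    log_ml_bug = log_beta_binomial_pmf(k, n, *bug_prior)
    log_ml_no_bug = log_beta_binomial_pmf(k, n, *target)
    return log_ml_bug - log_ml_no_bug


# Assumed V&V target: rate concentrated near 10%.
target = (100.0, 900.0)

# 100 events in 1000 trials sits at the target, so the evidence favors "no bug".
print(log_bayes_factor(100, 1000, target))

# 300 events in 1000 trials is far from the target, so the evidence favors "bug".
print(log_bayes_factor(300, 1000, target))
```

Because the assumed bug prior places some mass at the correct rate, a bug that happens to produce the right value still belongs to the "bug" hypothesis; the data simply cannot distinguish it from the target, so the Bayes factor favors "no bug" in that case.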
Also moves away from the null/alternative language.
Using Bayesian instead of frequentist hypothesis testing.
Code implementing the statistics and applying this method to domestic migration and immigration is here: ihmeuw/vivarium_census_prl_synth_pop#333