adjusted for detection rate #7
base: master
Conversation
Actual cases are 10-20x the number of confirmed cases, so I added an adjustment using a beta distribution for the detection rate, together with results from some recent reports on the degree of under-reporting.
The results do not change significantly (especially the trending of R_t), and further adjustments can be made by considering the level of testing in each state (county, country, etc.). E.g., earlier in the process there has most probably been more under-reporting than there is today.
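(A quick way to see why the R_t trend is insensitive to a constant correction factor: growth ratios of the case series cancel any fixed multiplier. A minimal sketch, with made-up case counts and factor:)

```python
# Minimal sketch of why a constant under-reporting correction leaves
# the R_t trend unchanged: growth ratios of the case series cancel
# any fixed multiplier k. k only matters once the detection rate
# itself varies over time. Numbers below are made up for illustration.
import numpy as np

cases = np.array([100, 130, 170, 220, 280], dtype=float)
k = 15.0  # hypothetical constant correction, e.g. actual = 15x confirmed

growth_raw = cases[1:] / cases[:-1]
growth_adj = (k * cases)[1:] / (k * cases)[:-1]
print(np.allclose(growth_raw, growth_adj))  # True: identical trend
```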
Sorry, diffs on notebooks are a pain - can you outline your approach for adjusting? You say you use a beta distribution, but you also have a CSV of fixed values? Very interested in this work...
Thanks; no worries. Yes, the CSV file has some detection rates that were reported in this paper: https://reason.com/wp-content/uploads/2020/04/Bommer-Vollmer-2020-COVID-19-detection-April-2nd.pdf What I am doing is essentially using these to fit a beta distribution for the detection rate, and then adjusting the cases for each day by averaging over this beta distribution (truncated within a given range that can be adjusted if needed). So essentially I am integrating f(x) * c / x, where f(x) is the fitted beta, x is the detection rate, and c is the confirmed cases. This is the code, essentially (it can be vectorized too):
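(The code block itself didn't survive the page export; below is a minimal sketch of the adjustment as described, assuming SciPy. The detection-rate values are placeholders standing in for the CSV contents, and I've normalized the truncated integral by its mass, which the original may or may not do.)

```python
# Sketch of the adjustment described above: fit a beta to reported
# detection rates, then scale confirmed cases by the average of 1/x
# under that beta, truncated to [lo, hi].
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Placeholder detection-rate estimates; the real values come from the
# CSV of Bommer & Vollmer (2020) rates in this PR.
rates = np.array([0.03, 0.06, 0.10, 0.15, 0.20])

# Fit a beta on (0, 1); floc/fscale pin the support.
a, b, loc, scale = stats.beta.fit(rates, floc=0, fscale=1)
dist = stats.beta(a, b)

def adjust(confirmed, lo=0.01, hi=0.5):
    """Average confirmed / x over the fitted beta, truncated to [lo, hi]."""
    mass = dist.cdf(hi) - dist.cdf(lo)          # truncation normalizer
    integral, _ = quad(lambda x: dist.pdf(x) * confirmed / x, lo, hi)
    return integral / mass

print(adjust(1000))  # estimated actual cases for 1,000 confirmed
```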
Detection rates seem really hard to nail down. In King County, WA, the SCAN project's community survey came up with an infection rate of 0.24%, which yields ~5,400 cases, not much larger than the 5,000 reported cases [1]. At the other end, an antibody survey just released for Santa Clara County, CA put the infection rate at 2.5-4.0%, which would mean actual cases outnumber reported cases by 50x-85x [2]. [1] https://publichealthinsider.com/2020/04/17/greater-seattle-coronavirus-assessment-network-scan-releases-data-from-first-18-days-of-testing/
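(A rough sanity check of those multipliers; the population and confirmed-case figures below are my approximate assumptions, not numbers from the surveys themselves:)

```python
# Back-of-envelope check of the survey-implied multipliers.
# Population and confirmed-case counts are rough assumptions.
king_pop, king_confirmed = 2.25e6, 5_000
sc_pop, sc_confirmed = 1.93e6, 950

# SCAN: 0.24% infection rate -> roughly the reported case count
print(round(king_pop * 0.0024))  # ~5,400 vs ~5,000 confirmed

# Santa Clara: 2.5-4.0% infection rate -> large multipliers
for rate in (0.025, 0.040):
    actual = sc_pop * rate
    print(round(actual), round(actual / sc_confirmed))  # ~48k-77k, ~51x-81x
```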
Oh, most certainly it is hard to nail down. This is why (imo) the "best" we can do is to account for it probabilistically - in the sense that some values for the detection rate are more probable than others (e.g., it is more probable that the detection rate is, say, 5% than, say, 80%). Even sero-surveys are not yet accurate, since they use commercial tests that don't have good sensitivity and specificity yet. The question is how to define the density function for the detection rate, and this is also tough. Right now I am simply taking reported estimates and fitting a beta distribution to them. Information about testing levels probably needs to be considered too, as should time, since the detection rate most probably changes (widely) over time. But all these factors will basically give rise to a different probability distribution for the detection rate.
You could perhaps bound the distribution of detection rates by correlating them with hospitalization rates, since they're both a function of the number of actual cases.
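(One hedged sketch of that idea; the infection-hospitalization rate and the counts below are hypothetical, not sourced values:)

```python
# Sketch of bounding the detection rate via hospitalizations: if
# hospitalizations are a roughly fixed fraction (the infection-
# hospitalization rate, IHR) of actual infections, they imply an
# independent detection-rate estimate that could anchor the beta.
def implied_detection_rate(confirmed, hospitalized, ihr=0.04):
    # ihr is an assumed infection-hospitalization rate, not a sourced value
    actual = hospitalized / ihr
    return confirmed / actual

# hypothetical counts: 5,000 confirmed, 1,000 hospitalized -> 20% detected
print(implied_detection_rate(confirmed=5_000, hospitalized=1_000))  # 0.2
```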
Thanks, this is very interesting! I appreciate the work and the approaches. The question about testing levels was my first one too.
@femto113 suggests estimating detection rates by correlating them with hospitalization rates. Yes! I think this is a much tighter estimate of the infection rate than testing data, especially if population age structure and smoking rates are taken into account.
Do the commits on this retain the correction factors by state, separate from the rate derived from confirmed cases/population? I think the correction factors should be indicated on each state's visualization if such factors are used.
Because the link was to a WordPress blog, I was concerned about the validity of the reference. It turns out to be a paper from a professor (http://www.uni-goettingen.de/en/606540.html); he is a professor of economics. There is no indication that the paper has been submitted for publication or gone through any level of peer review. Posting it on an open scientific platform suitable for at least peer commentary would make it more palatable as a model input.
FWIW, my first model was based on the Diamond Princess data, where we had a control group that was 100% tested. From that I was able to determine how many showed strong symptoms vs. how many tested positive. There was some hand-waving given the really poor false-negative/false-positive rates of the test they were using. From that I settled on 20% of people showing symptoms consistent with COVID-19. I then adjusted the R value so that the known number of cases matched 0.2 times my predicted number of cases. So on March 24th the R value was 4.8, which was right in the range the CDC predicted. From that I was able to get a number for how many people were actually sick. At the current number of confirmed cases, it would indicate that 12.7 million people have actually been sick. The accuracy of this depends entirely on the presumption that 0.2 is the right factor for symptomatic vs. asymptomatic, and right now I wouldn't bet my life on it. We can do probabilistic measurements, but we would want to determine the risk of uncertainty and place error bars on that.
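(The back-of-envelope version of that symptomatic-fraction adjustment, skipping the R-fitting step, so this is only the simplest form of the idea; the confirmed count in the example is hypothetical:)

```python
# Simplest form of the adjustment above: if confirmed counts capture
# roughly the symptomatic 20%, actual infections scale by 1/0.2. This
# ignores the model-fitting step (adjusting R so that confirmed
# matches 0.2x predicted), so it is only a back-of-envelope check.
SYMPTOMATIC_FRACTION = 0.2  # Diamond Princess-based estimate from the comment

def implied_actual(confirmed, symptomatic_fraction=SYMPTOMATIC_FRACTION):
    return confirmed / symptomatic_fraction

print(implied_actual(700_000))  # e.g. 700k confirmed -> 3.5M implied actual
```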
You may want to also commit a copy of the notebook synced to a Markdown file through Jupytext (https://jupytext.readthedocs.io/en/latest/install.html) so that diffs are human-readable.