
adjusted for detection rate #7

Open
wants to merge 10 commits into master

Conversation

kpelechrinis

Actual cases are 10-20x the number of confirmed so I added an adjustment using a beta distribution for the detection rate and results from some recent reports on the degree of underreporting.

The results do not change significantly (especially the trend of R_t), and further adjustments can be made by considering the level of testing in each state (county, country, etc.). E.g., earlier in the outbreak there was most probably more under-reporting than there is today.

@k-sys
Owner

k-sys commented Apr 18, 2020

Sorry, diffs on notebooks are a pain; can you outline your approach for adjusting? You say you use a beta distribution, but you also have a CSV of fixed values? Very interested in this work...

@kpelechrinis
Author

Thanks, no worries. Yes, the CSV file has some detection rates that have been reported in this paper: https://reason.com/wp-content/uploads/2020/04/Bommer-Vollmer-2020-COVID-19-detection-April-2nd.pdf

So what I am doing is essentially using these to fit a beta distribution for the detection rate, and then adjusting the cases for each day by averaging over this beta distribution (truncated to a given range that can be adjusted if needed). So essentially integrating f(x)*c/x, where f(x) is the fitted beta density, x is the detection rate, and c is the confirmed count for that day.

This is the code essentially (which can be vectorized too):

import pandas as pd
import scipy.integrate as integrate
import scipy.stats as sps

# detection rates reported in Bommer & Vollmer (2020)
pcg = pd.read_csv("detection_rate.csv", header=None)

# fit a beta distribution to the reported detection rates
a, b, loc, scale = sps.beta.fit(list(pcg[0]))
rv = sps.beta(a, b, loc, scale)

for i in range(len(cases)):
    c = cases[i]
    f = lambda x: rv.pdf(x) * c / x
    # true cases anywhere between 10-20x the confirmed, i.e.,
    # detection rates between 5-10% (this range is adjustable)
    cases[i] = integrate.quad(f, 0.05, 0.1)[0]
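Since c factors out of the integral, the loop can indeed be vectorized: compute the multiplier once over the truncated beta and scale the whole series. A minimal sketch, with hypothetical rate values standing in for detection_rate.csv and the beta pinned to [0, 1] for fit stability:

```python
import numpy as np
import scipy.stats as sps
import scipy.integrate as integrate

# Hypothetical detection-rate estimates standing in for detection_rate.csv
rates = np.array([0.06, 0.07, 0.05, 0.09, 0.08, 0.065])

# Pin loc/scale so the fitted beta lives on [0, 1]
a, b, loc, scale = sps.beta.fit(rates, floc=0, fscale=1)
rv = sps.beta(a, b, loc, scale)

# c factors out of integral(pdf(x) * c / x) dx, so the adjustment
# reduces to a single multiplier applied to every day's count
multiplier = integrate.quad(lambda x: rv.pdf(x) / x, 0.05, 0.1)[0]

cases = np.array([100.0, 250.0, 400.0])
adjusted = cases * multiplier
```

With the fitted density concentrated around 5-10%, the multiplier lands near 1/x for x in that range, i.e., roughly a 10-20x scale-up of the confirmed counts.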

@femto113

Detection rates seem really hard to nail down. In King County WA the SCAN project's community survey came up with an infection rate of 0.24%, which yields ~5,400 cases, not much larger than the 5,000 reported cases [1]. At the other end an antibody survey just released for Santa Clara County CA put the infection rate at 2.5-4.0%, which would mean actual cases outnumber reported cases by 50x-85x [2].

[1] https://publichealthinsider.com/2020/04/17/greater-seattle-coronavirus-assessment-network-scan-releases-data-from-first-18-days-of-testing/
[2] https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1

@kpelechrinis
Author

kpelechrinis commented Apr 19, 2020

Oh, it is most certainly hard to nail down. This is why (imo) the "best" we can do is to account for it probabilistically, in the sense that some values for the detection rate are more probable than others (e.g., it is more probable that the detection rate is, say, 5% than, say, 80%). Even sero-surveys are not yet accurate, since they use commercial tests that don't (yet) have good sensitivity and specificity [1]. The question is how to define the density function for the detection rate, and that is also tough. For now I am simply taking reported estimates and fitting a beta distribution to them. Most probably information about the testing level needs to be considered too, and time should also be considered, since the detection rate most probably changes (widely) over time. But all these factors would basically give rise to a different probability distribution for the detection rate.

[1] https://www.nature.com/articles/d41586-020-01095-0

@femto113

You could perhaps bound the distribution of detection rates by correlating them with hospitalization rates, since both are a function of the number of actual cases.
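One way to sketch that idea: assume a plausible range for the infection-hospitalization rate (IHR), back out the implied number of actual infections from hospitalization counts, and bound the detection rate from there. The function name, the default IHR range, and the input numbers below are all hypothetical:

```python
def detection_rate_bounds(confirmed, hospitalized, ihr_low=0.01, ihr_high=0.05):
    """Bound the detection rate given an assumed IHR range (hypothetical values)."""
    # A low IHR implies many infections per hospitalization -> low detection rate
    implied_cases_high = hospitalized / ihr_low
    implied_cases_low = hospitalized / ihr_high
    return confirmed / implied_cases_high, confirmed / implied_cases_low

# Illustrative inputs, not real county data
lo, hi = detection_rate_bounds(confirmed=5_000, hospitalized=600)
```

The resulting interval is only as good as the assumed IHR range, but it anchors the detection-rate prior to a quantity that is observed far more reliably than case counts.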

@patrickphelan

patrickphelan commented Apr 20, 2020

Thanks, this is very interesting! I appreciate the work and the approaches. The testing adjustment was my first question too.

@bennettbrowniowa

@femto113 suggests estimating detection rates by correlating them with hospitalization rates. Yes! I think this gives a much tighter estimate of the infection rate than testing data, especially if population age structure and smoking rates are taken into account.

@bennettbrowniowa

Do the commits on this PR retain the correction factors by state, separate from the rate derived from confirmed cases/population? I think the correction factors should be indicated on each state's visualization if they are used.

@bennettbrowniowa

Because the link was to a WordPress blog, I was concerned about the validity of the reference. Per http://www.uni-goettingen.de/en/606540.html, it is a paper from a professor of economics. There is no indication that the paper has been submitted for publication or undergone any level of peer review. Open-source scientific platforms suitable for peer commentary would at least make this more palatable as a model input.

@Nectarineimp

FWIW, my first model was based on the Diamond Princess data, where we had a control group that was 100% tested. From that I was able to determine how many showed strong symptoms vs. how many tested positive. There was some hand-waving around the really poor FN/FP performance of the test they were using. From that I settled on 20% of people showing symptoms consistent with COVID-19. I then adjusted the R value to match the known number of cases against 0.2 times my predicted number of cases. On March 24th the R value was 4.8, which was right in the range the CDC predicted. From that I was able to get a number for how many people were actually sick: at the current number of confirmed cases, it would indicate that 12.7 million people have actually been sick. The accuracy of this depends entirely on the presumption that 0.2 is the right factor for symptomatic vs. asymptomatic, and right now I wouldn't bet my life on it. We can do probabilistic measurements, but we would want to determine the risk of uncertainty and place error bars on that.
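The core scaling in that approach (if only the symptomatic fraction of infections ever gets confirmed, divide confirmed counts by that fraction) can be sketched as follows; the 0.2 reflects the commenter's Diamond-Princess-derived assumption, and the input count is purely illustrative:

```python
SYMPTOMATIC_FRACTION = 0.2  # assumed share of infections showing symptoms

def estimate_actual_infections(confirmed_symptomatic, frac=SYMPTOMATIC_FRACTION):
    # Each confirmed (symptomatic) case stands in for 1/frac infections
    return confirmed_symptomatic / frac

# Illustrative input: 1M confirmed -> 5x that many infections under a 20% assumption
estimate_actual_infections(1_000_000)
```

As the comment notes, the estimate is linear in 1/frac, so the uncertainty in the symptomatic fraction propagates directly into the final count.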

@nicholasjma

nicholasjma commented Apr 23, 2020

You may want to also commit a copy of the notebook synced to a Markdown file through Jupytext (https://jupytext.readthedocs.io/en/latest/install.html) so that diffs are human-readable.
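For example, with Jupytext installed, pairing is a one-time command (the notebook filename below is a placeholder for whichever notebook the repo uses):

```shell
# Pair the notebook with a Markdown twin so PR diffs show readable text
jupytext --set-formats ipynb,md notebook.ipynb

# After editing either file, keep the pair in sync
jupytext --sync notebook.ipynb
```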
