-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
FIPS code for New York City and Rhode Island Missing? #105
Comments
county = 'New York City' is a manual aggregation in this data set (probably because The New York Times reports on metro areas, or perhaps because their data pre-aggregates the five borough/counties into a single 'New York City' data point(s). This means there is no fips code for 'New York City' I set the fips code in the input file (us-counties.csv) to 0 for New York City and then, when joining other data (wherein fips code is called geo_id) I use : |
Thanks @datamusing for pointing out Rhode Island. I missed that: |
Thanks for the clarification on NYC!
…On Wed, Apr 1, 2020 at 10:44 AM Michael Kelleher ***@***.***> wrote:
Thanks @datamusing <https://github.com/datamusing> for pointing out Rhode
Island. I missed that:
select CASE WHEN state = 'Rhode Island' AND county = 'Unknown' THEN 'Rhode
Island' ELSE county END county, state, date, cases, deaths,
CASE
WHEN county = 'New York City' THEN 0
WHEN state = 'Rhode Island' THEN 1
ELSE fips
END fips from cv.nyt_counties) nc
LEFT OUTER JOIN
(select
CASE
WHEN geo_id IN ( '36005', '36047', '36061', '36081', '36085' ) THEN '0'
WHEN geo_id IN ( '44001', '44003', '44005', '44007', '44009' ) THEN '1'
ELSE geo_id
END geo_id,
sum(nonfamily_households + family_households ) households,
sum(total_pop) total_pop,
...
FROM bigquery-public-data.census_bureau_acs.county_2017_5yr
GROUP BY 1) pop
ON cast(pop.geo_id as int64) = cast(nc.fips as int64)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#105 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABDNYPORLKYNOHXGWMGRL7DRKN4PHANCNFSM4LYK5DSQ>
.
|
For Rhode Island, only the
|
Thanks @albertsun. Also, awesome work. I'm just making adjustments (reaggregating at the level of a set of counties) so I can join on other fips datasets without having missing values. I lose some granularity, but I prefer that trade off to having cases / deaths that cannot be joined against population, demographcs, aggregated land area, etc. |
@MichaelKelleher et al given that there are FIPS codes pertinent to New York City... and given that NYC has official data at this level (see https://www1.nyc.gov/site/doh/covid/covid-19-data.page) why is New York City being aggregated into a single data point per day? |
@albertsun might know the answer here, but perhaps I can save him the effort with a little speculation (I'm guessing that, since he's at the NYT, he might have his hands full.) |
+1
Yes! Thanks for this data, Albert. It is so cool to have such verified
resource at hand.
We can do our own munging and imputations where needed, but certainly
won’t want to break any established data models.
Cheers,
S
…On Thu, Apr 2, 2020 at 3:26 PM Michael Kelleher ***@***.***> wrote:
@albertsun <https://github.com/albertsun> might know the answer here, but
perhaps I can save him the effort with a little speculation (I'm guessing
that, since he's at the NYT, he might have his hands full.)
Many other datasets (JHU, et al.) started collecting and making covid19
data available without much expectation of the interest in that data.
Further, they provided no Service Level Agreement for those resources.
Naturally, when JHU changed their schema, got embroiled in a political
outrage at thier classifications of countries, realized they couldn't
provide data at the original granularity, etc., users were, to put it
mildly, annoyed. "You've broken thousands of models!"
Certainly, disseminators of resources can do a better or worse, more
professional or less, job in managing data resources, but they also must
make decisions, and in the case of covid19 data, I imagine with a grave
sense of urgency.
The New York Times took on the daunting project of collecting, verifying,
and (I didn't think they were savvy enough for it) making county level data
available on github. git pull origin master and I have the latest data, for
free, on my local instance. That is simply amazing to me.
The decision (conscious or not) to break the fips data model by manually
inserting 'bogus' counties with no fips code, to me, falls under the
category of petty annoyance. Annoying because it requires 8 lines of code
and a fips look up. Petty because I'm a professional data analyst and I
basically have to do similar things at least once a week with data created
by engineers that, although they too have a sense of urgency, have more
time to think through the ramifications of their small set of data
consumers.
New York City, across it's five counties, is under tremendous resource and
human strain and perhaps our heroes at the New York Times felt it was not
worth the use of any resource to disambiguate cases into the appropriate
fips-coded county, but simply to dump them all in the bucket of New York
City. Perhaps their data source cannot verify at the fips level and they
prefer to be accurate rather than precise. I don't care. I'm grateful for
the data.
For the most part, in my own analysis, to date at least, the fips level of
data can only be used for "vanity" metrics. I cannot infer much about the
future, or see much about the to-date efficacy of shelter-in-place orders
by shelter-tenure, for example, on a county level, so I'm aggregating
anyway. I do, however, now get to choose the way in which I aggregate,
urban vs. not, parts of states, metro areas, etc.
Why Rhode Island? Why Kansas City? For god's sake, I would love to know
what is special about Joplin County. They did, at least, annotate in REAMDE
the decisions they made. Hopefully someday we'll get to buy these data
savages a beverage, thank them in person, and hear the stories behind all
of these. For now, does anyone (not at the New York Times!) have the time,
availability, and inclination to create a county level (fips, FTW!)
shelter_effective_date .csv and make a pull request?
I hope this didn't come off as ranty, I'm truly grateful for all the great
work, and heartened by the number of people that have interest in and
ability with such data sets.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#105 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABDNYPLDH3J63XTPGJBRIZTRKUGK3ANCNFSM4LYK5DSQ>
.
|
@MichaelKelleher yea I empathize, and I wasn't really complaining. |
I didn't assume it was a complaint, and I admitted to "petty annoyance" myself! Really just wanted to express some of my thoughts about what I imagine is going on in the data trenches and how I'm reacting to it. I'm on my third pipeline, have written 2,000 lines of code, built 20 predictive models 20 times each, developed new metrics (my latest favorite is three_days_in_a_row_death_growth_rate_decreasing for US states), and first thing I do in the morning is pull al the repos, build the data, and check my visualizations. I know more about world geography (especially population density) than ever! (You folks know of any more appropriate place to have these conversations?) |
No description provided.
The text was updated successfully, but these errors were encountered: