Train/test data includes schools, hospitals, and other facility types #106
Comments
I get an error when I launch the "binder" link above.
We join information about the business license at the time of inspection to the record of the inspection. We then filter the records to retain only "Retail Food Establishment" records.

As you noticed, a lot of business types (like schools and hospitals) are subject to food inspections. It's important to note that businesses can hold many license types. For example, some have liquor licenses alongside their retail food license, and others do not. We only use inspections that have an associated business license description of "Retail Food Establishment".

As for the "other" license types you're noticing, perhaps you're not looking at the licenses at the time of inspection? It could be that they dropped their food-related license(s). For example, maybe it's a book shop that once also served or sold food, but now just sells books.

It's quite possible that you've found something, and I'll take a deeper look when we refactor the code, which should happen in the next few months. The filtering is a little messy, and I think this will be fixed in the upcoming edits.
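To make the "license at the time of inspection" point concrete, here is a minimal Python sketch of the join-then-filter idea. It is not the project's R code: the record layout, field names, and sample rows are all hypothetical, and it assumes each license record carries a start and end date.

```python
from datetime import date

# Hypothetical sample data; field names are illustrative, not the project's schema.
licenses = [
    {"license_id": 1, "description": "Retail Food Establishment",
     "start": date(2012, 1, 1), "end": date(2013, 12, 31)},
    {"license_id": 1, "description": "Limited Business License",
     "start": date(2014, 1, 1), "end": date(2015, 12, 31)},
    {"license_id": 2, "description": "Tavern",
     "start": date(2012, 1, 1), "end": date(2015, 12, 31)},
]
inspections = [
    {"inspection_id": 100, "license_id": 1, "date": date(2013, 6, 1)},
    {"inspection_id": 101, "license_id": 1, "date": date(2014, 6, 1)},
    {"inspection_id": 102, "license_id": 2, "date": date(2013, 3, 1)},
]


def license_at_inspection(inspection, licenses):
    """Return the license record active on the inspection date, or None."""
    for lic in licenses:
        same_business = lic["license_id"] == inspection["license_id"]
        active = lic["start"] <= inspection["date"] <= lic["end"]
        if same_business and active:
            return lic
    return None


def retail_food_inspections(inspections, licenses):
    """Keep inspections whose license at the time was 'Retail Food Establishment'."""
    kept = []
    for insp in inspections:
        lic = license_at_inspection(insp, licenses)
        if lic is not None and lic["description"] == "Retail Food Establishment":
            kept.append(insp["inspection_id"])
    return kept


kept_ids = retail_food_inspections(inspections, licenses)
```

In this toy data, inspection 101 is dropped because the business no longer held a retail food license on that date, which mirrors the book shop scenario. Looking up the same business's current license instead would give a different answer than the license at inspection time.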
Hi @geneorama, I have updated the Binder link above. Here it is again: It may take a while to load. In case there are still issues, here is a copy of the notebook.
Sorry, I had a hard time following the Python and wasn't working on this project. Now that I'm back in it, I think I see what's going on.

We filter the business licenses, and my understanding is that the ones we keep are places that serve prepared food. However, we do a lot of inspections in other places that sell packaged food or have kitchens. I think that some of these retail food places are selling prepared foods in places like grocery stores. We do model the inspection of that prepared food, but we do not model the inspection of the packaged food, which is a separate license.

As I'm working on 2.0 I want to dig into this and be sure of the assumptions, so I'm glad you asked. The first time we did this I relied very heavily on prior art, but this time I want to understand it a bit more.

Before my talk at UseR! 2016, I performed some analysis to see what kinds of places are being inspected, to get a list of all licenses that are inspected. As I recall, it wasn't as simple as I had hoped, and I couldn't find a clear-cut rule for "this is a place that would get inspected". The best regex I found was searching for these terms in the license description: "Retail Food|Consumption|Caterer|Food|Child". Then I grouped them together. My final count looked like this:

This is old data, and I'm not sure how it would hold up with new license designations. Digging into that now.
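For readers who want to try the regex approach mentioned in this comment, here is a rough Python sketch. Only the pattern string comes from the comment. The license descriptions are made-up sample values, and the grouping is a plain count per description, which may not match how the original analysis grouped them.

```python
import re
from collections import Counter

# Pattern quoted in the comment above.
LICENSE_PATTERN = re.compile(r"Retail Food|Consumption|Caterer|Food|Child")

# Hypothetical license descriptions, for illustration only.
descriptions = [
    "Retail Food Establishment",
    "Retail Food Establishment",
    "Consumption on Premises - Incidental Activity",
    "Caterer's Liquor License",
    "Children's Services Facility License",
    "Limited Business License",
    "Wholesale Food Establishment",
]


def count_matching(descriptions):
    """Count each license description that matches the pattern anywhere in the text."""
    matched = [d for d in descriptions if LICENSE_PATTERN.search(d)]
    return Counter(matched)


counts = count_matching(descriptions)
for description, n in counts.most_common():
    print(f"{n:>4}  {description}")
```

Note that any description matching "Retail Food" also matches the later "Food" alternative, so the pattern is broad: it would pick up wholesale food licenses too, which may or may not be intended.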
According to the paper, inspections of hospitals and schools should not be included in the model train/test data. However, cross-referencing the model data with food inspection records from the Chicago data portal suggests that the model train/test data includes many different facility types, including hospitals and schools.
You can reproduce my Jupyter notebook that checks the facility types by launching it in Binder:
@tomschenkjr pointed out that there are at least two locations in the code that should filter out other types of facilities:
CODE/23_generate_model_dat.R, line 51
CODE/30_glmnet_model.R, line 23

This still leaves 1003 inspections with facility type listed as "Other". After cross-referencing with the data portal, 994 inspections appear to be facilities other than restaurants or grocery stores.

There also appear to be 11 inspections in the model train/test data that did not have a facility type in their record from the data portal query. Here is an excerpt from my query showing how I filtered the data portal records (SoQL):
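As a rough illustration of the cross-referencing described in this issue (this is neither the SoQL excerpt nor the notebook code), the facility-type check could be sketched in Python as below. The inspection IDs, the `facility_type` field name, and the sample records are hypothetical stand-ins for the model train/test data and the data portal results.

```python
from collections import Counter

# Hypothetical inspection IDs taken from the model train/test data.
model_inspection_ids = [100, 101, 102, 103, 104]

# Hypothetical data portal records keyed by inspection ID.
portal_records = {
    100: {"facility_type": "Restaurant"},
    101: {"facility_type": "School"},
    102: {"facility_type": "Hospital"},
    103: {"facility_type": None},
    # 104 is absent from the portal query results.
}

MISSING = "(missing)"


def tally_facility_types(inspection_ids, portal):
    """Count facility types for the given inspections; absent or empty values count as missing."""
    tally = Counter()
    for inspection_id in inspection_ids:
        record = portal.get(inspection_id)
        facility = record.get("facility_type") if record else None
        tally[facility or MISSING] += 1
    return tally


def count_unexpected(tally, expected=("Restaurant", "Grocery Store")):
    """Count inspections whose known facility type is not in the expected set."""
    return sum(n for facility, n in tally.items()
               if facility not in expected and facility != MISSING)


tally = tally_facility_types(model_inspection_ids, portal_records)
unexpected = count_unexpected(tally)
```

Keeping missing facility types in their own bucket, separate from the unexpected ones, matches how the counts are reported above: one number for non-restaurant, non-grocery facilities and another for records with no facility type.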