(Somewhat) cleaned-up versions of the notebooks I used to research public comments for FCC Proceeding 17-108 (Net Neutrality Repeal). I'm posting the Exploratory Data Analysis notebook first and will add the others as they are cleaned up.
See the prerequisites section below.
Four more notebooks have been uploaded. Run them in numerical order to reconstruct the data processing pipeline; notebook 4 contains the charts. Data for the final couple of notebooks is being uploaded and will be linked here tomorrow morning.
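Running the notebooks in numerical order can be scripted. The sketch below assumes each notebook is an `.ipynb` file whose name begins with its number (e.g. `1_...ipynb`). The actual filenames aren't listed here, so that naming pattern and the `notebook_order`, `nbconvert_command`, and `run_pipeline` helpers are hypothetical. It shells out to `jupyter nbconvert --to notebook --execute --inplace`, which requires Jupyter (included with Anaconda).

```python
import re
import subprocess
from pathlib import Path

# Hypothetical naming convention: each notebook filename starts with its
# position in the pipeline, e.g. "1_....ipynb", "2_....ipynb".
LEADING_NUMBER = re.compile(r"^(\d+)")


def notebook_order(names):
    """Return the .ipynb names that start with a number, sorted numerically.

    Sorting on the integer value (not the string) keeps "10_..." after "9_...".
    """
    numbered = [n for n in names if n.endswith(".ipynb") and LEADING_NUMBER.match(n)]
    return sorted(numbered, key=lambda n: int(LEADING_NUMBER.match(n).group(1)))


def nbconvert_command(path):
    """Build the command that executes one notebook and saves outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(path)]


def run_pipeline(folder="."):
    """Execute every numbered notebook in `folder`, lowest number first."""
    folder = Path(folder)
    for name in notebook_order(p.name for p in folder.iterdir()):
        print("Running", name)
        # check=True stops the run if a notebook fails, since later
        # notebooks depend on the outputs of earlier ones.
        subprocess.run(nbconvert_command(folder / name), check=True)
```

Usage would be something like `run_pipeline("path/to/repo")`. If a long-running cell hits nbconvert's per-cell timeout, the command can be extended with `--ExecutePreprocessor.timeout=...`.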
I did this project as part of my coursework at Metis and was shocked to see my analysis blow up online. I'm humbled by the attention, but I'm sure experienced data scientists out there could glean even more insights from the work. Please share with the rest of us what else you find in the data! Tweet at me @jeffykao. :-)
This is just a rough sketch of the instructions to get the project up and running on your local machine. Once Anaconda is installed, the libraries should be easy to install and the notebooks fairly straightforward to run. Installation instructions for each library should be easy to find with a quick search.
The first set of data (text and duplicate counts only) has been posted on Kaggle. The README on Kaggle will contain links to other versions and subsets of the same dataset.
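The exact rules behind the duplicate counts aren't described here. As a minimal illustration only, assuming a "duplicate" means a comment whose text matches another after lowercasing and collapsing whitespace, counting them could look like this. The `normalize` and `duplicate_counts` helpers are hypothetical, and the comments are made-up toy strings, not rows from the dataset.

```python
import re
from collections import Counter


def normalize(text):
    """Collapse runs of whitespace and lowercase so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def duplicate_counts(comments):
    """Map each normalized comment text to the number of times it appears."""
    return Counter(normalize(c) for c in comments)


# Toy input for illustration only.
comments = [
    "I support net neutrality.",
    "I  support net neutrality. ",
    "Repeal Title II.",
]
counts = duplicate_counts(comments)
for text, n in counts.most_common():
    print(n, text)
```

Exact-match counting like this only catches verbatim (or near-verbatim) copies; templated comments with word substitutions would need a fuzzier approach.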
I'm working hard to get the non-text data up as well and will post progress updates on Twitter (@jeffykao).
- Python 3.6.1 (64-bit)
- conda 4.3.29
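Before running the notebooks, it may help to confirm that the interpreter roughly matches the prerequisites above. This sketch checks only for Python 3.6 or newer and a 64-bit build; accepting newer versions is an assumption, since only Python 3.6.1 is listed. The `check_interpreter` helper is hypothetical. The conda version can be checked separately with `conda --version`.

```python
import platform
import struct
import sys


def check_interpreter(min_version=(3, 6)):
    """Return a list of problems with the running interpreter (empty if none)."""
    problems = []
    if sys.version_info[:2] < min_version:
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ expected, "
            f"found {platform.python_version()}"
        )
    # A pointer ("P") is 8 bytes, i.e. 64 bits, on a 64-bit build.
    if struct.calcsize("P") * 8 != 64:
        problems.append("64-bit Python expected")
    return problems


for problem in check_interpreter():
    print("Warning:", problem)
```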
This project is licensed under the MIT License. See the LICENSE file for details.
- @drob for putting the blog post on blast and giving me some great advice in the aftermath
- @leland_mcinnes for authoring HDBSCAN
- @bekcunning for sending me the link that made me finally write that g***** blog post!
- @prb_data & Joe Eddy, my instructors at Metis
- @AndrewDBS who convinced me to get a twitter account
- My amazing & creative wife/editor who read through & greatly improved my drafts
- Sweat pants.