Scraping, parsing and re-publishing University of Missouri Police Department Incident Reports

gordonje/mupd_reports

mupd_reports

Scraping, parsing and re-publishing University of Missouri Police Department Incident Reports

Intro

The University of Missouri Police Department publishes data about its cases on its website. The department does a great job of keeping the data up to date, but there are a couple of problems:

  1. The incident page has filter options to find specific kinds of incidents within a date range and/or at a specific address, which is nice. But some of the less common charges, like making a terrorist threat, aren't categorized under an incident type. Furthermore, not all cases originate from an incident report, so you won't even find those cases on this list.

  2. The daily Clery reports include every case and more information about each one, including the exact charges and the current disposition of the case. But the daily reports are published as PDFs, which prevents any searching or analysis.

We can do better. Here's how:

  1. Download the daily Clery reports;
  2. Extract the text from the PDF pages;
  3. Parse that text into a database;
  4. Build a web app for users to interact with this improved data.
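
As a rough illustration of step 3, here is a minimal sketch that loads already-extracted report text into SQLite and then searches it by charge. The line layout (four fields separated by `|`), the sample rows, the `cases` table and the function names are all assumptions made for this example; the real Clery report text will look different and will need its own parsing rules.

```python
import sqlite3

# Hypothetical layout: one case per line, fields separated by "|".
# These rows are invented for illustration, not taken from real reports.
SAMPLE_TEXT = """\
2015-0001 | 2015-01-15 | Making a terrorist threat | Active
2015-0002 | 2015-01-16 | Stealing | Closed
"""


def parse_cases(text):
    """Yield (case_number, report_date, charge, disposition) tuples."""
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 4:  # skip blank or malformed lines
            yield tuple(fields)


def load_cases(conn, text):
    """Create the (assumed) cases table if needed and insert parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "case_number TEXT PRIMARY KEY, report_date TEXT, "
        "charge TEXT, disposition TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO cases VALUES (?, ?, ?, ?)", parse_cases(text)
    )
    conn.commit()


def find_by_charge(conn, term):
    """Return (case_number, disposition) for charges containing term."""
    cur = conn.execute(
        "SELECT case_number, disposition FROM cases "
        "WHERE charge LIKE ? ORDER BY case_number",
        ("%" + term + "%",),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_cases(conn, SAMPLE_TEXT)
print(find_by_charge(conn, "terrorist"))
```

Once the cases are in a table, searching for an uncommon charge becomes a single query, which is exactly what the PDF reports don't allow.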

Dependencies

  • Python 2.7+: An interpreted, object-oriented, high-level programming language;
  • requests: For handling HTTP requests;
  • html5lib: For parsing HTML the same way any major browser would;
  • beautifulsoup4: For conveniently manipulating the parsed HTML.
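
To show how these libraries might fit together for the download step, here is a minimal sketch that collects PDF links from a page. The function names, the sample HTML and the `example.org` URL are placeholders invented for this example, not the department's actual page, and the sketch assumes Python 3 (for `urllib.parse`).

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML text (not called below)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def find_pdf_links(html, base_url):
    """Return absolute URLs for every link in html that ends in .pdf."""
    # html5lib builds the tree the way a browser would, so sloppy
    # markup (such as unclosed <li> tags) is still parsed sensibly.
    soup = BeautifulSoup(html, "html5lib")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(".pdf"):
            links.append(urljoin(base_url, href))
    return links


# Invented markup standing in for a page that lists daily reports.
SAMPLE_HTML = (
    '<ul><li><a href="/reports/day1.pdf">Day 1</a>'
    '<li><a href="about.html">About</a></ul>'
)
print(find_pdf_links(SAMPLE_HTML, "https://example.org/police/"))
```

In real use, the output of `fetch_html` for the page that lists the daily reports would be passed to `find_pdf_links`, and each resulting URL downloaded in turn.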
