Police-Data-Accessibility-Project/pa_court_scraper

pa_court_scraper

Court Scraper for Pennsylvania

Installation

Set Up Docker

Ensure that Docker is installed on your system.

Most of the application runs inside a Docker "container" that is isolated from the rest of the system.

Set Up Python

Ensure that Python 3.13 is installed on your system.

The requirements.txt file is used by the Python Dockerfile, not by the user directly. The only package you need to install yourself is docker:

pip install docker
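The prerequisites above can be checked before running any command. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` is hypothetical and not part of this repository, and treating 3.13 as a minimum version (rather than an exact requirement) is an assumption.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 13)):
    """Return a dict of booleans describing whether each prerequisite looks satisfied.

    Hypothetical helper, not part of pa_court_scraper.
    """
    return {
        # Assumption: Python 3.13 is a minimum, not an exact, requirement.
        "python_ok": sys.version_info[:2] >= min_python,
        # True if a `docker` executable is on PATH (does not prove the daemon is running).
        "docker_cli_found": shutil.which("docker") is not None,
        # True if the `docker` Python package (pip install docker) can be imported.
        "docker_package_installed": importlib.util.find_spec("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

This only checks that the tools are present. It does not confirm that the Docker daemon is running.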

Commands

Get Docket Numbers

  • This script parses the webpage and writes the docket numbers to a text file.
  • The text file is located at data/docket_numbers_from_yesterday.txt.
python main.py get-docket-numbers

Get Docket Information

  • This script retrieves all docket information from the above text file and stores it in a MongoDB database.
  • The MongoDB database is located at mongodb://mongo:27017/.
    • Currently, the database is "mydatabase", and the collection is "mycollection".
  • This script takes a long time to run because requests are staggered to avoid rate-limiting.
    • To retrieve fewer dockets, delete docket numbers from the text file before running the script.
python main.py get-docket-info
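Trimming the text file by hand can be tedious for a long list. The sketch below keeps only the first few docket numbers. It assumes the file holds one docket number per line, which this README does not state, and the helper name `keep_first_dockets` is hypothetical.

```python
from pathlib import Path

# Path taken from the "Get Docket Numbers" section above.
DOCKET_FILE = Path("data/docket_numbers_from_yesterday.txt")


def keep_first_dockets(path, limit):
    """Rewrite the file so it keeps only the first `limit` non-empty lines.

    Assumes one docket number per line. Returns the number of lines kept.
    Hypothetical helper, not part of pa_court_scraper.
    """
    path = Path(path)
    lines = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    kept = lines[:limit]
    # Write the shortened list back, ending with a newline when non-empty.
    path.write_text("\n".join(kept) + ("\n" if kept else ""))
    return len(kept)

# Example (overwrites the file in place, so back it up first if needed):
# keep_first_dockets(DOCKET_FILE, 25)
```

Run it after get-docket-numbers and before get-docket-info so that the shorter list is the one that gets retrieved.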

Stop MongoDB Instance

  • The MongoDB instance does not stop automatically after the above two commands are run.
  • The command below stops and removes the MongoDB container, deleting all data within it.
python main.py stop-mongodb

Review Results

Results can be reviewed using MongoDB Compass. Note that the MongoDB container must be up and running to review results.
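As an alternative to Compass, stored documents could also be inspected from Python. This is a rough sketch rather than a feature of this repository: it assumes the third-party pymongo package is installed (pip install pymongo) and that the container's port 27017 is published to the host as localhost, neither of which this README confirms. The database and collection names are the ones listed above, and the function name `fetch_sample_dockets` is hypothetical.

```python
def fetch_sample_dockets(uri="mongodb://localhost:27017/", limit=5):
    """Return up to `limit` documents from mydatabase.mycollection.

    Hypothetical helper. Assumes MongoDB is reachable at `uri` from the host;
    inside the Docker network the address would be mongodb://mongo:27017/.
    """
    # Imported here so the module loads even when pymongo is not installed.
    from pymongo import MongoClient

    # Fail after 5 seconds instead of waiting on an unreachable server.
    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        collection = client["mydatabase"]["mycollection"]
        return list(collection.find().limit(limit))
    finally:
        client.close()

# Example (requires the MongoDB container to be running):
# for doc in fetch_sample_dockets():
#     print(doc)
```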
