Court Scraper for Pennsylvania
Ensure that Docker is installed on your system.
Docker ensures that the majority of the application operates inside a "container" that is isolated from the rest of the system.
Ensure that Python 3.13 is installed on your system.
The repository includes a requirements.txt file, but it is used by the Python Dockerfile, not by the user directly.
Instead, the user only needs to install the docker package:
pip install docker
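The prerequisites above can be checked before running any command. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes that Python 3.13 is a minimum version rather than an exact requirement.

```python
import importlib.util
import shutil
import sys

# Assumption: the README's "Python 3.13" is treated as a minimum version.
REQUIRED_PYTHON = (3, 13)


def check_prerequisites(python_version=None):
    """Return a list of problems found; an empty list means every check passed."""
    version = tuple(python_version or sys.version_info[:2])
    problems = []

    # Compare (major, minor) tuples, e.g. (3, 12) < (3, 13).
    if version < REQUIRED_PYTHON:
        problems.append(
            f"Python 3.13 or newer is expected, found {version[0]}.{version[1]}"
        )

    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("docker") is None:
        problems.append("The docker executable was not found on PATH")

    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("docker") is None:
        problems.append("The docker package is missing (pip install docker)")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

This only confirms that the Docker executable is on the PATH; it does not verify that the Docker daemon is running.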
Get Docket Numbers
- This script parses the webpage and writes the docket numbers to a text file.
- The text file is located at data/docket_numbers_from_yesterday.txt.
python main.py get-docket-numbers
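To confirm what the command wrote, the text file can be read back. This is a minimal sketch that assumes the file holds one docket number per line, which the README does not state; `read_docket_numbers` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

# Path taken from the README; relative to the repository root.
DOCKET_FILE = Path("data/docket_numbers_from_yesterday.txt")


def read_docket_numbers(path=DOCKET_FILE):
    """Return the stripped, non-empty lines of the file.

    Assumption: one docket number per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    numbers = read_docket_numbers()
    print(f"{len(numbers)} docket numbers found")
```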
Get Docket Information
- This script retrieves all docket information from the above text file and stores it in a MongoDB database.
- The MongoDB database is located at mongodb://mongo:27017/.
- Currently, the database is "mydatabase", and the collection is "mycollection".
- Note that this script takes an extended period of time to run, as requests are staggered to avoid rate-limiting.
- To reduce the number of docket numbers retrieved, delete docket numbers from the text file before running the command.
python main.py get-docket-info
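Rather than deleting entries by hand, the text file can be trimmed with a short script before running the command. This is a sketch under the assumption that the file holds one docket number per line; `keep_first_dockets` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path


def keep_first_dockets(path, limit):
    """Rewrite the file so it holds at most `limit` entries; return the number kept.

    Assumption: one docket number per line. The file is overwritten in place.
    """
    if limit < 0:
        raise ValueError("limit must be zero or greater")

    path = Path(path)
    lines = [
        line.strip()
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    kept = lines[:limit]

    # Write one entry per line, with a trailing newline when anything remains.
    path.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return len(kept)


if __name__ == "__main__":
    # Example: keep only the first 10 entries for a quicker test run.
    keep_first_dockets("data/docket_numbers_from_yesterday.txt", 10)
```

Because the file is overwritten, copy it first if the full list is still needed.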
Stop MongoDB instance
- The MongoDB instance does not stop automatically after the above two commands are run.
- The command below stops and removes the MongoDB container, deleting all data within it.
python main.py stop-mongodb
Results can be reviewed using MongoDB Compass. Note that the MongoDB container must be up and running to review results.