Project Title: Predict the Probability of a Delivery getting a wicket in Cricket
Contributors:
- Alex Lin
- Jackson Siemens
- Shruti Vijaykumar Seetharam
- Hanlin Zhao
Our project works on predicting the probability of getting a wicket on a delivery in Cricket based off of game plays. Being avid sports fans, and wanting to explore a new sport, we looked towards Cricket and specifically T20 International matches to do our analysis. Our analysis uses ball-by-ball data from Cricsheet, downloaded in json
format and then converted to .parquet
format.
This project will be managed using the virtual containerization tool Docker. The container image used for the project is based on the quay.io/jupyter/scipy-notebook:2024-02-24
image. Other dependencies are specified in the Dockerfile
.
- Install and setup Docker on your computer. Keep Docker Desktop open when running the analysis.
- Clone this GitHub repository as specified below:
git clone https://github.com/DSCI-310-2024/DSCI-310-Group-11.git
Note: Our data is fairly large, and cloning it may result in an issue with having the .parquet
data on your local. The scripts will run and still produce the correct data files needed.
- Clear any previous analysis done by entering the following statement on the command line.
docker-compose run --rm analysis-env make clean
- Run the analysis by entering this:
docker-compose run --rm analysis-env make all
The final report can be found as main_report.html
in the reports
folder, the images in the images
folder and the data in the data
folder.
Note: Since the data was downloaded off of the website in json
format, we do not have a script that reads the data off the web. The data was downloaded as the website structure does not allow for reading it off the web.
If you are working with the project on JupyterLab, follow the steps below.
- After doing the steps in Setup section, navigate to the root of this project on your computer as follows:
cd DSCI-310-Group-11-Cricket-Analysis
- Enter the following command to setup the virtual container:
docker-compose up
- Once the command has run, you will receive options to launch JupyterLab through different URLs. Copy and paste the one starting with
http://127.0.0.1:8888/lab?token
. You will now be able to work in the JupyterLab IDE with all the files used in the analysis visible in the file browser pane. - Shut down the container after using it by typing
Ctrl + C
in the terminal. Clean the container up by typingdocker-compose rm
in the terminal.
To add any other dependencies to the analysis:
- Create a new branch and add the dependency to the Dockerfile file.
- Re-build the new image locally to make sure everything runs properly. We recommend using the tag
dsci310_group11:latest
when building. - Push the changes to GitHub. Since we have built our GitHub actions workflow, the new image will be pushed to DockerHub automatically.
- Update the image tag on the
docker-compose.yml
file so that it uses the new container image. - Open a pull request and merge the new branch with the main branch.
The License was derived from the MIT License. The License and code for building the GitHub Actions Workflow for containerization was derived from the Dockerfile Practice Repository
The data was all sourced from Cricsheet.