Wine Quality Predictor

Authors: Sid Ahuja, Zackarya Hamza, Alexander Dawson

Demo of a data analysis project for DSCI 310 (Reproducible & Trustworthy Workflows), a course in the Data Science faculty.

About

In this project, we build a prediction model using the k-nearest neighbours algorithm that attempts to categorize the quality of a wine based on its physicochemical properties. We classify wine quality into a binary category: good or bad. Our classifier performed moderately well on the test set, but further research is needed to improve the model before it is put into production.

The dataset used in this project describes white variants of the Portuguese "Vinho Verde" wine and was assembled by Paulo Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. It was sourced from the UCI Machine Learning Repository (Dua and Graff 2017), located here. Each row in this dataset is an observation of one white wine, recording its physicochemical and sensory attributes.

Dependencies

Docker is a container solution used to manage the software dependencies for this project. The Docker image used for this project is based on the quay.io/jupyter/r-notebook:2024-03-14 image. Additional dependencies are specified in the Dockerfile.
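To confirm which base image a local checkout builds on, the first FROM line of the Dockerfile can be read directly. The following is a minimal sketch, assuming the Dockerfile sits at the project root and its FROM line carries no extra flags such as --platform. base_image is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): print the image
# named on the first FROM line of the Dockerfile given as $1.
# Assumes the FROM line has the simple form "FROM image:tag".
base_image() {
    awk 'toupper($1) == "FROM" { print $2; exit }' "$1"
}
```

Running base_image Dockerfile from the project root should print the image name if the Dockerfile follows that simple form.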

Usage

Use the steps below to reproduce this analysis.

Setup

  1. Install and launch Docker on your computer.
  2. Clone this GitHub repository.
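Before moving on, it can help to verify that the tools these steps rely on are actually on your PATH. The sketch below is one way to do that. have_cmd and check_tools are hypothetical helpers, not part of this repository, and the list of tools to check (for example docker and git) is an assumption based on the steps above.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per tool; return non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if have_cmd "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools docker git reports each tool as found or missing and returns a non-zero status if either is absent.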

Running the analysis

Navigate to the root of this project on your computer using the command line and enter the following command to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis):

docker-compose run --rm analysis-env make clean

To run the analysis in its entirety, enter the following command in the terminal in the project root:

docker-compose run --rm project-image make all

Working with container JupyterLab, the terminal, or VSCode

To work with the project and container in JupyterLab, use the terminal to navigate to the root of this project and enter:

docker compose up

Look in the terminal for a URL that starts with http://127.0.0.1:8888/lab?token= and copy and paste it into a browser. The JupyterLab IDE will load. Do not close the terminal while JupyterLab is in use, otherwise you will lose your current session.
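If the URL is hard to spot among the startup messages, it can be filtered out of the log text. The sketch below assumes the URL appears in the logs in exactly the form shown above, with a token made up of letters and digits. extract_lab_url is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper: read log text on stdin and print the first
# JupyterLab URL of the form http://127.0.0.1:8888/lab?token=...
# In a basic regular expression the "?" is matched literally.
extract_lab_url() {
    grep -o 'http://127\.0\.0\.1:8888/lab?token=[0-9A-Za-z]*' | head -n 1
}
```

With the container running, entering docker compose logs | extract_lab_url in a second terminal at the project root should print the URL, provided the logs match the assumed format.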

To clean up, press Ctrl + C in the terminal, then enter docker compose rm to remove the container.

To work with the project using just the terminal, navigate to the root of this project and enter:

docker compose run --rm analysis-env bash

To exit the container and clean up, enter exit in the terminal.

To work in VSCode, open VSCode and launch the terminal from there. Navigate to the root of this project and enter:

docker compose run --rm analysis-env bash

To exit the container and clean up, enter exit in the terminal.

Report

The final report can be found here.

License

Our report is licensed under the MIT License. See LICENSE for additional information.

References

Cortez, Paulo, Cerdeira, A., Almeida, F., Matos, T., and Reis, J. (2009). Wine Quality. UCI Machine Learning Repository. https://doi.org/10.24432/C56S3T.

CVRVV. (2024). Vinho Verde. https://www.vinhoverde.pt/en/homepage

Timbers, Tiffany, and Campbell, Trevor. (2023, December 23). Data Science. datasciencebook.ca/.

Ismay, Chester, and Kim, Albert Y. (2024, February 13). Statistical Inference via Data Science. Foreword by Kelly S. McConville. moderndive.com/index.html.