Repository containing the ML training pipeline
This project was run with Python 3.9.16. If you don't have this version installed, consider using pyenv.
Dependencies can be installed from requirements.txt:
pip install -r requirements.txt
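As a quick sanity check before installing, the short script below compares the running interpreter against the version stated above. It is only a sketch: the names EXPECTED_VERSION and is_compatible are hypothetical and not part of this repository, and comparing only the major and minor version is an assumption about what counts as compatible.

```python
import sys

# Version stated in this README. Matching only on major.minor is an assumption.
EXPECTED_VERSION = (3, 9, 16)


def is_compatible(actual, expected=EXPECTED_VERSION):
    """Return True when the major and minor versions match the expected ones."""
    return tuple(actual[:2]) == tuple(expected[:2])


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    if is_compatible(current):
        print("Python %d.%d.%d looks compatible." % current)
    else:
        # Report both versions so the mismatch is easy to spot.
        print(
            "Warning: running Python %d.%d.%d, but the project was run with %d.%d.%d."
            % (current + EXPECTED_VERSION)
        )
```

If the warning appears, a tool such as pyenv can be used to install and select the matching interpreter.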
DVC (Data Version Control) is a version control system for data and ML models. It helps you manage your ML experiments and models efficiently. This project uses DVC to manage the ML training pipeline.
The DVC pipeline is set up in two stages: the preprocessing_dataset stage and the train_model stage.
To fetch the current version of the pipeline, run dvc pull.
To run the pipeline, use dvc repro. To force a rerun, use dvc repro -f.
Once the pipeline has run, you can view the metrics with dvc metrics show.
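The DVC commands described above can also be chained from a small Python helper, which may be convenient in CI or a notebook. This is a minimal sketch, not something shipped with this repository: build_dvc_commands and run_pipeline are hypothetical names, and the code assumes the dvc executable is on the PATH and that it is called from the repository root.

```python
import subprocess


def build_dvc_commands(force=False):
    """Return the DVC commands from this README, in the order they are described."""
    repro = ["dvc", "repro"]
    if force:
        repro.append("-f")  # force a rerun, as with dvc repro -f
    return [["dvc", "pull"], repro, ["dvc", "metrics", "show"]]


def run_pipeline(force=False):
    """Run each command in turn; check=True raises on the first failing command."""
    for command in build_dvc_commands(force=force):
        subprocess.run(command, check=True)
```

Calling run_pipeline() would pull, reproduce, and show metrics in sequence, and run_pipeline(force=True) would do the same with a forced rerun.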
If you prefer not to use DVC, you can run the pipeline directly:
python -m src.app
This project uses pytest for testing. To run the tests, use the following command:
pytest
This project is licensed under the MIT License.