This project was created as part of the 2022 cohort of the ML Zoomcamp course.
The problem being solved is an image classification problem: predicting a cat's breed from its photo.
This can be useful for cat owners who want to know their cat's breed, and for cat shelters that need to identify a breed in order to find a cat an owner.
In this project, I used data from the Cat Breeds Dataset on Kaggle. It contains 126,607 images of cats across 67 breeds. The most frequent breeds in the dataset are:
- Domestic Short Hair: 53027
- Domestic Medium Hair: 5482
- American Shorthair: 5295
- Domestic Long Hair: 4499
- Persian: 4018
- Tortoiseshell: 3963
- Calico: 3468
- Torbie: 3396
- Dilute Calico: 3230
- Tuxedo: 3181
- Dilute Tortoiseshell: 3152
- Tabby: 3012
- Siamese: 2888
- Ragdoll: 2669
- Bengal: 2477
- Tiger: 2256
I also decided to add a "No cat" category, so the model can handle images with no cat in them.
For this category I used photos from the House Rooms Image Dataset, which contains 5,250 photos of rooms.
There are far too many photos of some cat breeds. To reduce the imbalance toward domestic cats and to speed up training, I limited the number of photos per breed to 1,000.
The script for combining the datasets and capping the number of photos per breed is presented here.
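As a rough sketch, the combination step amounts to copying at most 1,000 images per breed and adding the room photos as the extra class. The folder names and layout below are assumptions for illustration, not the exact ones used by the script:

```python
# Sketch of the dataset preparation step; folder names and layout are assumptions.
import random
import shutil
from pathlib import Path

MAX_PER_BREED = 1000            # cap to reduce the imbalance toward domestic breeds
CATS_DIR = Path("raw/cats")     # hypothetical: one subfolder per breed
ROOMS_DIR = Path("raw/rooms")   # hypothetical: House Rooms images for the "No cat" class
OUT_DIR = Path("data")

# Copy up to MAX_PER_BREED randomly chosen images for each breed
for breed_dir in CATS_DIR.iterdir():
    images = list(breed_dir.glob("*.jpg"))
    random.shuffle(images)
    target = OUT_DIR / breed_dir.name
    target.mkdir(parents=True, exist_ok=True)
    for img in images[:MAX_PER_BREED]:
        shutil.copy(img, target / img.name)

# Add the "No cat" class from the rooms dataset
no_cat_dir = OUT_DIR / "No cat"
no_cat_dir.mkdir(parents=True, exist_ok=True)
for img in ROOMS_DIR.glob("*.jpg"):
    shutil.copy(img, no_cat_dir / img.name)
```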
The resulting dataset is published on Google Drive.
To run this project, you need to download and unzip this dataset into the `data` folder.
Let's look at some photos from the dataset.
There is a notebook with an overview of the dataset here.
I decided to use transfer learning, using the Keras framework with a TensorFlow backend.
There are 68 classes in the dataset (67 breeds plus "No cat"), so I decided to use the top-5 accuracy metric, which is better suited to multiclass classification with this many classes.
In this part of the project, Saturn Cloud helped me a lot.
With their JupyterLab server, I ran this notebook to try different models.
I tried several model configurations:
- Xception + hidden(256) + dropout(0.25)
- EfficientNetB4 + hidden(256) + dropout(0.25)
- EfficientNetB4 + hidden(256)
- EfficientNetB4 + hidden(100) + dropout(0.25)
- EfficientNetB4 + hidden(100)
The best result was achieved with EfficientNetB4 + hidden(256).
I also tried different optimizers: Adam, SGD, and RMSprop. The best result was achieved with Adam.
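Below is a minimal sketch of that winning configuration (EfficientNetB4 base with a 256-unit hidden layer, Adam, top-5 accuracy). The input size, learning rate, and loss setup are assumptions for illustration; the actual values are in the training script.

```python
# Sketch of the best configuration: EfficientNetB4 + hidden(256), Adam, top-5 accuracy.
# Input size, learning rate, and loss are assumptions; see scripts/train_model.py for the real setup.
from tensorflow import keras

NUM_CLASSES = 68  # 67 breeds + "No cat"

base = keras.applications.EfficientNetB4(
    weights="imagenet", include_top=False, input_shape=(380, 380, 3)
)
base.trainable = False  # transfer learning: keep the pretrained weights frozen

inputs = keras.Input(shape=(380, 380, 3))
x = base(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dense(256, activation="relu")(x)  # the hidden(256) layer
outputs = keras.layers.Dense(NUM_CLASSES)(x)       # logits over the 68 classes
model = keras.Model(inputs, outputs)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.TopKCategoricalAccuracy(k=5, name="top_5_accuracy")],
)
```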
The resulting model is published on Google Drive.
This model achieves a top-5 accuracy of 0.70.
The script for training the model is here.
If you want to train the model, download and unzip the dataset into the `data` folder, and then run the script:
pipenv run python scripts/train_model.py
I used TensorFlow's SavedModel format for deployment.
To convert the model to the SavedModel format I used convert_to_saved_model.py.
You need to train the model first, or download it from Google Drive, and then run:
pipenv run python scripts/convert_to_saved_model.py
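The conversion itself essentially loads the trained Keras model and re-saves it in the SavedModel layout. A minimal sketch (the file and directory names here are assumptions):

```python
# Sketch of the conversion; file and directory names are assumptions,
# see scripts/convert_to_saved_model.py for the actual code.
import tensorflow as tf

model = tf.keras.models.load_model("breed_model.h5")  # trained Keras model
tf.saved_model.save(model, "breed-model")             # SavedModel directory for TF Serving
```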
There is a notebook with an example of using the deployed model in a Docker container.
After that, I converted it into the gateway.py script.
Then I created a Flask app for the API.
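A minimal sketch of what the gateway does is shown below: it downloads the image by URL, preprocesses it, and forwards it to the model server. The TF Serving URL, model name, input size, and response format are assumptions; see gateway.py for the actual implementation.

```python
# Sketch of the Flask gateway; the TF Serving URL, model name, input size, and
# preprocessing are assumptions -- see gateway.py for the actual implementation.
from io import BytesIO

import numpy as np
import requests
from flask import Flask, jsonify, request
from PIL import Image

TF_SERVING_URL = "http://localhost:8501/v1/models/breed-model:predict"  # assumed address

app = Flask("gateway")

@app.route("/predict", methods=["POST"])
def predict():
    url = request.get_json()["url"]
    img = Image.open(BytesIO(requests.get(url).content)).convert("RGB")
    img = img.resize((380, 380))                                    # assumed input size
    batch = np.expand_dims(np.array(img, dtype="float32"), axis=0)
    resp = requests.post(TF_SERVING_URL, json={"instances": batch.tolist()})
    preds = resp.json()["predictions"][0]
    return jsonify({str(i): p for i, p in enumerate(preds)})        # class index -> score

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9696)
```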
I used Postman for testing. To use the service, send a POST request to the endpoint http://localhost:9696/predict with a JSON body:
{
"url":"https://github.com/rzabolotin/ml_zoomcamp_2022_project_2/blob/main/static/burmila.jpg?raw=true"
}
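The same request can also be sent from Python, for example:

```python
# Example request to the gateway (same JSON body as above)
import requests

url = "http://localhost:9696/predict"
body = {"url": "https://github.com/rzabolotin/ml_zoomcamp_2022_project_2/blob/main/static/burmila.jpg?raw=true"}
print(requests.post(url, json=body).json())
```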
I used Docker and docker-compose for local deployment:
- image-model for building the Docker image for model serving.
- image-gateway for building the Docker image for the Flask gateway.
- docker-compose.yml for running the Docker containers together.
To run the project, you need to run docker-compose. It will build the Docker images and start the containers.
docker-compose up
I used kind for local Kubernetes deployment.
To run the project you need to run the following commands:
# create kubernetes cluster
kind create cluster
# apply all kubernetes configs
kubectl apply -f kube-config/model-deployment.yaml
kubectl apply -f kube-config/model-service.yaml
kubectl apply -f kube-config/gateway-deployment.yaml
kubectl apply -f kube-config/gateway-service.yaml
# make port forwarding to gateway service
kubectl port-forward service/gateway-service 80:9696
After that you can send the same POST request to http://localhost:9696/predict, and the service will reply with a JSON answer.
I used eksctl to create the EKS cluster.
eksctl create cluster -f kube-config/eks-config.yaml
Then you need to create an ECR repository for the Docker images and push them there.
aws ecr create-repository --repository-name ml-zoomcamp
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag breed_model:v3-001 123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-zoomcamp:breed_model-v3-001
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-zoomcamp:breed_model-v3-001
docker tag breed_gateway 123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-zoomcamp:breed-gateway
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-zoomcamp:breed-gateway
Then apply the Kubernetes configs:
kubectl apply -f kube-config/model-deployment.yaml
kubectl apply -f kube-config/model-service.yaml
kubectl apply -f kube-config/gateway-deployment.yaml
kubectl apply -f kube-config/gateway-service.yaml
After that you can send the same POST request, but to the EKS public endpoint.
(for me it was http://a1554c88daf744e1a85752b08be1e24c-1291281226.us-east-1.elb.amazonaws.com/predict, but I deleted the cluster, so it is not available now)
- Python
- TensorFlow
- Saturn Cloud (https://www.saturncloud.io/)
- Docker
- Postman
- kind
- AWS EKS