student: Carlos Báez
Project report for the Deep Learning postgraduate course at UPC tech talent (Barcelona). This report explains the work done, the results and the conclusions drawn.
The main idea of the project was to implement an end-to-end person recognition system. To do this, I split the project into two parts:
- Detection. A study of different existing algorithms and different datasets to choose the best option for this project.
- Face Recognition. Four different solutions based on a Siamese architecture were implemented and modified.
At the beginning, my main motivation was to implement a complete pipeline for people recognition, in which I analysed the different parts: detection and recognition. When I started working on recognition, I liked Siamese networks[1] and how they improve performance, so I decided to study them.
After this, I became interested in how a retrieval system works and how it can scale by applying cosine functions[2]. With this code, I could see how feature extraction plays a powerful role in this type of solution.
pipeline -> Main source folder
├── detections -> Detection pipeline
│ ├── db -> Datasets classes to configure dataset
│ │ ├── -> Necessary constants for dataset classes
│ │ ├── -> Parser from ellipse to rectangle for FDDB dataset
│ │ ├── -> FDDB dataset class which loads in memory all the dataset information
│ │ ├── -> Example to display FDDB dataset
│ │ └── -> Wider dataset class example
│ ├── Tiny_Faces_in_Tensorflow -> Folder for Tiny Faces model
│ │ ├── -> Entrypoint for tiny_faces model (Tensorflow)
│ └── yolo -> Folder for YOLO model
│ └── yolo -> Folder for YOLO package
│ ├── -> YOLO model
│ └── -> YOLO class, functions to call inference and high level functions for detection
├── -> Main
├── recognition -> Recognition pipeline
│ ├── -> CFP Dataset class
│ ├── -> Class to calculate threshold and accuracy
│ ├── -> Class to implement ranking
│ ├── -> Class with different models
│ ├── -> Builder params pattern to customize different tests
│ ├── -> Script to fix dataset paths
│ ├── -> Class to execute different tests
│ ├── -> Main class with the training loop
│ ├── -> Data augmentation classes
│ └── -> Functions for different use cases
└── scripts -> General scripts
├── -> Script to execute and evaluate tiny faces
├── -> Script to execute and evaluate yolo
├── -> Download YOLO weights
├── -> README with dependencies
└── -> Other functions
scripts -> General scripts
├── graphs_model.ipynb -> Draw seaborn bubble graph
├── -> Script to upload from local to one server
├── print_graphs.ipynb -> Draw matplotlib graphs
├── -> Script to download from remote to local
├── test_practica_carlos.ipynb -> DEMO. First version of the final demo.
├── train.ipynb -> Colab notebook to set up environment and ssh connection
└── value_models.csv -> Information to print in graphs
For the detection module, two neural networks and two datasets were studied and analysed. The networks were:
- Tiny Faces
- YOLO v3, trained for faces
Two different datasets were used:
- FDDB Dataset
- Wider Dataset
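FDDB annotates each face as an ellipse (`major_axis_radius minor_axis_radius angle center_x center_y 1`, with the angle in radians), while detectors usually output rectangles, which is why the repository includes a parser from ellipse to rectangle. The following is only a minimal sketch of that conversion, not the code from the repository: the function names are hypothetical and the sample line in the usage note is made up.

```python
import math


def ellipse_to_rectangle(major_radius, minor_radius, angle, center_x, center_y):
    """Return (left, top, right, bottom) of the axis-aligned box
    that tightly encloses a rotated ellipse."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Half extents of a rotated ellipse along the x and y axes.
    half_width = math.hypot(major_radius * cos_a, minor_radius * sin_a)
    half_height = math.hypot(major_radius * sin_a, minor_radius * cos_a)
    return (center_x - half_width, center_y - half_height,
            center_x + half_width, center_y + half_height)


def parse_fddb_line(line):
    """Parse one FDDB ellipse annotation line (hypothetical helper).
    The trailing '1' flag of the annotation is ignored."""
    values = [float(v) for v in line.split()[:5]]
    return ellipse_to_rectangle(*values)
```

For example, `parse_fddb_line("2 1 0 10 10 1")` describes an unrotated ellipse centred at (10, 10) and yields the box from (8, 9) to (12, 11).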
The bubble graph gives a brief overview of the differences between the two networks (accuracy, time and number of parameters for each):
A Siamese network with VGG features was implemented. I took a VGG pretrained on Imagenet[3] and fine-tuned it for faces.
In general, I implemented different networks with different loss techniques:
- Two Siamese neural networks that take features from a VGG convolutional network and apply a cosine similarity[5]
- Two Siamese networks with a concatenation that joins the features and produces a classification with a cross entropy loss[4]
- One Siamese network with a triplet loss function
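To make the cosine option more concrete, here is a small sketch of a cosine similarity between two feature vectors and a pair loss built on top of it. It follows the usual formulation of a cosine embedding loss (the same one that PyTorch documents for `CosineEmbeddingLoss`); the report does not state the exact loss configuration used in the project, so the `margin` value and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_pair_loss(a, b, same, margin=0.0):
    """Loss for one pair of feature vectors.
    Same identity: pull the similarity towards 1.
    Different identity: penalize similarity above the margin (assumed 0.0)."""
    similarity = cosine_similarity(a, b)
    if same:
        return 1.0 - similarity
    return max(0.0, similarity - margin)
```

With this formulation, two identical feature vectors give a loss of 0 when they belong to the same person and a loss of 1 when they are labelled as different people.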
The experiments are classified as follows:
- Changing the optimizer between SGD and Adam, with different learning rates and weight decay (1e-3, 5e-4, 1e-4)
- Other parameters were tuned as well (weight decay, betas, momentum, etc.) in order to find the best configuration, which I added to the result table
- With and without data augmentation. The data augmentation process uses rotations, flips and jitter modifications.
- The idea is to check whether these changes bring improvements and, if so, to add more modifications to improve the percentage.
- Changing the loss function, which means changing the type of neural network
The backend architecture is a VGG16-bn (batch normalized), of which the convolutional layers are used. They work as a Siamese network: they are applied to two images to obtain their features. This project uses networks pretrained on Imagenet, which speeds up the training process.
After this point, different techniques are applied to check the performance and compare results:
- The first one applies a cosine similarity loss function to look for better results with the convolutional layers
  - v1 is the simplest version: it only takes the VGG features and applies the cosine loss function.
  - v2 adds a trained linear layer that flattens the features. It also uses the cosine loss function.
- The second one joins the two branches to get a classification. Further improvements are added in order to achieve a better solution.
  - The neural network named decision includes a minimal decision network with a few linear layers. It is applied after the concatenation of the features (from the two branches)
  - The decision network linear adds a linear layer before this concatenation to improve the training and the performance. It tries to obtain better features for this use case.
This image shows the VGG architecture and its convolutional module. It gives an idea of where the features for the neural networks are extracted.
The previous architectures are depicted in the following schematics.
The two Siamese cosine networks are very similar, but the second one doesn't reuse the VGG weights, which makes the performance worse.
The second type of architecture includes the concatenation and the decision network to classify. The second variant adds an extra linear layer to train.
In order to evaluate which algorithm fits better, I ran different tests:
- The chosen dataset is the CFP dataset. It includes annotations that mark pairs of faces as same or different.
- The result table shows the validation accuracy for the dataset. The idea is to calculate the test accuracy (using the separate test split) only for the best option of all.
- The table includes the results of the tests, but some additional experiments were run to figure out how to tune parameters such as the learning rate.
- The data augmentation applies jitter, flips and rotations to the images.
- The table includes the best accuracy with the best input hyperparameters that I could find.
Here is the table of results for the validation split:
Name | SGD | SGD + Data aug | Adam + Data aug |
Cosine v1 | 81.14 | 80.53 | 73.03 |
Cosine v2 | 71.35 | 73.03 | 70.75 |
Decision | 79.35 | 80.6 | 49.80 |
Decision linear | 78.28 | 81.71 | 76.75 |
Cosine v1 + triplet | 83.28 | 81.71 |
The winner of the benchmark is Cosine v1 + Triplet with the SGD optimizer and data augmentation. The chosen neural network was then evaluated on the test dataset, with these results:
Name | Validation accuracy | Test accuracy |
Cosine v1 + triplet + SGD + DA | 83.28 | 86.32 |
The first experiments applied SGD to obtain baseline results that I could compare with different configurations. Here, it is possible to see that the network learns without problems.
Data augmentation helps training. It is possible to see that the validation and training curves match more closely.
Furthermore, the Adam optimizer works well with cosine networks. It is possible to see how the process of finding the best loss improves. Unfortunately, the accuracy was poor. I tried different values for the learning rate and weight decay (0, 0.001, 5e-4), but it didn't help. I concluded that I needed more time to find the best hyperparameters for this case, so I stopped this line of study.
My last test was the implementation of the triplet loss, with which I got the best results. Using negative and positive images in the loss function provides it with more comparative information (the parameter was left at its default value, in this case 1.0)
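As a reference for the idea behind the triplet loss, the sketch below computes it for a single triplet of feature vectors using the Euclidean distance. It assumes that the default value of 1.0 mentioned above refers to the margin, which is the default in the common formulation; the function names are hypothetical and this is not the training code of the project.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive; otherwise it grows linearly."""
    positive_distance = euclidean_distance(anchor, positive)
    negative_distance = euclidean_distance(anchor, negative)
    return max(0.0, positive_distance - negative_distance + margin)
```

A triplet where the positive image is already close to the anchor and the negative is far away contributes nothing to the loss, so training concentrates on the triplets that are still confused.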
I ran the same experiments for the decision layers. In the first experiments, I could already see that the performance was poor, and further experiments confirmed it.
It is possible to see that overfitting happens very fast, and I started to realize that this is not the best workflow for my use case.
Here, I found that data augmentation does not improve the values; the overfitting just happens some epochs later.
Applying Adam in this case was exhausting... I tried different hyperparameter values, but the accuracy did not improve.
- In general, Siamese cosine v1 works better.
- The cosine similarity loss works better than any type of cross entropy.
- A pretrained backend architecture seems to be a good approach and is worth further research
- The best recipe: cosine v1 + SGD + Data augmentation + Triplet loss weights
- Decision layers have problems training with the dataset; overfitting appears very fast (after 6 or 7 epochs). It is very important to tune the parameters and add data augmentation.
- In this particular case, I had problems finding the best parameters. In other cases it can work well (as with the Siamese cosine networks), but it seems that more time is needed for good tuning.
- In general, decision networks need more epochs to learn well, because the decision network and the new layers have to be trained.
Installation (for python 3.6+)
pip install -r requirements.txt
The code has different entrypoints for different use cases (detection, recognition, creation of graphs, parsing, uploading data). The work is split into two main use cases, detection and recognition:
- Detection - to execute the benchmark and run the detection algorithms
- Recognition - the script to train a new neural network. The triplet version was the last one implemented and needed a set of important changes in the architecture; for this reason, a separate script was created for it
It is important to note that I didn't add a command-line argument parser because the requirements were not clear while I was developing. For this reason, you must change the different paths (datasets, weights, etc.) for your environment.
Then, to run the training, the command should be something like this (for python 3.6+):
# In the recognition directory
For triplet training, it is similar:
# In the recognition directory
- If you need to change parameters, edit them through the Builder Params pattern, which is used to customize the parameters[3]
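For readers who are not familiar with the builder pattern, this is a generic sketch of how a parameter builder of this kind can look. Every class, method and field name below is hypothetical and chosen only for illustration; the actual builder in the recognition folder may be organized differently.

```python
class TrainingParams:
    """Plain container for one experiment configuration (hypothetical)."""

    def __init__(self):
        self.optimizer = "sgd"
        self.learning_rate = 1e-3
        self.weight_decay = 0.0
        self.data_augmentation = False


class TrainingParamsBuilder:
    """Fluent builder: each method sets one option and returns the builder,
    so a test configuration can be written as a single chained expression."""

    def __init__(self):
        self._params = TrainingParams()

    def with_optimizer(self, name, learning_rate):
        self._params.optimizer = name
        self._params.learning_rate = learning_rate
        return self

    def with_weight_decay(self, weight_decay):
        self._params.weight_decay = weight_decay
        return self

    def with_data_augmentation(self, enabled=True):
        self._params.data_augmentation = enabled
        return self

    def build(self):
        return self._params
```

A configuration could then be created with `TrainingParamsBuilder().with_optimizer("adam", 5e-4).with_data_augmentation().build()`, which keeps each experiment readable and leaves the defaults in one place.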
NOTE: The code clearly has technical debt; my main effort went into finding the best architecture and parameters. The code needs to be refactored.
To get the validation and test accuracy for recognition, you can execute the following from the recognition folder
(for python 3.6+)
python path_saved_model_file [threshold]
If you pass the threshold, the accuracy is calculated using that value; otherwise, the script first calculates the best threshold for the dataset and then calculates both accuracies.
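One simple way to find such a threshold is sketched below: every observed similarity score is tried as a candidate and the one with the highest accuracy is kept. This is only an illustration of the idea under the assumption that a pair is predicted as "same person" when its score reaches the threshold; the function names are hypothetical and the actual script may search differently.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of pairs classified correctly when a pair is predicted
    as 'same person' if its similarity score is >= threshold."""
    correct = sum((score >= threshold) == label
                  for score, label in zip(scores, labels))
    return correct / len(scores)


def best_threshold(scores, labels):
    """Try every observed score as a candidate threshold and keep
    the one that gives the highest accuracy."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))
```

Here `scores` is a list of similarity values for the annotated pairs and `labels` is a list of booleans that are `True` for pairs of the same person. The threshold should be chosen on the validation split and then reused unchanged on the test split.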
The demo is included in scripts/
[1] Siamese networks
[2] Cosine loss
[3] Imagenet
[4] Cross entropy loss
[5] Cosine similarity