YOLO (You Only Look Once) is one of the fastest and most popular one-stage object detectors in the field. Unlike typical two-stage object detectors such as R-CNN, YOLO looks at the entire image only once (with a single neural network) to localize and classify the objects it was trained for. The most recent version at the time of writing (YOLOv3) is shown to be highly competitive with other object detectors in terms of mAP while being more than 4 times faster. A detailed comparison can be found on the official website.
The original YOLO algorithm is implemented in the Darknet framework by Joseph Redmon. This open-source framework is written in C and CUDA, and its detailed implementation can be found on the Darknet project website. Pre-trained models and weights are fully available and can be used to detect, in real time, the objects the model has been trained for.
Along with the original darknet, there is also a newer framework called darkflow. Thanks to thtrieu, the darknet framework has been translated into TensorFlow under the name darkflow. With darkflow, YOLO (currently supported only up to YOLOv2) can be retrained and customized on TensorFlow.
The detailed implementation of YOLO on darkflow is described in its repository, which uses the flow command from the console. This article describes the step-by-step process of running/training YOLO in a Jupyter notebook.
It is a good idea to create a new virtual environment for YOLO, as darkflow requires specific (older) versions of some modules.
Recommended modules/versions include:
- TensorFlow 1.12.0
- CUDA 9.0 (for GPU implementation)
- OpenCV 3.4.1 or 3.4.5
- Cython 0.29.10
- Visual Studio 2015 and essential build tools (Microsoft Build Tools 2015 Update 3): this is essential for building the Cython extensions in place (C-based darknet -> Python-based darkflow).
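Because darkflow is sensitive to module versions, it can help to verify the environment before building. A minimal sketch of a pure-Python version check (the pinned versions come from the list above; in practice, compare against e.g. `tensorflow.__version__` from your own environment):

```python
# Sketch: compare version strings against the recommended pins above.
def version_tuple(v):
    """Convert a dotted version string like '1.12.0' into a comparable tuple."""
    return tuple(int(part) for part in v.split('.')[:3])

# Recommended pins from the list above
recommended = {
    'tensorflow': '1.12.0',
    'opencv': '3.4.1',
    'Cython': '0.29.10',
}

def matches(installed, wanted):
    """True if the installed version equals the recommended pin (major.minor.patch)."""
    return version_tuple(installed) == version_tuple(wanted)

print(matches('1.12.0', recommended['tensorflow']))  # True
print(matches('3.4.5', recommended['opencv']))       # False (though 3.4.5 also works per the list above)
```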
- On the desired path, clone the darkflow GitHub repository:

git clone https://github.com/thtrieu/darkflow
- Build the Cython extensions in place:

python setup.py build_ext --inplace

or pip install -e .

or pip install .
- Create a folder ../darkflow/bin to store the weights.
- Download the YOLO model (.cfg) and the corresponding weights from HERE (please note that darkflow does not support YOLOv3 yet).
- Save the model to ../darkflow/cfg and the weights to ../darkflow/bin.
- Make sure the labels.txt file contains the correct class names/numbers for the model being used (example).
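To confirm that labels.txt matches the model, a quick check like the following can be used (darkflow expects one class name per line; the two-class file written here is a hypothetical example):

```python
# Sketch: count the class names in a labels.txt file so it can be compared
# against the number of classes the chosen .cfg file was built for.
def read_labels(path):
    """Return the non-empty class names from a darkflow labels.txt file."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Hypothetical two-class labels file for illustration:
with open('labels_demo.txt', 'w') as f:
    f.write('dog\ncat\n')

labels = read_labels('labels_demo.txt')
print(labels)       # ['dog', 'cat']
print(len(labels))  # 2
```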
from darkflow.net.build import TFNet
import tensorflow as tf
import cv2
import os
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    options = {
        'model': os.path.join('cfg', 'yolov2.cfg'),    #(1)
        'load': os.path.join('bin', 'yolov2.weights'), #(1)
        'threshold': 0.5,                              #(2)
        'gpu': 1.0                                     #(3)
    }
    tfnet = TFNet(options)
(1) custom names for a model and the corresponding pretrained weights
(2) the threshold value (0.0-1.0) for non-maximum suppression
(3) the portion of the gpu memory to use
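Non-maximum suppression discards boxes that overlap a higher-confidence box too strongly. A minimal sketch of the idea in plain Python (boxes as (x1, y1, x2, y2) tuples; this illustrates the concept, not darkflow's internal implementation):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] -- the near-duplicate box 1 is suppressed
```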
Expected output:
img = cv2.imread(os.path.join('test_images', 'test_image_general.jpg'))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
result = tfnet.return_predict(img)
for r in result:
    tl = (r['topleft']['x'], r['topleft']['y'])
    br = (r['bottomright']['x'], r['bottomright']['y'])
    label = r['label']
    conf = r['confidence']
    text = '{}({:.2f})'.format(label, conf)
    (text_width, text_height) = cv2.getTextSize(text, cv2.FONT_HERSHEY_PLAIN, 2, 5)[0]
    text_offset_x = tl[0] - 3
    text_offset_y = tl[1]
    box_coords = ((text_offset_x, text_offset_y + 10), (text_offset_x + text_width, text_offset_y - text_height - 10))
    img = cv2.rectangle(img, box_coords[0], box_coords[1], (255, 0, 0), cv2.FILLED)
    img = cv2.rectangle(img, tl, br, (255, 0, 0), thickness=4)
    img = cv2.putText(img, text, tl, cv2.FONT_HERSHEY_PLAIN, 2, (255, 255, 255), 3)
plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.axis('off')
plt.subplots_adjust(left=0.2, wspace=0)
Expected output:
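Since return_predict gives back a list of dictionaries, detections can also be post-processed directly in Python. For instance, a sketch that summarizes detections per class (the result dictionaries here are hypothetical but use the same keys as the darkflow output above):

```python
from collections import Counter

def summarize(results, min_conf=0.0):
    """Count detections per label, keeping only those above min_conf."""
    return Counter(r['label'] for r in results if r['confidence'] >= min_conf)

# Hypothetical output in the same shape as tfnet.return_predict(img):
results = [
    {'label': 'dog', 'confidence': 0.92,
     'topleft': {'x': 10, 'y': 20}, 'bottomright': {'x': 110, 'y': 140}},
    {'label': 'dog', 'confidence': 0.55,
     'topleft': {'x': 200, 'y': 30}, 'bottomright': {'x': 280, 'y': 120}},
    {'label': 'person', 'confidence': 0.81,
     'topleft': {'x': 40, 'y': 10}, 'bottomright': {'x': 90, 'y': 200}},
]
print(summarize(results, min_conf=0.6))  # Counter({'dog': 1, 'person': 1})
```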
This idea can be extended simply to real-time object detection on saved or live-stream videos. Some examples can be found in my [HappyDogDetector project](https://github.com/sungsujaing/Happy_Dog_Detection). In addition to the configuration described above:
- Create folders:
  - ../darkflow/ckpt - to store checkpoint models/weights
  - ../darkflow/train/images - to store training images (folder names can be customized)
  - ../darkflow/train/images-annot - to store the labels of training images (folder names can be customized)
- Modify the labels.txt file to contain the classes to be trained for.
- Modify the .cfg file as follows:
  - In the last [region] section, change the classes value to the number of classes to be trained for.
  - In the second-to-last [convolutional] section, change the filters value based on the newly defined classes value. The following equation can be used to calculate the new number of filters: new_filter_number = 5 * (5 + new_class_number)
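For example, plugging a few class counts into the equation above (the factor of 5 is the YOLOv2 anchor count, and 5 is also the number of box parameters per anchor: x, y, w, h, objectness):

```python
def new_filter_number(new_class_number, num_anchors=5):
    """filters = num_anchors * (5 + classes) for the YOLOv2 region layer;
    the default anchor count in the stock .cfg is 5."""
    return num_anchors * (5 + new_class_number)

print(new_filter_number(1))   # 30  -> single-class detector
print(new_filter_number(3))   # 40  -> three classes
print(new_filter_number(80))  # 425 -> the stock COCO-trained model
```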
- Collect images for custom object detection (at least ~300 images). One of the easiest ways is to scrape them from online; for this option, check out this article about the google_images_download module.
- Label the images using labelimg, a graphical image annotation tool for object/bounding-box labeling. The prebuilt binary can be downloaded HERE, while the official repository also contains lots of useful information.
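labelimg saves each annotation as a Pascal VOC-style XML file, which is the format darkflow's annotation parser expects. A sketch of reading one back with the standard library only (the XML content here is a hypothetical single-box example):

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation, as produced by labelimg:
voc_xml = """<annotation>
  <filename>dog_001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>250</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return (label, (xmin, ymin, xmax, ymax)) for each object in a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter('object'):
        name = obj.find('name').text
        bb = obj.find('bndbox')
        box = tuple(int(bb.find(k).text) for k in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((name, box))
    return boxes

print(parse_voc(voc_xml))  # [('dog', (48, 30, 250, 210))]
```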
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    options = {
        'model': os.path.join('cfg', 'yolov2_new.cfg'),             #(1)
        'load': os.path.join('bin', 'yolov2.weights'),              #(2)
        'epoch': 100,                                               #(3)
        'batch': 16,                                                #(4)
        'train': True,                                              #(5)
        'dataset': os.path.join('..', 'downloads', 'img'),          #(6)
        'annotation': os.path.join('..', 'downloads', 'img-annot'), #(6)
        'gpu': 0.8                                                  #(7)
    }
    tfnet = TFNet(options)
    tfnet.train() #(8)
(1) the modified .cfg file
(2) the same weights file for the transfer learning
(3) number of epochs for training
(4) the batch size for training
(5) indicate that the training will take place
(6) paths to the prepared images and the corresponding annotations
(7) the portion of gpu memory to be used for training
(8) training begins
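With these options, the number of weight updates per epoch is simply the dataset size divided by the batch size, which is handy for estimating training time. A quick sketch (the image count of 320 is a hypothetical example):

```python
import math

def steps_per_epoch(num_images, batch_size):
    """Number of batches (weight updates) run per epoch."""
    return math.ceil(num_images / batch_size)

# e.g. 320 training images with the batch size of 16 used above:
print(steps_per_epoch(320, 16))        # 20 updates per epoch
print(steps_per_epoch(320, 16) * 100)  # 2000 updates over the 100 epochs above
```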
with tf.Session(config=config) as sess:
    options = {
        'model': os.path.join('cfg', 'yolov2_new.cfg'), #(1)
        'load': -1,       #(2)
        'gpu': 1.0,       #(3)
        'threshold': 0.3  #(4)
    }
    tfnet_new = TFNet(options)
    tfnet_new.load_from_ckpt()
(1) the modified .cfg file
(2) loading the newly trained weights (the latest layer)
(3) the portion of the gpu memory to be used for testing
(4) the threshold value (0.0-1.0) for non-maximum suppression
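Setting 'load': -1 tells darkflow to pick the most recent checkpoint in the ckpt folder. The idea can be sketched as finding the largest step number among checkpoint filenames (the filenames below are hypothetical but follow the model-step naming pattern):

```python
def latest_step(filenames):
    """Pick the largest training step from checkpoint names like 'model-250.meta'."""
    steps = []
    for name in filenames:
        stem = name.split('.')[0]       # e.g. 'yolov2_new-250'
        step = stem.rsplit('-', 1)[-1]  # e.g. '250'
        if step.isdigit():
            steps.append(int(step))
    return max(steps) if steps else None

ckpts = ['yolov2_new-125.meta', 'yolov2_new-250.meta', 'yolov2_new-375.meta']
print(latest_step(ckpts))  # 375
```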
Output examples of the transfer-learning code will be added shortly.