The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-by-detection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation. We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields superior tracking performance than any current tracking method and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.
@inproceedings{bergmann2019tracking,
title={Tracking without bells and whistles},
author={Bergmann, Philipp and Meinhardt, Tim and Leal-Taixe, Laura},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={941--951},
year={2019}
}
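The abstract's core idea, reusing a detector's bounding box regression to carry each track into the next frame, can be sketched roughly as follows. This is a simplified illustration, not the implementation in this repository: the `regress` callable stands in for the detector's regression head, the two thresholds and their default values are hypothetical, and the re-identification and camera motion compensation extensions are omitted.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tracktor_step(
    tracks: Dict[int, Box],
    regress: Callable[[Box], Tuple[Box, float]],
    detections: List[Box],
    next_id: int,
    score_thr: float = 0.5,    # hypothetical: minimum score to keep a track
    new_iou_thr: float = 0.3,  # hypothetical: max overlap allowed for a new track
) -> Tuple[Dict[int, Box], int]:
    """Advance all tracks by one frame.

    `regress` is a placeholder for the detector's regression head evaluated
    on the new frame: given a box from the previous frame, it returns the
    refined box and a classification score.
    """
    updated: Dict[int, Box] = {}
    for track_id, box in tracks.items():
        new_box, score = regress(box)
        if score >= score_thr:  # a low score ends the track
            updated[track_id] = new_box
    # Detections that do not overlap any track kept so far start new tracks.
    for det in detections:
        if all(iou(det, box) < new_iou_thr for box in updated.values()):
            updated[next_id] = det
            next_id += 1
    return updated, next_id
```

Calling `tracktor_step` once per frame with the previous frame's tracks keeps identities without any explicit data association step, which is the point the abstract makes.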
We implement Tracktor with independent detector and ReID models. To train it yourself, you need to train the detector following here and the ReID model following here. The configs in this folder are intended mainly for inference.
Method | Detector | ReID | Train Set | Test Set | Public | Inf time (fps) | MOTA | IDF1 | FP | FN | IDSw. | Config | Download |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tracktor | R50-FasterRCNN-FPN | R50 | half-train | half-val | Y | - | 61.8 | 64.9 | 1235 | 6877 | 116 | config | detector detector_log reid reid_log |
Tracktor | R50-FasterRCNN-FPN | R50 | half-train | half-val | N | - | 66.8 | 68.4 | 3049 | 3922 | 179 | config | detector detector_log reid reid_log |
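
For readers unfamiliar with the columns, MOTA combines the FP, FN, and IDSw. counts into a single score using the standard CLEAR MOT definition. The sketch below shows the arithmetic; the tables here do not list the number of ground-truth boxes, so `num_gt` and the example values are made up for illustration.

```python
def mota(fp: int, fn: int, idsw: int, num_gt: int) -> float:
    """MOTA in percent: 100 * (1 - (FP + FN + IDSw) / num_gt).

    num_gt is the total number of ground-truth boxes over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 100.0 * (1.0 - (fp + fn + idsw) / num_gt)


# Illustrative numbers only, not taken from any benchmark.
example = mota(fp=10, fn=20, idsw=5, num_gt=100)
```

Because the errors are summed before normalizing, a tracker can score higher on MOTA despite more false positives if it has far fewer false negatives, as the public versus private rows show.
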
Method | Detector | ReID | Train Set | Test Set | Public | Inf time (fps) | MOTA | IDF1 | FP | FN | IDSw. | Config | Download |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tracktor | R50-FasterRCNN-FPN | R50 | half-train | half-val | Y | - | 54.1 | 61.5 | 425 | 23894 | 182 | config | detector detector_log reid reid_log |
Tracktor | R50-FasterRCNN-FPN | R50 | half-train | half-val | N | - | 63.4 | 66.2 | 4175 | 14911 | 444 | config | detector detector_log reid reid_log |
The implementation of Tracktor follows the official practice. In the table below, the result marked with * is the official one. Our implementation outperforms it by 4.9 points on MOTA and 3.3 points on IDF1.
Method | Detector | ReID | Train Set | Test Set | Public | Inf time (fps) | MOTA | IDF1 | FP | FN | IDSw. | Config | Download |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tracktor | R50-FasterRCNN-FPN | R50 | half-train | half-val | Y | 3.2 | 57.3 | 63.4 | 1254 | 67091 | 614 | config | detector reid |
Tracktor | R50-FasterRCNN-FPN | R50 | half-train | half-val | N | 3.1 | 64.1 | 66.9 | 11088 | 45762 | 1233 | config | detector reid |
Tracktor | R50-FasterRCNN-FPN | R50 | train | test | Y | 3.2 | 61.2 | 58.4 | 8609 | 207627 | 2634 | config | detector reid |
Tracktor* | R50-FasterRCNN-FPN | R50 | train | test | Y | - | 56.3 | 55.1 | 8866 | 235449 | 1987 | - | - |
Tracktor (FP16) | R50-FasterRCNN-FPN | R50 | half-train | half-val | N | - | 64.7 | 66.6 | 10710 | 45270 | 1152 | config | detector detector_log reid reid_log |
Note: `FP16` means Mixed Precision (FP16) is adopted in training.
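
The margins quoted for this table can be checked directly from its two train/test rows. A minimal sketch, using only the MOTA and IDF1 values listed above:

```python
# MOTA and IDF1 of the two train/test rows in the table above.
ours = {"MOTA": 61.2, "IDF1": 58.4}
official = {"MOTA": 56.3, "IDF1": 55.1}  # the row marked with *

# Round to one decimal to match the precision of the table.
margins = {k: round(ours[k] - official[k], 1) for k in ours}
print(margins)  # {'MOTA': 4.9, 'IDF1': 3.3}
```
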
The implementation of Tracktor follows the official practice. In the table below, the result marked with * (the last row) is the official one. Our implementation outperforms it by 5.3 points on MOTA and 2.1 points on IDF1.
Method | Detector | ReID | Train Set | Test Set | Public | Inf time (fps) | MOTA | IDF1 | FP | FN | IDSw. | Config | Download |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tracktor | R50-FasterRCNN-FPN | R50 | half-train | half-val | Y | - | 70.6 | 65.4 | 3652 | 175955 | 1441 | config | detector detector_log reid reid_log |
Tracktor | R50-FasterRCNN-FPN | R50 | half-train | half-val | N | - | 70.9 | 64.1 | 5539 | 171653 | 1619 | config | detector detector_log reid reid_log |
Tracktor | R50-FasterRCNN-FPN | R50 | train | test | Y | - | 57.9 | 54.8 | 16203 | 199485 | 2299 | config | detector detector_log reid reid_log |
Tracktor* | R50-FasterRCNN-FPN | R50 | train | test | Y | - | 52.6 | 52.7 | 6930 | 236680 | 1648 | - | - |
Note: When running `demo_mot.py`, we suggest you use the config containing `private`, since `private` means the MOT method doesn't need external detections.
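
If you select a config programmatically, the suggestion above amounts to filtering on the file name. A minimal sketch; the helper `private_configs` and the file names are hypothetical and only illustrate the naming rule, so check the actual names in this folder:

```python
from pathlib import Path
from typing import Iterable, List


def private_configs(paths: Iterable[str]) -> List[str]:
    """Keep config paths whose file name contains 'private'."""
    return [p for p in paths if "private" in Path(p).name]


# Hypothetical file names, for illustration only.
candidates = [
    "configs/example_public-half.py",
    "configs/example_private-half.py",
]
print(private_configs(candidates))  # ['configs/example_private-half.py']
```
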