siamrpn: Single Object tracking 2d (#367)

* SiamRPN learner + ROS node
* CRLF -> LF
* tests + custom dataset support
* changelog update
* Update src/opendr/perception/object_tracking_2d/siamrpn/siamrpn_learner.py
  Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>
* Update docs/reference/rosbridge.md
  Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>
* Update docs/reference/rosbridge.md
  Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>
* Update src/opendr/perception/object_tracking_2d/siamrpn/README.md
  Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>
* fix docs, rename ros1 node, rearrange learner + static methods
* fix yolov5 download bug
* Update src/opendr/perception/object_tracking_2d/siamrpn/README.md
  Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>
* dependencies possible fix
* dependencies possible fix
* update tests to avoid running siamrpn test twice
* Some fixes in bridge doc
* Minor format fixes
* Some more
* More details

Co-authored-by: Kostas Tsampazis <27914645+tsampazk@users.noreply.github.com>
Co-authored-by: ad-daniel <daniel.dias@epfl.ch>
opendr-eu · Dec 12, 2022 · 58a9067
1 parent 739c577
Showing 33 changed files with 2,973 additions and 1,300 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,7 @@ Released on December, XX, 2022.
     - Added YOLOv5 as an inference-only tool ([#360](https://github.com/opendr-eu/opendr/pull/360)).
     - Added Continual Transformer Encoders ([#317](https://github.com/opendr-eu/opendr/pull/317)).
     - Added AmbiguityMeasure utility tool ([#361](https://github.com/opendr-eu/opendr/pull/361)).
+    - Added SiamRPN 2D tracking tool ([#367](https://github.com/opendr-eu/opendr/pull/367)).
   - Bug Fixes:
     - Fixed `BoundingBoxList`, `TrackingAnnotationList`, `BoundingBoxList3D` and `TrackingAnnotationList3D` confidence warnings ([#365](https://github.com/opendr-eu/opendr/pull/365)).
     - Fixed undefined `image_id` and `segmentation` for COCO `BoundingBoxList` ([#365](https://github.com/opendr-eu/opendr/pull/365)).

diff --git a/docs/reference/index.md b/docs/reference/index.md
@@ -54,6 +54,7 @@ Neither the copyright holder nor any applicable licensor will be liable for any
         - object tracking 2d:
             - [fair_mot Module](object-tracking-2d-fair-mot.md)
             - [deep_sort Module](object-tracking-2d-deep-sort.md)
+            - [siamrpn Module](object-tracking-2d-siamrpn.md)
         - object tracking 3d:
             - [ab3dmot Module](object-tracking-3d-ab3dmot.md)
         - multimodal human centric:
@@ -121,11 +122,13 @@ Neither the copyright holder nor any applicable licensor will be liable for any
             - [centernet Demo](/projects/python/perception/object_detection_2d/centernet)
             - [ssd Demo](/projects/python/perception/object_detection_2d/ssd)
             - [yolov3 Demo](/projects/python/perception/object_detection_2d/yolov3)
+            - [yolov5 Demo](/projects/python/perception/object_detection_2d/yolov5)
             - [seq2seq-nms Demo](/projects/python/perception/object_detection_2d/nms/seq2seq-nms)
         - object detection 3d:
             - [voxel Demo](/projects/python/perception/object_detection_3d/demos/voxel_object_detection_3d)
         - object tracking 2d:
             - [fair_mot Demo](/projects/python/perception/object_tracking_2d/demos/fair_mot_deep_sort)
+            - [siamrpn Demo](/projects/python/perception/object_tracking_2d/demos/siamrpn)
         - panoptic segmentation:
             - [efficient_ps Demo](/projects/python/perception/panoptic_segmentation/efficient_ps)
         - semantic segmentation:

diff --git a/docs/reference/object-tracking-2d-siamrpn.md b/docs/reference/object-tracking-2d-siamrpn.md
@@ -0,0 +1,221 @@
+## SiamRPNLearner module
+
+The *SiamRPN* module contains the *SiamRPNLearner* class, which inherits from the abstract class *Learner*.
+
+### Class SiamRPNLearner
+Bases: `engine.learners.Learner`
+
+The *SiamRPNLearner* class is a wrapper around the
+[GluonCV implementation](https://github.com/dmlc/gluon-cv/tree/master/gluoncv/model_zoo/siamrpn) of the SiamRPN tracker [[1]](#siamrpn-1).
+It can be used to perform object tracking on videos (inference) as well as to train new object tracking models.
+
+The [SiamRPNLearner](/src/opendr/perception/object_tracking_2d/siamrpn/siamrpn_learner.py) class has the following public methods:
+
+#### `SiamRPNLearner` constructor
+```python
+SiamRPNLearner(self, device, n_epochs, num_workers, warmup_epochs, lr, weight_decay, momentum, cls_weight, loc_weight, batch_size, temp_path)
+```
+
+Parameters:
+
+- **device**: *{'cuda', 'cpu'}, default='cuda'*\
+  Specifies the device to be used.
+- **n_epochs**: *int, default=50*\
+  Specifies the number of epochs to be used during training.
+- **num_workers**: *int, default=1*\
+  Specifies the number of workers to be used when loading datasets or performing evaluation.
+- **warmup_epochs**: *int, default=2*\
+  Specifies the number of epochs during which the learning rate is annealed to **lr**.
+- **lr**: *float, default=0.001*\
+  Specifies the initial learning rate to be used during training.
+- **weight_decay**: *float, default=0*\
+  Specifies the weight decay to be used during training.
+- **momentum**: *float, default=0.9*\
+  Specifies the momentum to be used by the optimizer during training.
+- **cls_weight**: *float, default=1.*\
+  Specifies the multiplier applied to the classification loss during training.
+- **loc_weight**: *float, default=1.2*\
+  Specifies the multiplier applied to the localization loss during training.
+- **batch_size**: *int, default=32*\
+  Specifies the batch size to be used during training.
+- **temp_path**: *str, default=''*\
+  Specifies a path to be used for data downloading.
+
+
+#### `SiamRPNLearner.fit`
+```python
+SiamRPNLearner.fit(self, dataset, log_interval, n_gpus, verbose)
+```
+
+This method is used to train the algorithm on a `DetectionDataset` or `ExternalDataset` dataset and also performs evaluation on a validation set using the trained model.
+Returns a dictionary containing stats regarding the training process.
+
+Parameters:
+
+- **dataset**: *object*\
+  Object that holds the training dataset.
+- **log_interval**: *int, default=20*\
+  The training loss is printed to stdout once every `log_interval` iterations.
+- **n_gpus**: *int, default=1*\
+  If CUDA is enabled, training can be performed on multiple GPUs as set by this parameter.
+- **verbose**: *bool, default=True*\
+  If True, enables maximum verbosity.
+
+#### `SiamRPNLearner.eval`
+```python
+SiamRPNLearner.eval(self, dataset)
+```
+
+Performs evaluation on a dataset. The OTB dataset is currently supported.
+
+Parameters:
+
+- **dataset**: *object*\
+  Object that holds dataset to perform evaluation on.
+  Expected type is `ExternalDataset` with `otb2015` dataset type.
+
+#### `SiamRPNLearner.infer`
+```python
+SiamRPNLearner.infer(self, img, init_box)
+```
+
+Performs inference on a single image.
+If the `init_box` is provided, the tracker is initialized.
+If not, the current position of the target is updated by running inference on the image.
+
+Parameters:
+
+- **img**: *object*\
+  Object of type engine.data.Image.
+- **init_box**: *object, default=None*\
+  Object of type engine.target.TrackingAnnotation.
+  If provided, it is used to initialize the tracker.
+
+#### `SiamRPNLearner.save`
+```python
+SiamRPNLearner.save(self, path, verbose)
+```
+
+Saves a model in OpenDR format at the specified path.
+The model name is extracted from the base folder in the specified path.
+
+Parameters:
+
+- **path**: *str*\
+  Specifies the folder where the model will be saved.
+  The model name is extracted from the base folder of this path.
+- **verbose**: *bool, default=False*\
+  If True, enables maximum verbosity.
+
+#### `SiamRPNLearner.load`
+```python
+SiamRPNLearner.load(self, path, verbose)
+```
+
+Loads a model which was previously saved in OpenDR format at the specified path.
+
+Parameters:
+
+- **path**: *str*\
+  Specifies the folder where the model will be loaded from.
+- **verbose**: *bool, default=False*\
+  If True, enables maximum verbosity.
+
+#### `SiamRPNLearner.download`
+```python
+SiamRPNLearner.download(self, path, mode, verbose, url, overwrite)
+```
+
+Downloads data needed for the various functions of the learner, e.g., pre-trained models as well as test data.
+
+Parameters:
+
+- **path**: *str, default=None*\
+  Specifies the folder where data will be downloaded.
+  If *None*, the *self.temp_path* directory is used instead.
+- **mode**: *{'pretrained', 'video', 'test_data', 'otb2015'}, default='pretrained'*\
+  If *'pretrained'*, downloads a pre-trained detector model.
+  If *'video'*, downloads a single video to perform inference on.
+  If *'test_data'*, downloads a dummy version of the OTB dataset for testing purposes.
+  If *'otb2015'*, attempts to download the OTB dataset (100 videos).
+  This download takes a long time.
+- **verbose**: *bool, default=False*\
+  If True, enables maximum verbosity.
+- **url**: *str, default=OpenDR FTP URL*\
+  URL of the FTP server.
+- **overwrite**: *bool, default=False*\
+  If True, files will be re-downloaded if they already exist.
+  This can solve some issues with large downloads.
+
+#### Examples
+
+* **Training example using `ExternalDataset` objects**.
+  Training is supported solely via the `ExternalDataset` class.
+  See [class README](/src/opendr/perception/object_tracking_2d/siamrpn/README.md) for a list of supported datasets and presumed data directory structure.
+  Example training on COCO Detection dataset:
+  ```python
+  from opendr.engine.datasets import ExternalDataset
+  from opendr.perception.object_tracking_2d import SiamRPNLearner
+
+  dataset = ExternalDataset("/path/to/data/root", "coco")
+  learner = SiamRPNLearner(device="cuda", n_epochs=50, batch_size=32,
+                           lr=1e-3)
+  learner.fit(dataset)
+  learner.save("siamrpn_custom")
+  ```
+
+* **Inference and result drawing example on a test mp4 video using OpenCV.**
+  ```python
+  import cv2
+  from opendr.engine.target import TrackingAnnotation
+  from opendr.perception.object_tracking_2d import SiamRPNLearner
+
+  learner = SiamRPNLearner(device="cuda")
+  learner.download(".", mode="pretrained")
+  learner.load("siamrpn_opendr")
+
+  learner.download(".", mode="video")
+  cap = cv2.VideoCapture("tc_Skiing_ce.mp4")
+
+  init_bbox = TrackingAnnotation(left=598, top=312, width=75, height=200, name=0, id=0)
+
+  frame_no = 0
+  while cap.isOpened():
+      ok, frame = cap.read()
+      if not ok:
+          break
+
+      if frame_no == 0:
+          # first frame, pass init_bbox to infer function to initialize the tracker
+          pred_bbox = learner.infer(frame, init_bbox)
+      else:
+          # after the first frame only pass the image to infer
+          pred_bbox = learner.infer(frame)
+
+      frame_no += 1
+
+      cv2.rectangle(frame, (pred_bbox.left, pred_bbox.top),
+                    (pred_bbox.left + pred_bbox.width, pred_bbox.top + pred_bbox.height),
+                    (0, 255, 255), 3)
+      cv2.imshow('Tracking Result', frame)
+      cv2.waitKey(1)
+
+  # release the video capture before closing the display windows
+  cap.release()
+  cv2.destroyAllWindows()
+  ```
+
+
+#### Performance evaluation
+
+We have measured the performance on the OTB2015 dataset in terms of success and FPS on an RTX 2070.
+```
+------------------------------------------------
+|       Tracker name       | Success |   FPS   |
+------------------------------------------------
+| siamrpn_alexnet_v2_otb15 |  0.668  |  132.1  |
+------------------------------------------------
+```
+
+#### References
+<a name="siamrpn-1" href="https://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf">[1]</a>
+High Performance Visual Tracking with Siamese Region Proposal Network,
+[PDF](https://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf).
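
The "Success" column in the performance table above is not defined in this hunk. The sketch below assumes it refers to the usual OTB-style success score: the fraction of frames whose overlap (IoU) with the ground truth exceeds a threshold, averaged over thresholds from 0 to 1. The function names and the `(left, top, width, height)` tuple layout are illustrative assumptions and are not taken from `SiamRPNLearner.eval`, which may compute the score differently.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout for this sketch: (left, top, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region, clamped at zero.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_score(preds: Sequence[Box], gts: Sequence[Box],
                  n_thresholds: int = 21) -> float:
    """Mean success rate over evenly spaced IoU thresholds in [0, 1]."""
    overlaps: List[float] = [iou(p, g) for p, g in zip(preds, gts)]
    if not overlaps:
        return 0.0
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    # Success rate at a threshold: share of frames with IoU strictly above it.
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Because the comparison is strict, even a perfect tracker scores slightly below 1.0 under this definition, since no overlap exceeds the final threshold of 1.0.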
diff --git a/docs/reference/opendr-ros-bridge.md b/docs/reference/opendr-ros-bridge.md
@@ -63,7 +63,7 @@ Converts an OpenDRPose2D message into an OpenDR Pose.
 
 Parameters:
 
-- **ros_pose**: *ros_bridge.msg.OpenDRPose2D*\
+- **ros_pose**: *opendr_bridge.msg.OpenDRPose2D*\
   ROS pose to be converted into an OpenDR Pose.
 
 #### `ROSBridge.to_ros_pose`
@@ -72,9 +72,9 @@ Parameters:
 ROSBridge.to_ros_pose(self,
                       pose)
 ```
-Converts an OpenDR Pose into a OpenDRPose2D msg that can carry the same information, i.e. a list of keypoints, 
+Converts an OpenDR Pose into a OpenDRPose2D msg that can carry the same information, i.e. a list of keypoints,
 the pose detection confidence and the pose id.
-Each keypoint is represented as an OpenDRPose2DKeypoint with x, y pixel position on input image with (0, 0) 
+Each keypoint is represented as an OpenDRPose2DKeypoint with x, y pixel position on input image with (0, 0)
 being the top-left corner.
 
 Parameters:
@@ -121,7 +121,7 @@ Converts a ROS ObjectHypothesis message into an OpenDR Category.
 
 Parameters:
 
-- **message**: *ros_bridge.msg.ObjectHypothesis*\
+- **message**: *vision_msgs.msg.ObjectHypothesis*\
   ROS ObjectHypothesis to be converted into an OpenDR Category.
 
 
@@ -136,7 +136,7 @@ Converts a ROS ObjectHypothesis message into an OpenDR Category.
 
 Parameters:
 
-- **message**: *ros_bridge.msg.ObjectHypothesis*\
+- **message**: *vision_msgs.msg.ObjectHypothesis*\
   ROS ObjectHypothesis to be converted into an OpenDR Category.
 
 #### `ROSBridge.to_ros_face`
@@ -387,11 +387,45 @@ Parameters:
 - **frame**: *int, default=-1*\
   The frame index to assign to the tracking boxes.
 
+#### `ROSBridge.to_ros_single_tracking_annotation`
+
+```python
+ROSBridge.to_ros_single_tracking_annotation(self, tracking_annotation)
+```
+
+Converts a `TrackingAnnotation` object to a `Detection2D` ROS message.
+This method is intended for single object tracking methods.
+
+Parameters:
+
+- **tracking_annotation**: *opendr.engine.target.TrackingAnnotation*\
+  The box to be converted.
+
+#### `ROSBridge.from_ros_single_tracking_annotation`
+
+```python
+ROSBridge.from_ros_single_tracking_annotation(self, ros_detection_box)
+```
+
+Converts a `Detection2D` ROS message object to a `TrackingAnnotation` object.
+This method is intended for single object tracking methods.
+
+Parameters:
+
+- **ros_detection_box**: *vision_msgs.msg.Detection2D*\
+  The box to be converted.
+
 ## ROS message equivalence with OpenDR
-1. `sensor_msgs.msg.Img` is used as an equivelant to `engine.data.Image`
-2. `ros_bridge.msg.Pose` is used as an equivelant to `engine.target.Pose`
+1. `sensor_msgs.msg.Image` is used as an equivalent to `engine.data.Image`
+2. `opendr_bridge.msg.OpenDRPose2D` is used as an equivalent to `engine.target.Pose`
 3. `vision_msgs.msg.Detection2DArray` is used as an equivalent to `engine.target.BoundingBoxList`
-4. `vision_msgs.msg.Detection2D` is used as an equivalent to `engine.target.BoundingBox`
-5. `geometry_msgs.msg.Pose`  is used as an equivelant to `engine.target.Pose` for 3D poses conversion only.
-6. `vision_msgs.msg.Detection3DArray`  is used as an equivelant to `engine.target.BoundingBox3DList`.
-7. `sensor_msgs.msg.PointCloud`  is used as an equivelant to `engine.data.PointCloud`.
+4. `vision_msgs.msg.Detection2D` is used as an equivalent to `engine.target.BoundingBox` and
+   to `engine.target.TrackingAnnotation` in single object tracking.
+5. `geometry_msgs.msg.Pose` is used as an equivalent to `engine.target.Pose` for 3D poses conversion only.
+6. `vision_msgs.msg.Detection3DArray` is used as an equivalent to `engine.target.BoundingBox3DList`.
+7. `sensor_msgs.msg.PointCloud` is used as an equivalent to `engine.data.PointCloud`.
+
+## ROS services
+The following ROS services are implemented (`srv` folder):
+1. `opendr_bridge.srv.OpenDRSingleObjectTracking`: can be used to initialize the tracking process of single
+   object trackers by providing a `Detection2D` bounding box.
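
The two single-tracking conversion methods documented above map between a `TrackingAnnotation`, which the inference example builds from `left`, `top`, `width` and `height`, and a `Detection2D`, whose bounding box in `vision_msgs` is described by a center point plus `size_x` and `size_y`. The bridge implementation itself is not part of this hunk, so the following is only a sketch of the coordinate arithmetic such a conversion would need. `CornerBox`, `CenterBox` and both functions are hypothetical stand-ins, not OpenDR or ROS types.

```python
from dataclasses import dataclass


@dataclass
class CornerBox:
    """Hypothetical stand-in for the TrackingAnnotation box fields."""
    left: float
    top: float
    width: float
    height: float


@dataclass
class CenterBox:
    """Hypothetical stand-in for a Detection2D bounding box."""
    center_x: float
    center_y: float
    size_x: float
    size_y: float


def corner_to_center(box: CornerBox) -> CenterBox:
    # Shift the top-left corner by half the box size to reach the center.
    return CenterBox(box.left + box.width / 2.0,
                     box.top + box.height / 2.0,
                     box.width, box.height)


def center_to_corner(box: CenterBox) -> CornerBox:
    # Inverse mapping: shift the center back to the top-left corner.
    return CornerBox(box.center_x - box.size_x / 2.0,
                     box.center_y - box.size_y / 2.0,
                     box.size_x, box.size_y)
```

Keeping the two mappings as exact inverses means a box sent to a tracker node as a `Detection2D` and converted back describes the same region, up to floating point rounding.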
diff --git a/projects/opendr_ws/src/opendr_bridge/CMakeLists.txt b/projects/opendr_ws/src/opendr_bridge/CMakeLists.txt
@@ -21,6 +21,12 @@ add_message_files(
     OpenDRPose2D.msg
 )
 
+ add_service_files(
+    DIRECTORY srv
+    FILES
+    OpenDRSingleObjectTracking.srv
+ )
+
 generate_messages(
     DEPENDENCIES
     std_msgs