Multi-modal ROS-based people and group detection & tracking framework for mobile robots developed within the context of the EU FP7 project SPENCER.
- Multi-modal detection: Multiple RGB-D & 2D laser detectors in one common framework.
- People tracking: Efficient tracker based upon nearest-neighbor data association.
- Social relations: Estimate spatial relations between people via coherent motion indicators.
- Group tracking: Detection and tracking of groups of people based upon their social relations.
- Robustness: Various extensions such as IMM, track initiation logic and high-recall detector input make the people tracker work relatively robustly even in very dynamic environments.
- Real-time: Runs at 20-30 Hz on gaming laptops, tracker itself requires only ~10% of 1 CPU core.
- Extensible and reusable: Well-structured ROS message types and clearly defined interfaces make it easy to integrate custom detection and tracking components.
- Powerful visualization: A series of reusable RViz plugins that can be configured via mouse click, plus scripts for generating animated (2D) SVG files.
- Evaluation tools: Metrics (CLEAR-MOT, OSPA) for evaluation of tracking performance.
- ROS integration: All components are fully integrated with ROS and written in C++ or Python. No Matlab required.
The aim of the EU FP7 research project SPENCER was to develop algorithms for service robots that can guide groups of people through highly dynamic and crowded pedestrian environments, such as airports or shopping malls, while behaving in a socially compliant manner, for example by not crossing between members of a family or a couple. Exemplary situations that such a robot could encounter are visualized in the image below, on the right. To this end, robust and computationally efficient components for the perception of humans in the robot's surroundings needed to be developed.
The following figure shows the real-time people and group detection and tracking pipeline developed in the context of the SPENCER project:
All communication between the different stages of our pipeline occurs via ROS messages, which encourages reuse of our components in custom setups. The modular architecture makes it easy to exchange individual components in any stage of the pipeline.
We provide a set of reusable ROS message type definitions, which we have successfully applied across various people detection and tracking scenarios, over different sensor modalities and tracking approaches. Most relevant messages can be found inside the spencer_tracking_msgs package.
We highly encourage reuse of these messages to benefit from our rich infrastructure of detection, tracking, filtering and visualization components! Existing detection and tracking algorithms can often easily be integrated by publishing additional messages in our format, or by writing a simple C++ or Python node that converts between message formats.
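As a rough illustration of such a conversion node, the sketch below shows only the core mapping step in plain Python, without any ROS dependencies. The DetectedPersonStub class, the convert_positions function and all of its parameters are hypothetical stand-ins introduced for this example; they are not the actual spencer_tracking_msgs definitions, so please check the .msg files in that package for the real field names before adapting it.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DetectedPersonStub:
    """Hypothetical stand-in for a per-person detection message."""
    detection_id: int                     # unique ID of this detection
    confidence: float                     # detector confidence (assumed field)
    position: Tuple[float, float, float]  # x, y, z in the sensor frame
    modality: str                         # free-form sensor label (assumed field)


def convert_positions(positions: Sequence[Tuple[float, float, float]],
                      first_id: int = 0,
                      id_increment: int = 1,
                      confidence: float = 1.0,
                      modality: str = "external") -> List[DetectedPersonStub]:
    """Map plain (x, y, z) positions, e.g. taken from a pose array,
    to a list of detection stubs with consecutive, spaced IDs."""
    detections = []
    for index, (x, y, z) in enumerate(positions):
        detections.append(DetectedPersonStub(
            detection_id=first_id + index * id_increment,
            confidence=confidence,
            position=(x, y, z),
            modality=modality,
        ))
    return detections


if __name__ == "__main__":
    for det in convert_positions([(1.0, 2.0, 0.0), (3.0, 4.0, 0.0)],
                                 first_id=10, id_increment=20):
        print(det)
```

In a real node, the same loop would sit inside a subscriber callback and fill the actual message type instead of the stub. Spacing the IDs by a per-detector increment is one simple way to keep IDs from different detectors distinct; it is an assumption of this sketch, not a documented requirement.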
We have integrated the following person detection modules:
- A reimplementation of a boosted 2D laser segment classifier, based upon the method by Arras et al. [3]
- An RGB-D upper-body detector described more closely in [2], which slides a normalized depth template over ROIs in the depth image
- A monocular-vision full-body HOG detector (groundHOG) [2], which uses a given ground plane estimate to determine the image corridor in which pedestrians can be expected. This detector is GPU-accelerated using CUDA. The contained cudaHOG library requires manual compilation and a recent CUDA SDK as well as an nVidia graphics card.
- An RGB-D detector from PCL, which extracts candidate ROIs on a groundplane and then applies a linear HOG classifier [4]
Further external detectors which output geometry_msgs/PoseArray or people_msgs/PositionMeasurementArray messages can easily be integrated into our framework using the scripts from this package. Examples of such detectors include:
- The laser-based leg detector from wg-perception, which might work better than our own laser detector if the sensor is located very close to the ground. See our wrapper package and leg_detectors.launch (replaces laser_detectors.launch).
For detection-to-detection fusion, we have implemented a series of nodelets which can be used to flexibly compose a fusion pipeline by means of roslaunch XML files. Details can be found in the spencer_detected_person_association package. The following figure shows an example configuration which was used during experiments in SPENCER:
In case of detection-to-track fusion (currently not implemented), it is still advisable to publish a CompositeDetectedPerson message (via CompositeDetectedPersons) for each set of detections associated with a track. This makes it possible to go back from a track to its original detections later on, and to look up associated image bounding boxes etc. via the associated detection_id.
For person and group tracking, we currently provide exemplary code based upon a nearest-neighbor standard filter data association, which is robust enough in most use cases (especially if multi-modal detectors are being used). The people tracker has been enhanced with a track initiation logic and 4 different IMM-based motion models (constant velocity with low process noise, high process noise, coordinated turn and Brownian motion) to make tracking more robust.
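To illustrate the general idea of nearest-neighbor data association, here is a minimal, self-contained sketch. It is not the code of our tracker: it uses a greedy global nearest-neighbor variant with a plain Euclidean gate on 2D positions, whereas a Kalman-filter-based tracker would typically gate on a covariance-weighted distance. The function name, the data layout and the gate value are assumptions made for this example.

```python
import math


def nearest_neighbor_associate(tracks, detections, gate=1.0):
    """Greedy nearest-neighbor association (illustrative only).

    tracks:     dict mapping track ID -> predicted (x, y) position
    detections: list of (x, y) detection positions
    gate:       maximum allowed distance in meters (assumed value)

    Returns (matches, unmatched), where matches maps track ID to the
    index of the assigned detection and unmatched lists the indices of
    detections that were not assigned to any track.
    """
    # Collect all track/detection pairs that fall inside the gate.
    candidates = []
    for track_id, (tx, ty) in tracks.items():
        for det_index, (dx, dy) in enumerate(detections):
            distance = math.hypot(tx - dx, ty - dy)
            if distance <= gate:
                candidates.append((distance, track_id, det_index))

    # Assign the closest pairs first; each track and each detection
    # can be used at most once.
    candidates.sort()
    matches = {}
    used_detections = set()
    for distance, track_id, det_index in candidates:
        if track_id in matches or det_index in used_detections:
            continue
        matches[track_id] = det_index
        used_detections.add(det_index)

    unmatched = [i for i in range(len(detections)) if i not in used_detections]
    return matches, unmatched


if __name__ == "__main__":
    tracks = {1: (0.0, 0.0), 2: (5.0, 5.0)}
    detections = [(5.2, 5.1), (0.1, -0.1), (20.0, 20.0)]
    print(nearest_neighbor_associate(tracks, detections))
```

In such a scheme, unmatched detections are the natural candidates for a track initiation logic, while tracks without a match are propagated using the motion model alone.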
The group tracker relies on social/spatial relations determined via the same coherent motion indicator features as described in [1].
Internally, we have already integrated more advanced methods, including a track-oriented multi-hypothesis person tracker [2], and a hypothesis-oriented multi-model multi-hypothesis person and group tracker [1]. These components use exactly the same ROS message definitions, however, they are not yet publicly available. The components available here were originally implemented as baseline methods for comparison.
The spencer_tracking_utils package contains a number of standalone ROS nodes that can filter an incoming set of TrackedPerson messages based upon different criteria, e.g. distance to the sensor/robot, visually confirmed tracks only, etc.
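The distance criterion can be sketched in a few lines of plain Python. This is only an illustration of the filtering logic, not the code of the actual nodes: tracks are represented as simple dictionaries, and the function name, the dictionary keys and the default thresholds are assumptions made for this example.

```python
import math


def filter_tracks_by_distance(tracks, sensor_xy=(0.0, 0.0),
                              min_dist=0.0, max_dist=10.0):
    """Keep only tracks whose 2D distance to the sensor lies within
    [min_dist, max_dist]. Each track is a dict with 'x' and 'y' keys
    (hypothetical layout used for illustration)."""
    sx, sy = sensor_xy
    kept = []
    for track in tracks:
        distance = math.hypot(track["x"] - sx, track["y"] - sy)
        if min_dist <= distance <= max_dist:
            kept.append(track)
    return kept


if __name__ == "__main__":
    tracks = [
        {"track_id": 1, "x": 1.0, "y": 1.0},
        {"track_id": 2, "x": 12.0, "y": 0.0},
    ]
    print(filter_tracks_by_distance(tracks))
```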
In spencer_tracking_metrics, we have wrapped publicly available implementations of different tracking metrics, such as CLEAR-MOT and OSPA, such that they are compatible with our message definitions. These are useful for evaluating tracking performance for a given groundtruth.
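For readers unfamiliar with CLEAR-MOT, the sketch below shows how its two summary scores are commonly computed from per-frame counts. It is not the wrapped implementation: it assumes that the matching between groundtruth and tracker output has already been done for every frame, and it takes MOTP as the mean distance over all matches (other definitions exist). The function name and the dictionary keys are hypothetical.

```python
def clear_mot(frames):
    """Compute (MOTA, MOTP) from per-frame matching results.

    Each frame is a dict with the (hypothetical) keys:
      num_gt           number of groundtruth persons in the frame
      misses           groundtruth persons without a matched track
      false_positives  tracks without a matched groundtruth person
      id_switches      identity switches counted in the frame
      match_distances  list of distances for all matched pairs
    """
    total_gt = sum(f["num_gt"] for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without groundtruth objects")

    total_errors = sum(
        f["misses"] + f["false_positives"] + f["id_switches"] for f in frames
    )
    # MOTA: one minus the ratio of all errors to all groundtruth objects.
    mota = 1.0 - total_errors / total_gt

    # MOTP: average distance between matched pairs (lower is better here).
    total_distance = sum(sum(f["match_distances"]) for f in frames)
    num_matches = sum(len(f["match_distances"]) for f in frames)
    motp = total_distance / num_matches if num_matches else float("nan")
    return mota, motp


if __name__ == "__main__":
    frames = [
        {"num_gt": 4, "misses": 1, "false_positives": 0, "id_switches": 0,
         "match_distances": [0.1, 0.2, 0.3]},
        {"num_gt": 4, "misses": 0, "false_positives": 1, "id_switches": 1,
         "match_distances": [0.2, 0.2, 0.1, 0.1]},
    ]
    print(clear_mot(frames))
```

Note that MOTA can become negative when the tracker produces more errors than there are groundtruth objects.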
The srl_tracking_logfile_import package provides a Python script for importing old 2D laserscan logfiles in CARMEN format that have been annotated with groundtruth person tracks, such as these datasets.
The srl_tracking_exporter package contains a useful Python script for rendering track trajectories, detections and robot odometry from a 2D top-down perspective as scalable vector graphics (SVGs). These can optionally be animated to visualize the evolution of one or multiple tracks over time.
One major highlight of our framework is a reusable and highly configurable set of custom RViz plugins for the visualization of:
- Detected persons
- Tracked persons (including occlusion state, associated detection ID, and covariance ellipses)
- Social relations
- Tracked groups
As an example, some features of the tracked persons display are:
- Different visual styles: 3D bounding box, cylinder, animated human mesh
- Coloring: 6 different color palettes
- Display of velocity arrows
- Visualization of the 99% covariance ellipse
- Display of track IDs, status (matched, occluded), associated detection IDs
- Configurable reduction of opacity when a track is occluded
- Track history (trajectory) display as dots or lines
- Configurable font sizes and line widths
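As background on the covariance ellipse mentioned in the list above, the following sketch shows one standard way to derive the semi-axes and orientation of a confidence ellipse from a 2x2 position covariance matrix. Whether the RViz plugin computes it in exactly this way is not stated here; the function name and signature are assumptions made for this example. For two degrees of freedom, the chi-square quantile has the closed form -2 ln(1 - p), which is about 9.21 for p = 0.99.

```python
import math


def covariance_ellipse(cxx, cxy, cyy, confidence=0.99):
    """Return (semi_major, semi_minor, angle) of the confidence ellipse
    for the symmetric 2x2 covariance [[cxx, cxy], [cxy, cyy]].

    angle is the orientation of the major axis in radians, measured
    from the x axis."""
    # Chi-square quantile for 2 degrees of freedom (closed form).
    scale = -2.0 * math.log(1.0 - confidence)

    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    mean = 0.5 * (cxx + cyy)
    root = math.hypot(0.5 * (cxx - cyy), cxy)
    lambda_major = mean + root
    lambda_minor = max(mean - root, 0.0)  # guard against round-off

    # Orientation of the eigenvector belonging to the larger eigenvalue.
    angle = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)

    return (math.sqrt(scale * lambda_major),
            math.sqrt(scale * lambda_minor),
            angle)


if __name__ == "__main__":
    print(covariance_ellipse(4.0, 0.0, 1.0))
```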
All of the following screenshots have been generated using these plugins.
The following screenshots show our system in action, while playing back recorded data from a crowded airport environment:
Multi-modal people detection. In orange: 2D laser [3], cyan: upper-body RGB-D [2], yellow: monocular vision HOG [2], grey: fused detections (when using detection-to-detection fusion).
Videos of the people detection and tracking system in action can be found on the SPENCER YouTube Channel:
- Real-Time Multi-Modal People Tracking in a Crowded Airport Environment (RGB-D and 2D laser)
- Single Person Guidance Scenario Prototype (2D laser only)
- Group Guidance Scenario Prototype (2D laser only)
On the SPENCER robot platform, which is equipped with front and rear RGB-D sensors (Asus Xtion Pro Live) and two SICK LMS500 laser scanners, we distributed the people and group detection & tracking system over two high-end gaming laptops (Intel Core i7-4700MQ, nVidia GeForce 765M). The detectors for the frontal sensors were executed on one laptop along with the detection-fusion pipeline. The detectors for the rear-facing sensors and the people and group tracking modules were executed on the second laptop. Both laptops were connected with each other and the rest of the platform via gigabit ethernet.
With this configuration, the components run in real-time at 20-25 Hz (with visualization off-loaded to a separate computer), even in crowded environments where more than 30 persons are concurrently visible.
The people and group detection and tracking framework has been tested on Ubuntu 12.04 using ROS Hydro, as well as Ubuntu 14.04 using ROS Indigo. For more information on the Robot Operating System (ROS), please refer to ros.org.
NOTE: The entire framework has only been tested on 64-bit systems. On 32-bit systems, you will encounter Eigen-related alignment issues (failed assertions). See issue #1.
We recommend installing ROS and the required dependencies of our components as follows. Use the first pair of commands on ROS Hydro and the second pair on ROS Indigo:
sudo apt-get install ros-hydro-desktop-full
sudo apt-get install libeigen3-dev libsvm-dev python-numpy python-scipy ros-hydro-openni-launch ros-hydro-openni2-launch ros-hydro-cmake-modules ros-hydro-eigen-conversions
sudo apt-get install ros-indigo-desktop-full
sudo apt-get install libeigen3-dev libsvm-dev python-numpy python-scipy ros-indigo-openni-launch ros-indigo-openni2-launch ros-indigo-cmake-modules ros-indigo-eigen-conversions
As we do not yet provide any pre-built Debian packages, we suggest creating a new catkin workspace for our framework and then cloning the content of this repository into the src folder of this new workspace. Then build the workspace using the usual methods (catkin_make or catkin build).
The cudaHOG library used by the groundHOG detector requires an nVidia graphics card and an installed CUDA SDK (recommended version: 6.5). As installing CUDA (especially on laptops with Optimus/Bumblebee) and compiling the library is not straightforward, detailed installation instructions are provided here. Once these instructions have been followed, the rwth_ground_hog package needs to be rebuilt using catkin. If no CUDA SDK is installed, the ROS package will still compile, but it will not provide any functionality.
The following three tutorials help you to easily get started using our framework.
This is the easiest way to get started using just a single RGB-D sensor connected locally to your computer. Place your Asus Xtion Pro Live or Kinect v1 sensor horizontally on a flat surface, and connect it to your computer (or play the example bagfile linked in a section further below). Then run the following launch file from your people tracking workspace (make sure that you have sourced it, e.g. source devel/setup.bash):
roslaunch spencer_people_tracking_launch tracking_single_rgbd_sensor.launch height_above_ground:=1.6
This will do the following:
- Start the OpenNI2 drivers and publish RGB-D point clouds in the /spencer/sensors/rgbd_front_top/ camera namespace
- Run an upper-body RGB-D and groundHOG RGB detector, assuming a horizontal ground plane 1.6 meters below the sensor. Other heights may work as well, but the detector has been trained at approximately this height.
- Run a simple detection-to-detection fusion pipeline
- Run the srl_nearest_neighbor_tracker, which will subscribe to /spencer/perception/detected_persons and publish tracks at /spencer/perception/tracked_persons
- Run RViz with a predefined configuration, which shows the point cloud, detected and tracked persons (using our custom RViz plugins)
If this doesn't work, first check whether the point cloud is displayed properly in RViz. If not, there is probably a problem with your RGB-D sensor (USB or OpenNI2 issues).
To try out a sensor configuration similar to the SPENCER robot platform, run:
roslaunch spencer_people_tracking_launch tracking_on_robot.launch
This assumes that the RGB-D sensors are mounted horizontally at about 1.6 m above ground, and that sensor data is published on the following topics:
/spencer/sensors/laser_front/echo0 [sensor_msgs/LaserScan]
/spencer/sensors/laser_rear/echo0 [sensor_msgs/LaserScan]
/spencer/sensors/rgbd_front_top/{rgb/image_raw,depth/image_rect} [sensor_msgs/Image]
/spencer/sensors/rgbd_front_top/{rgb/camera_info} [sensor_msgs/CameraInfo]
/spencer/sensors/rgbd_rear_top/{rgb/image_raw,depth/image_rect} [sensor_msgs/Image]
/spencer/sensors/rgbd_rear_top/{rgb/camera_info} [sensor_msgs/CameraInfo]
The launch file starts a pipeline similar to that from tutorial 1 (above), but includes a second set of RGB-D detectors for the rear sensor, as well as person detectors for the two 2D laser scanners. Sensor drivers which publish the RGB-D and laser data listed above are not started automatically by this launch file. You also have to start RViz manually.
Note that the fusion pipeline reconfigures automatically if only a subset of the person detectors is running. If e.g. you don't have a rear RGB-D sensor, just comment out the line which includes rear_rgbd_detectors.launch in tracking_on_robot.launch.
1. Start your own launch files for starting person detectors, or use a combination of the launch files we provide in spencer_people_tracking_launch/launch/detectors. You may have to remap input and output topics as needed.
2. Create a copy of detection_to_detection_fusion_pipeline.launch and its children, such as fuse_lasers_and_rgbd.launch, in spencer_detected_person_association. Based upon the provided example, create your own pipeline that step-by-step fuses detections from different detectors. For more information, see the corresponding package.
3. Create a copy of the freiburg_people_tracking.launch file in spencer_people_tracking_launch. Adjust it to refer to your own fusion launch file created in step 2.
4. Start your copy of freiburg_people_tracking.launch.
5. If needed, start group tracking via roslaunch spencer_people_tracking_launch group_tracking.launch.
A short exemplary bagfile with 2D laser and RGB-D sensor data to test our framework will be linked here in the future.
In case you just want to test one of the detectors, we will also provide a launch file that remaps the bagfile topics to the ones expected by the detector launch files (e.g. laser instead of /spencer/sensors/laser_front/echo0).
The software in this repository is maintained by:
- Timm Linder, Social Robotics Lab, Albert-Ludwigs-Universität Freiburg
- Stefan Breuers, Computer Vision Group, RWTH Aachen University
Credits for the different ROS packages go to the authors listed in the respective README.md and package.xml files.
This work has been supported by the EC under contract number FP7-ICT-600877 (SPENCER). If you use the software contained in this repository for your research, please cite the following publication:
On Multi-Modal People Tracking from Mobile Platforms in Very Crowded and Dynamic Environments
Linder, T., Breuers, S., Leibe, B., Arras, K.O.
IEEE International Conference on Robotics and Automation (ICRA) 2016
Optionally, also cite:
People Detection, Tracking and Visualization using ROS on a Mobile Service Robot
Linder, T. and Arras, K.O.
Robot Operating System (ROS): The Complete Reference (Vol. 1).
Springer Studies in Systems, Decision and Control, 2016
Most of the software in this repository is released under a BSD license. For details, however, please check the individual ROS packages.
[1] Linder T. and Arras K.O. Multi-Model Hypothesis Tracking of Groups of People in RGB-D Data. IEEE Int. Conference on Information Fusion (FUSION'14), Salamanca, Spain, 2014.
[2] Jafari O. Hosseini, Mitzel D. and Leibe B. Real-Time RGB-D based People Detection and Tracking for Mobile Robots and Head-Worn Cameras. IEEE International Conference on Robotics and Automation (ICRA'14), 2014.
[3] Arras K.O., Martinez Mozos O. and Burgard W. Using Boosted Features for the Detection of People in 2D Range Data. IEEE International Conference on Robotics and Automation (ICRA'07), Rome, Italy, 2007.
[4] Munaro M. and Menegatti E. Fast RGB-D people tracking for service robots. In Autonomous Robots, Volume 37 Issue 3, pp. 227-242, Springer, 2014.