diff --git a/docs/en/advanced_guides/customize_dataset.md b/docs/en/advanced_guides/customize_dataset.md
index fea9a2572..9798ae5ad 100644
--- a/docs/en/advanced_guides/customize_dataset.md
+++ b/docs/en/advanced_guides/customize_dataset.md
@@ -1,367 +1,482 @@
 # Customize Datasets

-## Support new data format
+In this note, you will learn how to train and test predefined models on customized datasets.

-To support a new data format, you can either convert them to existing formats or directly convert them to the middle format. You could also choose to convert them offline (before training by a script) or online (implement a new dataset and do the conversion at training). In MMDetection3D, for the data that is inconvenient to read directly online, we recommend to convert it into KITTI format and do the conversion offline, thus you only need to modify the config's data annotation paths and classes after the conversion.
-For data sharing similar format with existing datasets, like Lyft compared to nuScenes, we recommend to directly implement data converter and dataset class. During the procedure, inheritation could be taken into consideration to reduce the implementation workload.
+The basic steps are as follows:

-### Reorganize new data formats to existing format
+1. Prepare data
+2. Prepare a config
+3. Train, test, and run inference with models on the customized dataset.

-For data that is inconvenient to read directly online, the simplest way is to convert your dataset to existing dataset formats.
+## Data Preparation

-Typically we need a data converter to reorganize the raw data and convert the annotation format into KITTI style. Then a new dataset class inherited from existing ones is sometimes necessary for dealing with some specific differences between datasets. Finally, the users need to further modify the config files to use the dataset. An [example](https://mmdetection3d.readthedocs.io/en/latest/2_new_data_model.html) training predefined models on Waymo dataset by converting it into KITTI style can be taken for reference.
+Ideally, we would reorganize the customized raw data and convert the annotations into KITTI style. However, since the calibration files and 3D annotations required by the KITTI format are difficult to obtain for customized datasets, we introduce a basic data format in this doc.

-### Reorganize new data format to middle format
+### Basic Data Format

-It is also fine if you do not want to convert the annotation format to existing formats.
-Actually, we convert all the supported datasets into pickle files, which summarize useful information for model training and inference.
+#### Point Cloud Format

-The annotation of a dataset is a list of dict, each dict corresponds to a frame.
-A basic example (used in KITTI) is as follows. A frame consists of several keys, like `image`, `point_cloud`, `calib` and `annos`.
-As long as we could directly read data according to these information, the organization of raw data could also be different from existing ones.
-With this design, we provide an alternative choice for customizing datasets.
+Currently, we only support point clouds in `.bin` format for training and inference. Before training on your own datasets, you need to convert point cloud files in other formats to `.bin` files. Common point cloud formats include `.pcd` and `.las`; we list some open-source conversion tools for reference, followed by a minimal conversion sketch after the list.

-```python
+1. Convert pcd to bin: https://github.com/leofansq/Tools_RosBag2KITTI
+2. Convert las to bin: the common conversion path is las -> pcd -> bin, and the conversion from las to pcd can be done with [this tool](https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor).
-[
-    {'image': {'image_idx': 0, 'image_path': 'training/image_2/000000.png', 'image_shape': array([ 370, 1224], dtype=int32)},
-     'point_cloud': {'num_features': 4, 'velodyne_path': 'training/velodyne/000000.bin'},
-     'calib': {'P0': array([[707.0493,   0.    , 604.0814,   0.    ],
-       [  0.    , 707.0493, 180.5066,   0.    ],
-       [  0.    ,   0.    ,   1.    ,   0.    ],
-       [  0.    ,   0.    ,   0.    ,   1.    ]]),
-      'P1': array([[ 707.0493,    0.    ,  604.0814, -379.7842],
-       [   0.    ,  707.0493,  180.5066,    0.    ],
-       [   0.    ,    0.    ,    1.    ,    0.    ],
-       [   0.    ,    0.    ,    0.    ,    1.    ]]),
-      'P2': array([[ 7.070493e+02,  0.000000e+00,  6.040814e+02,  4.575831e+01],
-       [ 0.000000e+00,  7.070493e+02,  1.805066e+02, -3.454157e-01],
-       [ 0.000000e+00,  0.000000e+00,  1.000000e+00,  4.981016e-03],
-       [ 0.000000e+00,  0.000000e+00,  0.000000e+00,  1.000000e+00]]),
-      'P3': array([[ 7.070493e+02,  0.000000e+00,  6.040814e+02, -3.341081e+02],
-       [ 0.000000e+00,  7.070493e+02,  1.805066e+02,  2.330660e+00],
-       [ 0.000000e+00,  0.000000e+00,  1.000000e+00,  3.201153e-03],
-       [ 0.000000e+00,  0.000000e+00,  0.000000e+00,  1.000000e+00]]),
-      'R0_rect': array([[ 0.9999128 ,  0.01009263, -0.00851193,  0.        ],
-       [-0.01012729,  0.9999406 , -0.00403767,  0.        ],
-       [ 0.00847068,  0.00412352,  0.9999556 ,  0.        ],
-       [ 0.        ,  0.        ,  0.        ,  1.        ]]),
-      'Tr_velo_to_cam': array([[ 0.00692796, -0.9999722 , -0.00275783, -0.02457729],
-       [-0.00116298,  0.00274984, -0.9999955 , -0.06127237],
-       [ 0.9999753 ,  0.00693114, -0.0011439 , -0.3321029 ],
-       [ 0.        ,  0.        ,  0.        ,  1.        ]]),
-      'Tr_imu_to_velo': array([[ 9.999976e-01,  7.553071e-04, -2.035826e-03, -8.086759e-01],
-       [-7.854027e-04,  9.998898e-01, -1.482298e-02,  3.195559e-01],
-       [ 2.024406e-03,  1.482454e-02,  9.998881e-01, -7.997231e-01],
-       [ 0.000000e+00,  0.000000e+00,  0.000000e+00,  1.000000e+00]])},
-     'annos': {'name': array(['Pedestrian'], dtype='<U10'), 'truncated': array([0.]), 'occluded': array([0]), 'alpha': array([-0.2]), 'bbox': array([[712.4 , 143.  , 810.73, 307.92]]), 'dimensions': array([[1.2 , 1.89, 0.48]]), 'location': array([[1.84, 1.47, 8.41]]), 'rotation_y': array([0.01]), ...}}
-    ...
-]
-```
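+
+If the tools above do not fit your data, the following is a minimal sketch of a pcd -> bin converter, assuming Open3D (`pip install open3d`) is available; the file and function names are only illustrative. Note that Open3D drops the intensity channel when reading `.pcd` files, so the sketch pads intensity with zeros.
+
+```python
+import numpy as np
+import open3d as o3d
+
+
+def convert_pcd_to_bin(pcd_path, bin_path):
+    """Convert a `.pcd` point cloud to a `.bin` file of (x, y, z, intensity)."""
+    pcd = o3d.io.read_point_cloud(pcd_path)
+    points = np.asarray(pcd.points, dtype=np.float32)  # (N, 3) xyz coordinates
+    # Open3D does not preserve intensity, so pad a zero intensity channel;
+    # use a reader such as pypcd if your `.pcd` files store real intensities.
+    intensity = np.zeros((points.shape[0], 1), dtype=np.float32)
+    np.hstack([points, intensity]).tofile(bin_path)
+
+
+convert_pcd_to_bin('demo.pcd', 'demo.bin')
+```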
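+
+Since a `.bin` file is just a flat array of `float32` values, you can sanity-check a converted file with NumPy by reshaping it according to the number of features per point (4 here: x, y, z, intensity); `demo.bin` is again a placeholder name.
+
+```python
+import numpy as np
+
+# Load the flat float32 buffer and reshape to (N, 4): x, y, z, intensity.
+points = np.fromfile('demo.bin', dtype=np.float32).reshape(-1, 4)
+print(points.shape, points[:3])  # quick sanity check of shape and values
+```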