PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction
Sicheng Zuo*, Wenzhao Zheng*†, Yuanhui Huang, Jie Zhou, Jiwen Lu‡
* Equal contribution
- PointOcc enables the use of 2D image backbones for efficient point-based 3D semantic occupancy prediction.
- The LiDAR-only PointOcc even outperforms LiDAR & camera multi-modal methods by a large margin.
Semantic segmentation in autonomous driving has been evolving from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the 3D space of interest. The dense nature of the prediction space renders existing efficient 2D-projection-based methods (e.g., bird's eye view and range view) ineffective, as each of them can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view (TPV) that represents point clouds effectively and comprehensively, together with a PointOcc model that processes it efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system to model nearby areas at a finer granularity. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the feature of each point by aggregating its projected features on each of the processed TPV planes, without the need for any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that PointOcc achieves state-of-the-art performance at a much faster speed. Notably, despite using only LiDAR, PointOcc outperforms all other methods, including multi-modal ones, by a large margin on the OpenOccupancy benchmark.
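To make the cylindrical TPV idea concrete, here is a minimal NumPy sketch of the projection and aggregation steps. It is not the PointOcc implementation, and several choices are assumptions for illustration: the grid sizes and ranges, reading "spatial group pooling" as splitting the collapsed axis into a few groups, max-pooling inside each group and concatenating the groups along channels, nearest-cell lookup instead of interpolation, and summation as the aggregation over planes. The 2D backbone that PointOcc runs on each plane is skipped. Because a uniform (rho, phi) cell covers an area of roughly rho · Δrho · Δphi, cells near the sensor are smaller, which is where the finer modeling of nearby areas comes from.

```python
import numpy as np

# The three TPV planes as index pairs into (rho, phi, z).
PLANES = [(0, 1), (0, 2), (1, 2)]  # (rho, phi), (rho, z), (phi, z)


def to_cylindrical(xyz):
    """Convert Cartesian (x, y, z) points of shape (N, 3) to (rho, phi, z)."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    rho = np.hypot(x, y)
    phi = np.arctan2(y, x)  # azimuth in [-pi, pi]
    return np.stack([rho, phi, z], axis=1)


def to_cells(cyl, ranges, grid):
    """Quantize cylindrical coordinates into integer cell indices, clipped to the grid."""
    lo = np.array([r[0] for r in ranges], dtype=float)
    hi = np.array([r[1] for r in ranges], dtype=float)
    size = np.array(grid)
    idx = np.floor((cyl - lo) / (hi - lo) * size).astype(np.int64)
    return np.clip(idx, 0, size - 1)


def group_pool_plane(idx, feats, grid, axes, groups):
    """Project point features onto one plane with (assumed) spatial group pooling.

    The axis collapsed by this plane is split into `groups` slices; features are
    max-pooled per (cell, slice) and the slices are concatenated along channels,
    so coarse position along the collapsed axis is kept.
    """
    a, b = axes
    perp = 3 - a - b  # the remaining axis (indices 0 + 1 + 2 sum to 3)
    group = idx[:, perp] * groups // grid[perp]  # slice id along the collapsed axis
    plane = np.full((grid[a], grid[b], groups, feats.shape[1]), -np.inf)
    np.maximum.at(plane, (idx[:, a], idx[:, b], group), feats)  # scatter-max
    plane[np.isneginf(plane)] = 0.0  # empty (cell, slice) slots get zeros
    return plane.reshape(grid[a], grid[b], groups * feats.shape[1])


def point_features_from_tpv(xyz, feats, ranges, grid, groups=2):
    """Project points onto the three planes, then gather and sum per-point features."""
    idx = to_cells(to_cylindrical(xyz), ranges, grid)
    out = np.zeros((xyz.shape[0], groups * feats.shape[1]))
    for axes in PLANES:
        plane = group_pool_plane(idx, feats, grid, axes, groups)
        # A 2D backbone would process `plane` here; this sketch leaves it unchanged.
        out += plane[idx[:, axes[0]], idx[:, axes[1]]]  # nearest-cell lookup
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xyz = rng.uniform([-50, -50, -5], [50, 50, 3], size=(1000, 3))
    feats = rng.standard_normal((1000, 4))
    ranges = [(0.0, 72.0), (-np.pi, np.pi), (-5.0, 3.0)]
    out = point_features_from_tpv(xyz, feats, ranges, grid=(48, 36, 8), groups=2)
    print(out.shape)  # (1000, 8)
```

Every point ends up with a feature no matter how sparse its neighborhood is, because it is read back from the planes it was projected onto; this mirrors why no post-processing step is needed.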
- Create a conda environment and activate it.
conda create -n uniauto python=3.8
conda activate uniauto
- Install PyTorch and torchvision following the official instructions.
pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
- Follow the instructions at https://mmdetection3d.readthedocs.io/en/latest/get_started.html#installation to install mmengine, mmcv, mmdet, mmsegmentation and mmdet3d
pip install -U openmim
mim install mmengine
mim install 'mmcv>=2.0.0rc4'
mim install 'mmdet>=3.0.0'
mim install mmsegmentation
mim install "mmdet3d>=1.1.0rc0"
- Install other packages
pip install timm==0.4.12
pip install torch_scatter
pip install spconv-cu113
#pip install numpy==1.19.5
#pip install numba==0.48.0
pip install scikit-image==0.19.3
pip install pandas==1.4.4
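After installation, a quick way to confirm that the exactly pinned packages ended up at the intended versions is to query the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the `check_pinned` helper and the `PINNED` table are hypothetical names, and packages installed with version ranges (mmcv, mmdet, mmdet3d, ...) are deliberately left out.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the install commands above (distribution names, not import names).
PINNED = {
    "torch": "1.10.0+cu113",
    "torchvision": "0.11.0+cu113",
    "timm": "0.4.12",
    "scikit-image": "0.19.3",
    "pandas": "1.4.4",
}


def check_pinned(pinned=PINNED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pinned().items():
        status = "ok" if ok else f"expected {PINNED[name]}"
        print(f"{name:15s} {installed or 'not installed':15s} {status}")
```

Run it inside the activated `uniauto` environment; a mismatch on `torch` usually means a CPU-only or different-CUDA wheel was picked up instead of the `+cu113` build.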
- Download the pretrained weights (swin_tiny_patch4_window7_224.pth) from here and put them in pretrain/
- Follow the detailed instructions to prepare nuScenes-Occupancy.
- Folder structure:
PointOcc
├── pretrain/
│ ├── swin_tiny_patch4_window7_224.pth
├── data/
│ ├── nuscenes/
│ │ ├── maps/
│ │ ├── samples/
│ │ ├── sweeps/
│ │ ├── lidarseg/
│ │ ├── v1.0-test/
│ │ ├── v1.0-trainval/
│ │ ├── nuscenes_occ_infos_train.pkl
│ │ ├── nuscenes_occ_infos_val.pkl
│ ├── nuScenes-Occupancy/
│ │ ├── scene_0ac05652a4c44374998be876ba5cd6fd/
│ │ ├── ...
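Before training, it can help to verify that the data and weights sit where the folder structure above expects them. The following is a small standard-library sketch; `missing_paths` and `EXPECTED` are hypothetical names, the list mirrors only the entries shown in the tree (the individual scene folders under nuScenes-Occupancy are not checked), and it treats `.pth`/`.pkl` entries as files and everything else as directories.

```python
from pathlib import Path

# Entries copied from the folder structure above, relative to the PointOcc root.
EXPECTED = [
    "pretrain/swin_tiny_patch4_window7_224.pth",
    "data/nuscenes/maps",
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes/lidarseg",
    "data/nuscenes/v1.0-test",
    "data/nuscenes/v1.0-trainval",
    "data/nuscenes/nuscenes_occ_infos_train.pkl",
    "data/nuscenes/nuscenes_occ_infos_val.pkl",
    "data/nuScenes-Occupancy",
]


def missing_paths(root="."):
    """Return the expected entries that are absent (or of the wrong kind) under `root`."""
    root = Path(root)
    missing = []
    for rel in EXPECTED:
        path = root / rel
        # Weights and info pickles are files; everything else should be a directory.
        is_file = rel.endswith((".pth", ".pkl"))
        ok = path.is_file() if is_file else path.is_dir()
        if not ok:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("missing:\n  " + "\n  ".join(missing))
    else:
        print("all expected paths found")
```

Running it from the repository root prints any entry that still needs to be downloaded or generated.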
- Train PointOcc for the LiDAR segmentation task
python train_seg.py --py-config config/pointtpv_nusc_lidarseg.py --work-dir work_dir/nusc_lidarseg/pointtpv
- Train PointOcc for the 3D semantic occupancy prediction task
python train_occ.py --py-config config/pointtpv_nusc_occ.py --work-dir work_dir/nusc_occ/pointtpv
- Evaluate PointOcc for the LiDAR segmentation task (replace xxx with the path to the checkpoint to evaluate)
python eval_seg.py --py-config config/pointtpv_nusc_lidarseg.py --ckpt-path xxx --log-file work_dir/nusc_lidarseg/pointtpv/eval.log
- Evaluate PointOcc for the 3D semantic occupancy prediction task (replace xxx with the path to the checkpoint to evaluate)
python eval_occ.py --py-config config/pointtpv_nusc_occ.py --ckpt-path xxx --log-file work_dir/nusc_occ/pointtpv/eval.log
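If you launch these runs from a script, the four commands above follow one pattern: the task picks the script suffix, config and work directory, and evaluation adds a checkpoint and a log file inside that work directory. The sketch below rebuilds them as argument lists using only the flags shown above; `build_command` and `TASKS` are hypothetical helpers, not part of the repository.

```python
import shlex

# Script suffix, config and work dir per task, copied from the commands above.
TASKS = {
    "lidarseg": ("seg", "config/pointtpv_nusc_lidarseg.py", "work_dir/nusc_lidarseg/pointtpv"),
    "occ": ("occ", "config/pointtpv_nusc_occ.py", "work_dir/nusc_occ/pointtpv"),
}


def build_command(task, mode, ckpt_path=None):
    """Return the argv list for training or evaluating PointOcc on `task`."""
    suffix, config, work_dir = TASKS[task]
    if mode == "train":
        return ["python", f"train_{suffix}.py", "--py-config", config, "--work-dir", work_dir]
    if mode == "eval":
        if ckpt_path is None:
            raise ValueError("evaluation needs a checkpoint path")
        return ["python", f"eval_{suffix}.py", "--py-config", config,
                "--ckpt-path", ckpt_path, "--log-file", f"{work_dir}/eval.log"]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Prints the shell form of the occupancy training command.
    print(shlex.join(build_command("occ", "train")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; building argv lists instead of shell strings avoids quoting problems when checkpoint paths contain spaces.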
Our code mainly derives from TPVFormer and is also based on Cylinder3D. Many thanks to them!
If you find this project helpful, please consider citing the following paper:
@article{pointocc,
title={PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction},
author={Zuo, Sicheng and Zheng, Wenzhao and Huang, Yuanhui and Zhou, Jie and Lu, Jiwen},
journal={arXiv preprint arXiv:2308.16896},
year={2023}
}