From 737c29f2bf62a46095e121f159eacfa19287f4b6 Mon Sep 17 00:00:00 2001
From: Sun Jiahao <72679458+sunjiahao1999@users.noreply.github.com>
Date: Fri, 12 May 2023 14:23:38 +0800
Subject: [PATCH] [Docs] Add readme of TPVFormer (#2517)

* fix polarmix UT
* init tpvformer
* add nus seg
* add nus seg
* test done
* Delete change_key.py
* Delete test_dcn.py
* remove seg eval
* fix encoder
* init train
* train ready
* remove asynctest
* change test.yml
* pr_stage_test.yml & merge_stage_test.yml
* pip install wheel
* pip install wheel all
* check type hint
* check comments
* remove Photo aug
* fix p2v
* fix docsting & fix config filepath
* add readme
* rename configs
* fix log path
---
 projects/CenterFormer/README.md                     |  2 +-
 projects/TPVFormer/README.md                        | 60 +++++++++++++++++++
 .../tpvformer_8xb1-2x_nus-seg.py                    |  0
 3 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 projects/TPVFormer/README.md
 rename projects/TPVFormer/{config => configs}/tpvformer_8xb1-2x_nus-seg.py (100%)

diff --git a/projects/CenterFormer/README.md b/projects/CenterFormer/README.md
index 9d81f1b87..f84556b69 100644
--- a/projects/CenterFormer/README.md
+++ b/projects/CenterFormer/README.md
@@ -57,7 +57,7 @@ python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=${N
 In MMDetection3D's root directory, run the following command to test the model:
 
 ```bash
-python tools/train.py projects/CenterFormer/configs/centerformer_voxel01_second-atten_secfpn-atten_4xb4-cyclic-20e_waymoD5-3d-3class.py ${CHECKPOINT_PATH}
+python tools/test.py projects/CenterFormer/configs/centerformer_voxel01_second-atten_secfpn-atten_4xb4-cyclic-20e_waymoD5-3d-3class.py ${CHECKPOINT_PATH}
 ```
 
 ## Results and models
diff --git a/projects/TPVFormer/README.md b/projects/TPVFormer/README.md
new file mode 100644
index 000000000..9a0681bf8
--- /dev/null
+++ b/projects/TPVFormer/README.md
@@ -0,0 +1,60 @@
+# Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
+
+> [Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction](https://arxiv.org/abs/2302.07817)
+
+
+
+## Abstract
+
+Modern methods for vision-centric autonomous driving perception widely adopt the bird's-eye-view (BEV) representation to describe a 3D scene. Despite its better efficiency than voxel representation, it has difficulty describing the fine-grained 3D structure of a scene with a single plane. To address this, we propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes. We model each point in the 3D space by summing its projected features on the three planes. To lift image features to the 3D TPV space, we further propose a transformer-based TPV encoder (TPVFormer) to obtain the TPV features effectively. We employ the attention mechanism to aggregate the image features corresponding to each query in each TPV plane. Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels. We demonstrate for the first time that using only camera inputs can achieve comparable performance with LiDAR-based methods on the LiDAR segmentation task on nuScenes. Code: https://github.com/wzzheng/TPVFormer.
+
+
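The abstract added by this patch describes the core TPV lookup: a 3D point's feature is obtained by projecting the point onto three mutually perpendicular planes and summing the sampled plane features. Purely as an illustration of that idea (the function names, plane shapes, and axis conventions below are assumptions for the sketch, not the patch's or TPVFormer's actual API), a minimal PyTorch version might look like this:

```python
# Illustrative sketch only: querying a TPV-style representation.
# Plane layouts and the sampling helper are assumptions, not TPVFormer code.
import torch
import torch.nn.functional as F


def sample_plane(plane: torch.Tensor, uv: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample per-point features from one TPV plane.

    plane: (C, H, W) feature map of a single plane.
    uv:    (N, 2) point coordinates normalized to [-1, 1] on that plane.
    """
    grid = uv.view(1, 1, -1, 2)                      # (1, 1, N, 2) for grid_sample
    feats = F.grid_sample(plane[None], grid, align_corners=False)  # (1, C, 1, N)
    return feats.view(plane.shape[0], -1).t()        # (N, C)


def tpv_point_features(tpv_hw, tpv_zh, tpv_wz, points_norm):
    """Feature of each 3D point = sum of its projections on the three planes.

    points_norm: (N, 3) points with (x, y, z) already normalized to [-1, 1].
    """
    x, y, z = points_norm.unbind(-1)
    f_hw = sample_plane(tpv_hw, torch.stack([x, y], -1))  # top-down (BEV-like) plane
    f_zh = sample_plane(tpv_zh, torch.stack([y, z], -1))  # side plane
    f_wz = sample_plane(tpv_wz, torch.stack([z, x], -1))  # front plane
    return f_hw + f_zh + f_wz                              # (N, C)


if __name__ == "__main__":
    C, H, W, Z = 64, 100, 100, 8                            # dummy sizes
    planes = (torch.randn(C, H, W), torch.randn(C, Z, H), torch.randn(C, W, Z))
    pts = torch.rand(1024, 3) * 2 - 1                       # random normalized points
    print(tpv_point_features(*planes, pts).shape)           # torch.Size([1024, 64])
```

In the paper, the three plane feature maps are produced by the transformer-based TPV encoder from multi-camera image features; the sketch only shows how per-point features would be read out once such planes exist.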