Haoyi Jiang<sup>1</sup>, Liu Liu<sup>2</sup>, Tianheng Cheng<sup>1</sup>, Xinjie Wang<sup>2</sup>,
Tianwei Lin<sup>2</sup>, Zhizhong Su<sup>2</sup>, Wenyu Liu<sup>1</sup>, Xinggang Wang<sup>1</sup>
<sup>1</sup>Huazhong University of Science & Technology, <sup>2</sup>Horizon Robotics
- Feb 11 '25: Released the model integrated with Talk2DINO, achieving new state-of-the-art results.
- Dec 17 '24: Released our arXiv paper along with the source code.
We recommend cloning the repository with the `--single-branch` option to avoid downloading the large media files that other branches host for the project website:

```bash
git clone https://github.com/hustvl/GaussTR.git --single-branch
cd GaussTR
pip install -r requirements.txt
```
- Prepare the nuScenes dataset following the instructions in the mmdetection3d docs.
- Update the dataset `.pkl` files with `scene_idx` to match the occupancy ground truths (a sanity-check sketch follows this list):
  `python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes`
- Download the occupancy ground truth data from CVPR2023-3D-Occupancy-Prediction and place it in `data/nuscenes/gts`.
- Generate features and rendering targets:
  - Run `PYTHONPATH=. python tools/generate_depth.py` to generate metric depth estimations.
  - [For GaussTR-FeatUp Only] Navigate to the FeatUp repository and run `python tools/generate_featup.py`.
  - [Optional for GaussTR-FeatUp] Navigate to the Grounded SAM 2 repository and run `python tools/generate_grounded_sam2.py` to enable auxiliary segmentation supervision.
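As a quick sanity check after the steps above, the following minimal sketch verifies that the regenerated info files actually carry `scene_idx`. It assumes the mmdetection3d v1.1+ info layout with a top-level `data_list` and the standard `nuscenes_infos_*.pkl` file names; check your `data/nuscenes` directory for the actual names.

```python
# Sanity-check sketch (not part of the official pipeline): confirm that each
# sample in the regenerated info file carries the `scene_idx` field.
# Assumes the mmdetection3d >= 1.1 format with a top-level 'data_list'.
import pickle

with open('data/nuscenes/nuscenes_infos_val.pkl', 'rb') as f:
    infos = pickle.load(f)

samples = infos['data_list']
missing = sum('scene_idx' not in s for s in samples)
print(f'{len(samples)} samples, {missing} without scene_idx')
```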
Download the pre-generated CLIP text embeddings from the Releases page. Alternatively, you can generate custom embeddings by referring to mmpretrain #1737 or Talk2DINO.
Tip: The default prompts have not been carefully tuned; customizing them may yield improved results.
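As a rough illustration (the mmpretrain and Talk2DINO references above are the authoritative recipes), here is a minimal sketch of encoding custom prompts with the `open_clip` package. The model name, prompts, and output path are placeholder assumptions, and the embedding dimension must match what your GaussTR config expects.

```python
# Hedged sketch: encode custom text prompts with open_clip and save them.
# Model name, prompts, and output path are placeholders, not the values
# used by the released GaussTR checkpoints.
import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms('ViT-B-16', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-B-16')

prompts = ['a photo of a car', 'a photo of a pedestrian']  # placeholder prompts
with torch.no_grad():
    text_feats = model.encode_text(tokenizer(prompts))
    text_feats /= text_feats.norm(dim=-1, keepdim=True)  # L2-normalize

torch.save(text_feats, 'text_embeddings.pth')  # placeholder output path
```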
| Model | IoU | mIoU | Checkpoint |
|---|---|---|---|
| GaussTR-FeatUp | 45.19 | 11.70 | checkpoint |
| GaussTR-Talk2DINO | 44.73 | 12.08 | checkpoint |
Tip: Due to the current lack of optimization for voxelization operations, evaluation during training can be time-consuming. To accelerate training, consider evaluating on the `mini_train` set or reducing the evaluation frequency (see the sketch below).
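For instance, a hedged mmengine-style override (the field names follow standard mmengine conventions; the actual values in the released GaussTR configs may differ):

```python
# Evaluate every 4 epochs instead of every epoch. The numbers are
# placeholders, not the values used in the released configs.
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=24, val_interval=4)
```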
```bash
# Training
PYTHONPATH=. mim train mmdet3d [CONFIG] [-l pytorch -G [GPU_NUM]]

# Testing
PYTHONPATH=. mim test mmdet3d [CONFIG] -C [CKPT_PATH] [-l pytorch -G [GPU_NUM]]
```
To enable visualization, run testing with the following included in the config:

```python
custom_hooks = [
    dict(type='DumpResultHook'),
]
```
After testing, visualize the saved `.pkl` files with:

```bash
python tools/visualize.py [PKL_PATH] [--save]
```
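If you want to inspect a dumped result file before plotting, here is a minimal sketch; the path is hypothetical, and the exact structure is defined by `DumpResultHook`, so treat the printed output as the source of truth:

```python
# Peek inside a dumped result file. The path is hypothetical; the structure
# depends on DumpResultHook, so print before assuming any keys.
import pickle

with open('work_dirs/results.pkl', 'rb') as f:
    results = pickle.load(f)

print(type(results))
if isinstance(results, list) and results:
    print(len(results), type(results[0]))
```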
If our paper and code contribute to your research, please consider starring this repository ⭐ and citing our work:
```bibtex
@article{GaussTR,
    title = {GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding},
    author = {Haoyi Jiang and Liu Liu and Tianheng Cheng and Xinjie Wang and Tianwei Lin and Zhizhong Su and Wenyu Liu and Xinggang Wang},
    year = 2024,
    journal = {arXiv preprint arXiv:2412.13193}
}
```
This project is built upon the pioneering work of FeatUp, Talk2DINO, MaskCLIP, and gsplat. We extend our gratitude to these projects for their contributions to the community.
Released under the MIT License.