# GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Haoyi Jiang¹, Liu Liu², Tianheng Cheng¹, Xinjie Wang², Tianwei Lin², Zhizhong Su², Wenyu Liu¹, Xinggang Wang¹

¹Huazhong University of Science & Technology, ²Horizon Robotics

Project page · arXiv · License: MIT

## News

- Feb 11 '25: Released the model integrated with Talk2DINO, achieving new state-of-the-art results.
- Dec 17 '24: Released our arXiv paper along with the source code.

## Setup

### Installation

We recommend cloning the repository with the `--single-branch` option to avoid downloading the large media files for the project website kept on other branches:

```bash
git clone https://github.com/hustvl/GaussTR.git --single-branch
cd GaussTR
pip install -r requirements.txt
```

### Dataset Preparation

1. Prepare the nuScenes dataset following the instructions in the mmdetection3d docs.

2. Update the dataset `.pkl` files with `scene_idx` to match the occupancy ground truths:

   ```bash
   python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
   ```

3. Download the occupancy ground truth data from CVPR2023-3D-Occupancy-Prediction and place it in `data/nuscenes/gts` (the expected layout is sketched after this list).

4. Generate features and rendering targets:

   - Run `PYTHONPATH=. python tools/generate_depth.py` to generate metric depth estimations.
   - [For GaussTR-FeatUp only] Navigate to the FeatUp repository and run `python tools/generate_featup.py`.
   - [Optional for GaussTR-FeatUp] Navigate to the Grounded SAM 2 repository and run `python tools/generate_grounded_sam2.py` to enable auxiliary segmentation supervision.
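After these steps, the data root should look roughly like the sketch below (the `samples`/`sweeps` folders come from nuScenes itself, and the `.pkl` names follow mmdetection3d conventions, so they may differ with your version):

```
data/nuscenes/
├── samples/                  # nuScenes keyframe data
├── sweeps/                   # intermediate frames
├── gts/                      # occupancy ground truth from step 3
├── nuscenes_infos_train.pkl  # generated in step 2
└── nuscenes_infos_val.pkl
```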

### CLIP Text Embeddings

Download the pre-generated CLIP text embeddings from the Releases page. Alternatively, you can generate custom embeddings by referring to mmpretrain #1737 or Talk2DINO.

> **Tip:** The default prompts have not been carefully tuned. Customizing them may yield improved results.
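If you generate custom embeddings yourself, a minimal sketch with the open_clip package might look like the following (the class list, prompt template, model variant, and output path are placeholders; see the links above for the exact format GaussTR expects):

```python
import torch
import open_clip

# Placeholder class names; use the nuScenes occupancy categories in practice.
CLASSES = ['car', 'truck', 'pedestrian', 'vegetation']

model, _, _ = open_clip.create_model_and_transforms('ViT-B-16', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-B-16')

# Embed one prompt per class with a simple template.
tokens = tokenizer([f'a photo of a {name}' for name in CLASSES])
with torch.no_grad():
    text_embeddings = model.encode_text(tokens)
# L2-normalize, since CLIP-style similarity assumes unit-norm embeddings.
text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)

torch.save(text_embeddings, 'clip_text_embeddings.pth')
```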

## Usage

| Model | IoU | mIoU | Checkpoint |
| --- | --- | --- | --- |
| GaussTR-FeatUp | 45.19 | 11.70 | checkpoint |
| GaussTR-Talk2DINO | 44.73 | 12.08 | checkpoint |

### Training

> **Tip:** Because the voxelization operations are not yet optimized, evaluation during training can be time-consuming. To accelerate training, consider evaluating on the mini_train set or reducing the evaluation frequency, as sketched below.
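One way to lower the evaluation frequency is through the standard mmengine loop settings in the config; a minimal sketch, assuming an epoch-based training loop (the epoch counts below are placeholders, not the repository's defaults):

```python
# Run validation every 6 epochs instead of every epoch (placeholder values).
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=24, val_interval=6)
```

Training itself is launched through mim: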

```bash
PYTHONPATH=. mim train mmdet3d [CONFIG] [-l pytorch -G [GPU_NUM]]
```
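For example, an 8-GPU run might look like the following (the config path is hypothetical; substitute an actual file from `configs/`):

```bash
PYTHONPATH=. mim train mmdet3d configs/gausstr/gausstr_featup.py -l pytorch -G 8
```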

### Testing

```bash
PYTHONPATH=. mim test mmdet3d [CONFIG] -C [CKPT_PATH] [-l pytorch -G [GPU_NUM]]
```

### Visualization

To enable visualization, run testing with the following hook included in the config:

```python
custom_hooks = [
    dict(type='DumpResultHook'),
]
```

After testing, visualize the saved `.pkl` files with:

```bash
python tools/visualize.py [PKL_PATH] [--save]
```

## Citation

If our paper and code contribute to your research, please consider starring this repository ⭐ and citing our work:

```bibtex
@article{GaussTR,
    title   = {GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding},
    author  = {Haoyi Jiang and Liu Liu and Tianheng Cheng and Xinjie Wang and Tianwei Lin and Zhizhong Su and Wenyu Liu and Xinggang Wang},
    year    = 2024,
    journal = {arXiv preprint arXiv:2412.13193}
}
```

## Acknowledgements

This project is built upon the pioneering work of FeatUp, Talk2DINO, MaskCLIP and gsplat. We extend our gratitude to these projects for their contributions to the community.

## License

Released under the MIT License.
