
(AAAI 2025) Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation

by Chengyang Ye, Yunzhi Zhuge, Pingping Zhang*

Paper

Introduction

New Task: Open-Vocabulary RSI Semantic Segmentation (OVRSISS)

  • OVRSISS aims to segment arbitrary semantic classes in the RSI domain.
  • OVRSISS methods require no fine-tuning to adapt to new classes.

New Dataset: LandDiscover50K (LD50K)

  • 51,846 RSI images covering 40 diverse semantic classes.
  • Built upon established RSISS datasets, including OEM, LoveDA, DeepGlobe, and SAMRS. For further details, please refer to the paper.
  • Elevates the performance of OVRSISS by providing large-scale, multi-domain, multi-granularity RSI images with comprehensive class annotations.

New Approach: GSNet (Generalist and Specialist Network)

  • Dual-Stream Image Encoder
    • Generalist: CLIP backbone for open-vocabulary recognition.
    • Specialist: RSI backbone for domain-specific expertise.
  • Query-Guided Feature Fusion
    • Efficiently fuses generalist and specialist features under the guidance of text-based queries.
  • Residual Information Preservation Decoder
    • Aggregates multi-source features for more accurate mask predictions.
    • Detail refinement and backbone regularization.

Prepare datasets

Training Dataset: LD50K

  • Download LandDiscover50K from this Hugging Face repo (a consolidated command-line sketch follows this list).
  • Merge the downloaded parts into a complete zip file:
    cat LD50KSplit.z* > LandDiscover50K.zip
  • Extract the complete zip file to a specified directory:
    unzip LandDiscover50K.zip -d ./dst_dir
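
A consolidated command-line sketch for fetching and extracting the dataset is shown below. The Hugging Face repository ID and destination path are placeholders; substitute the repository linked above and your own dataset root.

# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
# The repository ID below is a placeholder for the repo linked above.
huggingface-cli download <HF_REPO_ID> --repo-type dataset --local-dir ./LD50K_parts

# Merge the split parts into one archive and extract it so the result matches
# the expected data structure shown below (the destination path is a placeholder).
cat ./LD50K_parts/LD50KSplit.z* > LandDiscover50K.zip
unzip LandDiscover50K.zip -d $DETECTRON2_DATASETS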

Testing Datasets: FloodNet, FLAIR, FAST, Potsdam

  • We provide pre-processed test images and masks for users' convenience. Download them from here.
  • For details on the pre-processing of the testing datasets, please refer to the paper.

Expected data structure

$DETECTRON2_DATASETS  
├── LandDiscover50K  
│   ├── GT_ID  
│   └── TR_Image  
├── FAST  
│   └── val  
│       ├── images  
│       └── semlabels  
├── PotsdamSplit  
│   ├── ann_dir  
│   └── img_dir  
├── FLAIR  
│   └── test  
│       ├── image  
│       └── mask  
└── FloodNet  
    └── val+test  
        ├── img  
        └── lbl  
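
The tree above is rooted at $DETECTRON2_DATASETS, the environment variable that Detectron2-based code conventionally uses to locate datasets. A minimal sketch, assuming the data was extracted under a hypothetical /data/ovrsiss:

# Point the code at the dataset root (the path below is hypothetical)
export DETECTRON2_DATASETS=/data/ovrsiss
# Sanity check: should list LandDiscover50K, FAST, PotsdamSplit, FLAIR, FloodNet
ls "$DETECTRON2_DATASETS"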

Installation

An example of installation is shown below:

git clone https://github.com/yecy749/GSNet.git
cd GSNet
conda create -n gsnet python=3.8
conda activate gsnet
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
  • Linux with Python ≥ 3.8 is required.
  • Ensure that the versions of PyTorch, TorchVision, and Detectron2 are compatible; for more information, refer to pytorch.org and the Detectron2 install guide (a hedged Detectron2 install sketch follows this list).
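
If requirements.txt does not already pull in Detectron2 (an assumption about this repository's setup), it can typically be installed from source as in the sketch below; see the Detectron2 install guide for prebuilt wheels matching your PyTorch/CUDA versions.

# Install Detectron2 from source inside the gsnet environment
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# Sanity check that the core dependencies import correctly
python -c "import torch, detectron2; print(torch.__version__, detectron2.__version__)"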

Training and Evaluation

  • Download the pretrained Specialist RSI Backbone weights.
  • Specify the root path of the datasets and the path to the pretrained Specialist RSI backbone weights.
  • Command for Training
    sh scripts/train.sh configs/vitb_384.yaml [NUM_GPUs] [TRAIN_RESULTS_DIR]
  • We provide pretrained weights for the model reported in the paper; results can be reproduced with the evaluation command below.
  • Command for Evaluation
    sh scripts/eval.sh configs/vitb_384.yaml [NUM_GPUs] [EVAL_RESULTS_DIR] \
    MODEL.WEIGHTS [MODEL_WEIGHTS_PATH]
  • An example training and evaluation command is provided in scripts/run.sh; a hedged, filled-in sketch also follows this list.
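
For illustration only, a filled-in invocation might look like the sketch below; the GPU count, output directories, and checkpoint filename are hypothetical placeholders, and the authoritative example remains scripts/run.sh.

# Train on 4 GPUs, writing logs and checkpoints to a hypothetical output directory
sh scripts/train.sh configs/vitb_384.yaml 4 output/gsnet_vitb384

# Evaluate a trained checkpoint (the weights path below is hypothetical)
sh scripts/eval.sh configs/vitb_384.yaml 4 output/gsnet_vitb384_eval \
MODEL.WEIGHTS output/gsnet_vitb384/model_final.pth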

Acknowledgement

We sincerely appreciate the invaluable contributions of numerous open-source projects and datasets that have supported our work, including but not limited to DETECTRON2, CAT-SEG, SAMRS, OEM, LoveDA, DeepGlobe, FloodNet, ISPRS Potsdam, FLAIR.

Citing GSNet

If you find GSNet helpful in your research, please consider citing:

@inproceedings{ye2025GSNet,
  title={Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation},
  author={Ye, Chengyang and Zhuge, Yunzhi and Zhang, Pingping},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}
