The official implementation of YOLO-UniOW [arxiv]
YOLO-UniOW-S/M/L have been pre-trained from scratch and evaluated on LVIS minival. The pre-trained weights can be downloaded from the link provided below.
Model | #Params | APmini | APr | APc | APf | FPS (V100) |
---|---|---|---|---|---|---|
YOLO-UniOW-S | 7.5M | 26.2 | 24.1 | 24.9 | 27.7 | 98.3 |
YOLO-UniOW-M | 16.2M | 31.8 | 26.0 | 30.5 | 34.0 | 86.2 |
YOLO-UniOW-L | 29.4M | 34.6 | 30.0 | 33.6 | 36.3 | 64.8 |
For preparing open-vocabulary and open-world data, please refer to docs/data.
Our model is built with CUDA 11.8 and PyTorch 2.1.2. To set up the environment, refer to the official PyTorch documentation for installation guidance. For detailed instructions on installing mmcv, please see docs/installation.
conda create -n yolouniow python=3.9
conda activate yolouniow
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
pip install -r requirements.txt
pip install -e .
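After installation, it can help to confirm that the pinned versions were actually picked up. The snippet below is a minimal sketch, not part of the repository: the helper names `base_version`, `version_matches`, and `report_version` are hypothetical, and it assumes `pip` on the `PATH` belongs to the `yolouniow` environment. Wheels from the cu118 index report a local suffix such as `+cu118`, so the comparison strips it first.

```shell
# Strip a local version suffix such as "+cu118" (hypothetical helper).
base_version() {
  printf '%s\n' "${1%%+*}"
}

# Succeed when the installed version equals the expected pin,
# ignoring any local suffix.
version_matches() {
  [ "$(base_version "$1")" = "$2" ]
}

# Print one status line for a package, based on `pip show`.
# The body runs in a subshell so its variables do not leak.
report_version() (
  pkg="$1"
  expected="$2"
  installed="$(pip show "$pkg" 2>/dev/null | awk '/^Version:/ {print $2}' || true)"
  if [ -z "$installed" ]; then
    echo "$pkg: not installed (expected $expected)"
  elif version_matches "$installed" "$expected"; then
    echo "$pkg: $installed OK"
  else
    echo "$pkg: $installed (expected $expected)"
  fi
)

# Versions taken from the install commands above.
report_version torch 2.1.2
report_version torchvision 0.16.2
report_version mmcv 2.1.0
```

A line ending in `OK` for each of the three packages suggests the environment matches the pins; any other line points at the package to reinstall.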
For open-vocabulary model training and evaluation, please refer to run_ovod.sh:
# Train Open-Vocabulary Model
./tools/dist_train.sh configs/pretrain/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.py 8 --amp
# Evaluate Open-Vocabulary Model
./tools/dist_test.sh configs/pretrain/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.py \
pretrained/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.pth 8
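In the evaluation command above, the checkpoint path mirrors the config file name under `pretrained/`. As a sketch under that assumption (the helper `ckpt_for_config` is hypothetical and the convention is inferred only from the commands shown here), the checkpoint path can be derived from the config path instead of being typed twice:

```shell
# Map a config path to pretrained/<config basename>.pth.
# Assumed convention, inferred from the commands above.
ckpt_for_config() {
  printf 'pretrained/%s.pth\n' "$(basename "$1" .py)"
}

CONFIG=configs/pretrain/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.py
CHECKPOINT="$(ckpt_for_config "$CONFIG")"

# Dry run: print the evaluation command rather than launching it.
echo "./tools/dist_test.sh $CONFIG $CHECKPOINT 8"
```

Printing the command first makes it easy to check the derived path before starting an 8-GPU job.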
For open-world model training and evaluation, please follow the steps provided in run_owod.sh. Ensure that the model is trained before proceeding with the evaluation. We provide our fine-tuned wildcard features, object_tuned_s and object_tuned_m, obtained through steps 2 and 3, so you can use them directly.
# 1. Extract text/wildcard features from pretrained model
python tools/owod_scripts/extract_text_feats.py --config $CONFIG --ckpt $CHECKPOINT --save_path $EMBEDS_PATH
# 2. Fine-tune wildcard features
./tools/dist_train.sh $OBJ_CONFIG 8 --amp
# 3. Extract fine-tuned wildcard features
python tools/owod_scripts/extract_text_feats.py --config $OBJ_CONFIG --save_path $EMBEDS_PATH --extract_tuned
# 4. Train all owod tasks
python tools/owod_scripts/train_owod_tasks.py MOWODB $OW_CONFIG $CHECKPOINT
# 5. Evaluate all owod tasks
python tools/owod_scripts/test_owod_tasks.py MOWODB $OW_CONFIG --save
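The five steps above depend on several shell variables (`CONFIG`, `CHECKPOINT`, `EMBEDS_PATH`, `OBJ_CONFIG`, `OW_CONFIG`), and an empty one would silently drop an argument from a command. The guard below is a minimal sketch, not part of the repository; `require_vars` is a hypothetical helper that stops at the first variable that is unset or empty.

```shell
# Return non-zero and name the first variable that is unset or empty.
require_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      return 1
    fi
  done
}

if require_vars CONFIG CHECKPOINT EMBEDS_PATH OBJ_CONFIG OW_CONFIG; then
  echo "all pipeline variables are set"
fi
```

Running this check before step 1 avoids discovering a missing path only after a long training stage has finished.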
To train and evaluate on specific datasets and tasks, use the commands below:
# Train owod task
DATASET=$DATASET TASK=$TASK THRESHOLD=$THRESHOLD SAVE=$SAVE \
./tools/dist_train_owod.sh $CONFIG 8 --amp
# Evaluate owod task
DATASET=$DATASET TASK=$TASK THRESHOLD=$THRESHOLD SAVE=$SAVE \
./tools/dist_test.sh $CONFIG $CHECKPOINT 8
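The commands above pass `DATASET`, `TASK`, `THRESHOLD`, and `SAVE` as a prefix to the launcher, which sets them only in the environment of that one command. The sketch below illustrates that scoping with `sh -c` standing in for the launcher script; the value `TASK=1` is a placeholder chosen for illustration, not a documented setting.

```shell
# Start from a clean slate so the demonstration is unambiguous.
unset DATASET TASK

# Prefix assignments are visible to the launched command only.
seen="$(DATASET=MOWODB TASK=1 sh -c 'echo "dataset=$DATASET task=$TASK"')"
echo "launcher saw: $seen"

# The calling shell is left untouched afterwards.
echo "calling shell: dataset=${DATASET:-unset} task=${TASK:-unset}"
```

Because nothing persists in the calling shell, each training or evaluation command needs its own prefix, or the variables can be exported once with `export` if several commands share them.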
This project builds upon YOLO-World, YOLOv10, FOMO, and OVOW. We sincerely thank the authors for their excellent implementations!
If our code or models help your work, please cite our paper and YOLOv10:
@article{liu2024yolouniow,
title={YOLO-UniOW: Efficient Universal Open-World Object Detection},
author={Liu, Lihao and Feng, Juexiao and Chen, Hui and Wang, Ao and Song, Lin and Han, Jungong and Ding, Guiguang},
journal={arXiv preprint arXiv:2412.20645},
year={2024}
}
@article{wang2024yolov10,
title={YOLOv10: Real-Time End-to-End Object Detection},
author={Wang, Ao and Chen, Hui and Liu, Lihao and Chen, Kai and Lin, Zijia and Han, Jungong and Ding, Guiguang},
journal={arXiv preprint arXiv:2405.14458},
year={2024}
}