This is the official PyTorch implementation of the StrDA paper, accepted to the main conference track of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025.
In this paper, we propose Stratified Domain Adaptation (StrDA), a progressive self-training framework for scene text recognition. By leveraging the gradual escalation of the domain gap, StrDA partitions the target-domain data into ordered subsets, using either the Harmonic Domain Gap Estimator (HDGE) or a Domain Discriminator (DD), and then adapts the recognizer subset by subset via self-training.
- Keywords: scene text recognition (STR), unsupervised domain adaptation (UDA), self-training (ST), optical character recognition (OCR)
- 2025/03/06: 📜 We have uploaded the instructions for running the code.
- 2025/03/03: 💻 We have released the implementation of StrDA for TRBA and CRNN.
- 2025/02/28: 🗣️ We attended the conference; you can view the poster and slides here.
- 2024/08/30: 🔥 Our paper has been accepted to WACV'25 (Algorithms Track).
- `python>=3.8.16`
- Install PyTorch-cuda>=11.3 following the official instructions:

  ```
  pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  ```

- Install the necessary dependencies by running (`!pip install -r requirements.txt`):

  ```
  pip install opencv-python==4.4.0.46 Pillow==7.2.0 opencv-python-headless==4.5.1.48 lmdb tqdm nltk six pyyaml
  ```

- You can also create the environment using `docker build -t strda .`
Thanks to ku21fan/STR-Fewer-Labels, baudm/parseq, and Mountchicken/Union14M for compiling and organizing the data. I highly recommend that you follow their guidelines to download the datasets and review the license of each dataset.
Please pay attention to the warnings when running the code (e.g., `select_data` for the target domain data, the checkpoint of HDGE, and the trained weights of DD).
- First, you need a source-trained STR model. If you don't have one, you can use `supervised_learning.py` to train an STR model with source domain data (synthetic).
- Next, you need to filter the data, removing samples that are too long (width > 25 times height), and save them to `select_data.npy` (to be updated later); see the sketch below. Since the model only processes a maximum of 25 characters per word, these long samples could be harmful during pseudo-labeling.
- Then, run Stage 1 using one of the two methods. The files containing data information for each subset will be saved in `stratify/{args.method}/` as `.npy` files. Please check them carefully!
- Finally, run Stage 2 to perform adaptation on the target domain data to boost model performance. Then, test the results on a wide range of benchmarks.

Note: The target domain data must remain unchanged throughout the experiment.
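The repository does not spell out the filtering script at this point, so the following is only a minimal sketch of that filtering step. It assumes the target-domain crops can be iterated as image files and that `select_data.npy` stores the indices of the retained samples; both are assumptions, not the guaranteed file format expected by the scripts:

```python
import numpy as np
from PIL import Image

def build_select_data(image_paths, max_ratio=25, out_file="select_data.npy"):
    """Keep only samples whose width is at most `max_ratio` times their height,
    and save the indices of the kept samples (assumed format) to `out_file`."""
    keep = []
    for idx, path in enumerate(image_paths):
        with Image.open(path) as img:
            width, height = img.size
        if width <= max_ratio * height:
            keep.append(idx)
    np.save(out_file, np.asarray(keep, dtype=np.int64))
    return keep

# Example usage with a hypothetical list of target-domain crops:
# kept_indices = build_select_data(["target/img_0001.jpg", "target/img_0002.jpg"])
```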
```
CUDA_VISIBLE_DEVICES=0 python supervised_learning.py --model TRBA --aug
```
There are two main methods, each with several settings:
- Harmonic Domain Gap Estimator ($\mathrm{HDGE}$)

  ```
  CUDA_VISIBLE_DEVICES=0 python stage1_HDGE.py --select_data select_data.npy --num_subsets 5 --beta 0.7 --train
  ```
- Domain Discriminator ($\mathrm{DD}$)

  ```
  CUDA_VISIBLE_DEVICES=0 python stage1_DD.py --select_data select_data.npy --num_subsets 5 --discriminator CRNN --train --aug
  ```
Note: For both methods, you only need to pass `--train` the first time to train the model. After that, you can stratify the data without retraining. A minimal sketch of the shared subset-splitting step is shown below.
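Both methods ultimately assign each selected sample a domain-gap score, sort by it, and split the result into `--num_subsets` groups saved under `stratify/{args.method}/`. The sketch below shows only that final splitting step; the scoring itself (HDGE or DD) is not reproduced, and the output file names used here are hypothetical, not the repository's actual layout:

```python
import os
import numpy as np

def split_into_subsets(indices, gap_scores, num_subsets=5, method="HDGE", out_dir="stratify"):
    """Sort samples by estimated domain gap (closest to the source domain first)
    and split them into `num_subsets` roughly equal subsets, one .npy file each."""
    order = np.argsort(gap_scores)                 # ascending gap: near-source samples first
    ranked = np.asarray(indices)[order]
    subsets = np.array_split(ranked, num_subsets)
    target_dir = os.path.join(out_dir, method)
    os.makedirs(target_dir, exist_ok=True)
    for i, subset in enumerate(subsets, start=1):
        # Hypothetical file name; check the files actually written by stage 1.
        np.save(os.path.join(target_dir, f"subset_{i}.npy"), subset)
    return subsets
```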
```
CUDA_VISIBLE_DEVICES=0 python stage2_StrDA.py --saved_model trained_model/TRBA.pth --model TRBA --num_subsets 5 --method HDGE --beta 0.7 --aug
```
Note: If the method is HDGE, you must enter `--beta`. If the method is DD, you must select a `--discriminator`. Example:
```
CUDA_VISIBLE_DEVICES=0 python stage2_StrDA.py --saved_model trained_model/CRNN.pth --model CRNN --num_subsets 5 --method DD --discriminator CRNN --aug
```
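Conceptually, Stage 2 walks through the subsets in order of increasing domain gap, pseudo-labels each subset with the current model, and fine-tunes on those pseudo-labels before moving on. The loop below is only a simplified sketch of that idea, not the actual logic of `stage2_StrDA.py`; the greedy decoding and plain cross-entropy loss are stand-ins for the real recognizer-specific training objective:

```python
import torch
import torch.nn.functional as F

def progressive_self_training(model, subset_loaders, optimizer, device="cuda"):
    """Adapt the recognizer subset by subset (simplified sketch).
    `subset_loaders` are assumed to yield batches of unlabeled target images,
    ordered from smallest to largest estimated domain gap."""
    model.to(device)
    for loader in subset_loaders:
        # 1) Pseudo-label the current subset with the current model.
        pseudo_batches = []
        model.eval()
        with torch.no_grad():
            for images in loader:
                images = images.to(device)
                logits = model(images)            # assumed to return per-character logits
                labels = logits.argmax(dim=-1)    # greedy decoding as a stand-in
                pseudo_batches.append((images.cpu(), labels.cpu()))
        # 2) Fine-tune on the pseudo-labels before moving to the next subset.
        model.train()
        for images, labels in pseudo_batches:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```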
```
CUDA_VISIBLE_DEVICES=0 python test.py --saved_model trained_model/TRBA.pth --model TRBA
```
Broader insight: You can try this method with different STR models, on various source-target domain pairs (e.g., synthetic to handwritten or artistic text), and even on more complex domain-gap problems such as medical image segmentation. Additionally, you can replace self-training with more advanced UDA techniques.
If you find our work useful for your research, please cite it and give us a star⭐!
@InProceedings{Le_2025_WACV,
author = {Le, Kha Nhat and Nguyen, Hoang-Tuan and Tran, Hung Tien and Ngo, Thanh Duc},
title = {Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition},
booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
month = {February},
year = {2025},
pages = {8972-8982}
}
This code is based on STR-Fewer-Labels by Jeonghun Baek and cycleGAN-PyTorch by Arnab Mondal. Thanks for your contributions!