Synthetic data for training deep neural networks for traffic sign detection.
Content
For real data, the Mapillary Traffic Sign Dataset (MTSD) is used for training and testing. Only a subset of the data is used: the template classes are chosen to match the GTSDB, which can then be used for testing.
The method for generating synthetic data follows https://arxiv.org/abs/1907.09679, where sign templates are placed randomly on background images from the COCO dataset.
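The core of this approach can be sketched as follows. This is a minimal illustration, not the project's actual implementation: `paste_template` is a hypothetical helper, and the random arrays stand in for real COCO backgrounds and sign templates.

```python
import numpy as np

def paste_template(background: np.ndarray, template: np.ndarray, alpha: np.ndarray,
                   x: int, y: int) -> np.ndarray:
    """Alpha-blend an RGB sign template onto a background at (x, y)."""
    h, w = template.shape[:2]
    out = background.copy()
    region = out[y:y + h, x:x + w].astype(np.float32)
    a = alpha[..., None].astype(np.float32)  # (h, w, 1), values in [0, 1]
    out[y:y + h, x:x + w] = (a * template + (1 - a) * region).astype(np.uint8)
    return out

rng = np.random.default_rng(0)
bg = rng.integers(0, 255, size=(480, 640, 3), dtype=np.uint8)  # stand-in for a COCO image
tmpl = np.full((64, 64, 3), 200, dtype=np.uint8)               # stand-in for a sign template
mask = np.ones((64, 64), dtype=np.float32)                     # template alpha channel
x, y = int(rng.integers(0, 640 - 64)), int(rng.integers(0, 480 - 64))
img = paste_template(bg, tmpl, mask, x, y)
bbox = (x, y, 64, 64)  # the paste location doubles as the ground-truth box
```

Because the template is placed programmatically, the bounding-box annotation comes for free, which is what makes this kind of synthetic data cheap to produce at scale.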
Examples of the generated synthetic images:
A detection example on GTSDB:
A Faster R-CNN model is trained for detection, while a ResNet is trained for classification. At test time, the detections from the detector are passed to the classifier.
A baseline trained on real images achieves 0.84 mAP on MTSD and 0.90 mAP on GTSDB. The authors of MTSD report a similar score (though they also use more classes).
A model trained on synthetic images achieves 0.52 mAP on MTSD and 0.81 mAP on GTSDB.
The models trained on synthetic data were found to make a different kind of error than the models trained on real data. A likely explanation is that the synthetic data does not generalize well enough:
Different experiments were run to find which factors in the synthetic data generation matter most.
The worst-performing method uses simple single-color backgrounds instead of the images from COCO.
| Method | mAP MTSD | mAP GTSDB |
|---|---|---|
| 1. no perspective constraint | 0.542 | 0.827 |
| 2. original | 0.539 | 0.820 |
| 3. add distractions | 0.531 | 0.837 |
| 4. no alpha blending | 0.529 | 0.810 |
| 5. no geometric transformations | 0.508 | 0.680 |
| 6. no uniform noise | 0.501 | 0.800 |
| 7. no prior brightness adjust | 0.495 | 0.751 |
| 8. no post brightness adjust | 0.454 | 0.688 |
| 9. simple backgrounds | 0.392 | 0.677 |
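The table suggests brightness adjustment is one of the most important factors (removing the post-adjustment drops GTSDB mAP from 0.820 to 0.688). One plausible interpretation of this step, sketched below with NumPy, is shifting the pasted template's brightness toward that of the surrounding background patch; the exact procedure in the referenced paper may differ, and `match_brightness` is a hypothetical helper.

```python
import numpy as np

def match_brightness(template: np.ndarray, bg_region: np.ndarray,
                     strength: float = 0.5) -> np.ndarray:
    """Shift the template's mean brightness toward the background region's mean."""
    t_mean = template.astype(np.float32).mean()
    b_mean = bg_region.astype(np.float32).mean()
    adjusted = template.astype(np.float32) + strength * (b_mean - t_mean)
    return np.clip(adjusted, 0, 255).astype(np.uint8)

tmpl = np.full((64, 64, 3), 220, dtype=np.uint8)   # bright template
region = np.full((64, 64, 3), 60, dtype=np.uint8)  # dark background patch
out = match_brightness(tmpl, region)  # template darkened toward the patch
```

The intuition is that a template pasted at its original studio brightness looks obviously out of place in a dark or overexposed scene, giving the detector a shortcut cue that does not exist in real images.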
Next, the synthetic images are used for pre-training models, which are then fine-tuned with varying numbers of real images.
The classes are kept balanced for the models trained on small numbers of real images.
Synthetic data turns out to be useful when the number of available real images is limited, while it makes no difference when plenty of real data is available.
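Class balancing for the small fine-tuning subsets can be sketched as follows. This is a minimal illustration under assumptions: `balanced_subset` is a hypothetical helper and the file names are made up.

```python
import random
from collections import defaultdict

def balanced_subset(samples, labels, per_class, seed=0):
    """Draw up to `per_class` samples from each class, shuffled reproducibly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    subset = []
    for items in by_class.values():
        rng.shuffle(items)
        subset.extend(items[:per_class])
    return subset

paths = [f"img_{i}.jpg" for i in range(100)]
labels = [i % 5 for i in range(100)]  # 5 classes, 20 samples each
subset = balanced_subset(paths, labels, per_class=4)  # 5 classes x 4 = 20 images
```

Without this step, a random subset of only a few dozen real images could easily miss rare sign classes entirely, which would skew the fine-tuning comparison.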
The model implementations are from PyTorch.
Hydra is used for managing configs.
Albumentations and OpenCV are used for augmentations.
TIDE is used for evaluating errors.
Create a virtual environment with conda:
conda create --name [name] python=3.9
Install requirements (in virtual environment):
pip install -r requirements.txt
Generate synthetic data:
python generate_synthetic_data.py
Train detector:
python train_detection.py
Train classifier:
python train_classifier.py
Test:
python test.py
Use the config files in `conf/` for configuration.
├── conf <- Hydra configuration files
│
├── data <- Project data
│
├── outputs <- Logs generated by Hydra and loggers
│
├── reports <- Reports, results, notes, pdfs, figures etc.
│
├── src <- Source code
│ │
│ ├── augmentations.py <- Augmentations used for generating synthetic data
│ ├── config.py <- @dataclasses describing the config files
│ ├── datasets.py <- PyTorch Datasets
│ ├── engine.py <- Training and test functions
│ ├── models.py <- Classifiers
│ ├── transforms.py <- Transformations for training and test
│ └── utils.py <- Utility functions
│
├── generate_synthetic_data.py <- For generating synthetic data
│
├── train_classifier.py <- Run training of classifier
├── train_detection.py <- Run training of detector
├── test.py <- Run testing
├── test_detection.py <- Run testing of only detector
│
├── .env <- Private environment variables
├── .gitignore <- List of files/folders ignored by git
├── requirements.txt <- File for installing python dependencies
├── setup.cfg <- Configuration of linters
├── pyproject.toml <- Configuration of black
└── README.md