Shotor

Word Level OCR Dataset for Persian Language

Shotor (meaning "camel" in Persian) is a free synthetic dataset for word-level OCR.

Sample Images

The current version contains 120,000 grayscale 50×100 images and their corresponding words. The words contain only alphabetic characters.
Note: To train a robust model, apply augmentations such as scaling, translation, and additive noise to the images.
For an example of using the Shotor dataset, see this notebook:
A simple word level OCR for Persian Language using Pytorch and OpenCV
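
As a rough illustration of the augmentations mentioned in the note above, here is a minimal NumPy sketch that applies random scaling, translation, and additive Gaussian noise to a single grayscale image. It is not taken from the notebook. The function name `augment`, its parameters and default values, the (50, 100) height-by-width array layout, and the white background fill are all assumptions made for the example.

```python
import numpy as np


def augment(image, rng, max_shift=4, noise_std=8.0,
            scale_range=(0.9, 1.1), fill=255):
    """Randomly scale, translate, and add noise to one grayscale image.

    `image` is a 2-D uint8 array. All defaults are illustrative guesses,
    and `fill` assumes a white background; adjust them to the real data.
    """
    h, w = image.shape

    # Draw a random scale factor and a random integer shift in pixels.
    scale = rng.uniform(*scale_range)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)

    # Inverse mapping: for every output row/column, find the source
    # row/column after scaling about the image centre and shifting.
    src_y = np.round((np.arange(h) - h / 2 - dy) / scale + h / 2).astype(int)
    src_x = np.round((np.arange(w) - w / 2 - dx) / scale + w / 2).astype(int)

    # Pixels that map outside the source image receive the fill value.
    inside = np.outer((src_y >= 0) & (src_y < h), (src_x >= 0) & (src_x < w))
    sampled = image[np.ix_(np.clip(src_y, 0, h - 1), np.clip(src_x, 0, w - 1))]
    out = np.where(inside, sampled, fill).astype(np.float32)

    # Additive Gaussian noise, then clip back to the valid 8-bit range.
    out = out + rng.normal(0.0, noise_std, size=(h, w))
    return np.clip(out, 0, 255).astype(np.uint8)


# Usage with a synthetic stand-in image (not real Shotor data).
rng = np.random.default_rng(0)
dummy = np.full((50, 100), 255, dtype=np.uint8)
dummy[20:30, 40:60] = 0  # a dark rectangle standing in for a word
augmented = augment(dummy, rng)
```

Nearest-neighbour resampling keeps the sketch dependency-free. In practice, an interpolating warp from OpenCV or a transform pipeline in the training framework would likely give smoother results.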

I used these resources to create the word lists:

The images have been generated using multiple fonts:

Created by: Amirabbas Asadi (amir137825@gmail.com)