Text Detection and OCR Application

A Python-based desktop application that combines OpenCV for image processing and Tesseract OCR for text extraction, wrapped in a user-friendly Tkinter interface. This application allows users to load images, detect text regions, extract text content, and save the results.

Features

  • User-Friendly Interface: Clean and intuitive Tkinter-based GUI
  • Image Processing: Grayscale conversion, binary thresholding, and dilation using OpenCV
  • Text Detection: Text region detection using contour analysis with size-based filtering
  • OCR Integration: Text extraction using Google's Tesseract OCR engine
  • File Management: Loads PNG, JPG, JPEG, BMP, and TIFF images and exports extracted text as a UTF-8 text file
  • Real-Time Preview: Live display of processed images with detected text regions
  • Progress Tracking: Visual feedback during processing operations

Prerequisites

Before running the application, ensure you have the following installed:

Python Dependencies

pip install opencv-python
pip install pytesseract
pip install pillow

Tesseract OCR Installation

Windows

  1. Download the Tesseract installer from the official GitHub repository
  2. Run the installer and note the installation path
  3. Update the tesseract_cmd path in the code to match your installation:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Linux

sudo apt-get update
sudo apt-get install tesseract-ocr

macOS

brew install tesseract
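
Once Tesseract is installed, the application still has to locate the binary. The following is a minimal sketch of one way to do that across platforms, not the code used in main.py: it checks PATH first and then falls back to the default Windows location shown above. The function name find_tesseract and the fallbacks parameter are hypothetical.

```python
import os
import shutil
from typing import Iterable, Optional

# Default Windows install location, taken from the instructions above.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(executable: str = "tesseract",
                   fallbacks: Iterable[str] = (WINDOWS_DEFAULT,)) -> Optional[str]:
    """Return a path to the Tesseract binary, or None if it cannot be found.

    Hypothetical helper for illustration; it is not part of main.py.
    """
    # 1. Prefer whatever is on PATH (typical after apt-get or brew installs).
    on_path = shutil.which(executable)
    if on_path:
        return on_path
    # 2. Otherwise try known install locations in order.
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found; install it or set tesseract_cmd manually.")
    else:
        # In the application, this value would be assigned to
        # pytesseract.pytesseract.tesseract_cmd.
        print(f"Using Tesseract at: {path}")
```

If the function returns None, set tesseract_cmd by hand as described in the Windows steps.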

Installation

  1. Clone the repository:
git clone https://github.com/nguyensinhloc/TextRecognitionFromImages.git
cd TextRecognitionFromImages
  2. Install required dependencies:
pip install -r requirements.txt
  3. Update the Tesseract path in main.py if necessary
  4. Run the application:

python main.py

Usage

  1. Loading Images

    • Click the "Load Image" button
    • Select an image file (supported formats: PNG, JPG, JPEG, BMP, TIFF)
    • The image will be displayed in the left panel
  2. Detecting Text

    • Click the "Detect Text" button
    • The application will:
      • Process the image using OpenCV
      • Detect text regions
      • Highlight detected regions in green
      • Extract text using OCR
      • Display extracted text in the right panel
  3. Saving Results

    • Click the "Save Text" button
    • Choose a location and filename
    • The extracted text will be saved as a UTF-8 encoded text file
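
The save step comes down to writing a string with an explicit encoding. Below is a small sketch, assuming the destination path has already been chosen (for example through a file dialog); save_text is a hypothetical name and may not match the function in main.py.

```python
from pathlib import Path


def save_text(text: str, destination: str) -> Path:
    """Write extracted text to `destination` as UTF-8 and return the path.

    Hypothetical helper for illustration; it is not part of main.py.
    """
    path = Path(destination)
    # An explicit encoding avoids depending on the platform default, which
    # matters for non-ASCII OCR output such as accented characters.
    path.write_text(text, encoding="utf-8")
    return path
```

Passing encoding="utf-8" explicitly keeps the output identical on Windows, Linux, and macOS.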

Project Structure

TextRecognitionFromImages/
│
├── main.py             # Main application file
├── requirements.txt    # Project dependencies
└── README.md           # Project documentation

How It Works

  1. Image Processing Pipeline

    • Convert image to grayscale
    • Apply binary thresholding
    • Perform dilation for better text region detection
    • Find contours in the processed image
  2. Text Detection

    • Filter contours based on size
    • Draw rectangles around detected text regions
    • Extract region of interest (ROI) for each detection
  3. OCR Processing

    • Process each ROI using Tesseract OCR
    • Combine extracted text
    • Display results in the application
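
The three stages above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in main.py: Otsu thresholding with inverted polarity, the 15x3 dilation kernel, the minimum box size, and all function names are choices made for this example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), as returned by cv2.boundingRect


def filter_boxes(boxes: List[Box], min_w: int = 20, min_h: int = 10) -> List[Box]:
    """Drop boxes too small to plausibly hold text (thresholds are assumptions)."""
    return [b for b in boxes if b[2] >= min_w and b[3] >= min_h]


def reading_order(boxes: List[Box]) -> List[Box]:
    """Sort boxes top-to-bottom, then left-to-right."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def find_text_boxes(image) -> List[Box]:
    """Grayscale -> binary threshold -> dilation -> contours -> size filter."""
    import cv2  # imported lazily so the pure helpers above work without OpenCV

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Inverted Otsu threshold: dark text on a light page becomes white on black.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # A wide kernel merges neighbouring characters into line-shaped blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    dilated = cv2.dilate(binary, kernel, iterations=1)
    # [-2] selects the contour list on both OpenCV 3.x and 4.x.
    contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    return reading_order(filter_boxes([cv2.boundingRect(c) for c in contours]))


def extract_text(image):
    """Return (text, annotated_image) for a BGR image loaded with cv2.imread."""
    import cv2
    import pytesseract

    annotated = image.copy()  # draw on a copy so rectangles never reach the OCR input
    pieces = []
    for x, y, w, h in find_text_boxes(image):
        roi = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
        pieces.append(pytesseract.image_to_string(roi).strip())
        cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 255, 0), 2)  # green (BGR)
    return "\n".join(p for p in pieces if p), annotated
```

With OpenCV, pytesseract, and Tesseract installed, calling extract_text(cv2.imread("sample.png")) would return the combined text and a copy of the image with green rectangles; "sample.png" is a placeholder file name.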

Troubleshooting

Common Issues

  1. Tesseract Not Found

    • Verify Tesseract is installed correctly
    • Check the path in the code matches your installation
    • Ensure the Tesseract installation directory is on your PATH environment variable, or set tesseract_cmd explicitly
  2. Poor Text Detection

    • Ensure image is clear and well-lit
    • Try adjusting the threshold values in the code
    • Consider preprocessing images before loading
  3. Performance Issues

    • Reduce image size before processing
    • Close other resource-intensive applications
    • Check system meets minimum requirements
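
For the "reduce image size" suggestion, one possible approach is to cap the longer side of the image before detection. The sketch below is an assumption-laden example, not part of main.py: the 1600-pixel limit and the names scaled_size and downscale are hypothetical.

```python
from typing import Tuple


def scaled_size(width: int, height: int, max_side: int = 1600) -> Tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def downscale(image, max_side: int = 1600):
    """Resize an OpenCV image (NumPy array) using the helper above."""
    import cv2  # imported lazily so scaled_size can be used without OpenCV

    h, w = image.shape[:2]
    new_w, new_h = scaled_size(w, h, max_side)
    if (new_w, new_h) == (w, h):
        return image
    # INTER_AREA is the interpolation OpenCV recommends for shrinking images.
    return cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_AREA)
```

Shrinking too aggressively can hurt OCR accuracy on small print, so the limit is worth tuning per image set.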

Contributing

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/improvement)
  3. Make changes
  4. Commit your changes (git commit -am 'Add new feature')
  5. Push to the branch (git push origin feature/improvement)
  6. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • OpenCV for image processing capabilities
  • Google's Tesseract OCR engine
  • Python Tkinter for the GUI framework
  • All contributors and testers

Version History

  • 1.0.0 (2024-11-22)
    • Initial release
    • Basic text detection and OCR functionality
    • Tkinter GUI implementation

Contact

Nguyễn Sinh Lộc

Project Link: https://github.com/nguyensinhloc/TextRecognitionFromImages