OCR PDF Survey Response Extractor

This project extracts hand-filled survey responses from scanned PDF documents. It uses image processing to identify and grade multiple-choice answers on questionnaires.

Overview

The system processes scanned PDF surveys through the following steps:

1. PDF to image conversion
2. Image cropping and alignment
3. Detection of answer grids
4. Response extraction
5. Grading and result compilation

Automating this process makes it practical to handle large volumes of paper-based surveys and convert them into digital data for analysis.
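To make the response-extraction step more concrete, here is a minimal sketch of one common way to read a marked answer grid: split the grid into equal cells and pick the darkest cell in each row. This is not the project's actual code. The function names, the `min_fill` threshold, and the assumption of an evenly spaced, already aligned grid are all hypothetical, and plain Python lists stand in for real image arrays to keep the example dependency-free.

```python
from typing import List, Optional

# Grayscale image as rows of pixel values: 0 = black, 255 = white.
Grid = List[List[int]]


def cell_darkness(image: Grid, top: int, left: int, height: int, width: int) -> float:
    """Return the fraction of dark pixels (value < 128) in one rectangular cell."""
    dark = 0
    for row in image[top:top + height]:
        for pixel in row[left:left + width]:
            if pixel < 128:
                dark += 1
    return dark / (height * width)


def extract_responses(image: Grid, n_rows: int, n_cols: int,
                      min_fill: float = 0.3) -> List[Optional[int]]:
    """For each question row, return the index of the most filled column.

    A row whose darkest cell is below min_fill (hypothetical threshold)
    is reported as None, meaning no answer was detected.
    """
    cell_h = len(image) // n_rows
    cell_w = len(image[0]) // n_cols
    answers: List[Optional[int]] = []
    for r in range(n_rows):
        fills = [
            cell_darkness(image, r * cell_h, c * cell_w, cell_h, cell_w)
            for c in range(n_cols)
        ]
        best = max(range(n_cols), key=lambda c: fills[c])
        answers.append(best if fills[best] >= min_fill else None)
    return answers


# Tiny synthetic grid: 2 questions, 3 options, each cell 2x2 pixels.
demo = [[255] * 6 for _ in range(4)]
for y in (0, 1):
    for x in (2, 3):
        demo[y][x] = 0  # fill the middle option of the first question

print(extract_responses(demo, n_rows=2, n_cols=3))  # [1, None]
```

A real scan would also need thresholding tuned to the paper and ink, and a rule for rows with more than one filled cell, which this sketch does not handle.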

Setup Instructions

Clone the repository:

git clone https://github.com/Hadrien-Cornier/ocr-pdf.git
cd ocr-pdf

Create and activate a virtual environment:

For macOS and Linux:

python -m venv .venv
source .venv/bin/activate

For Windows:

python -m venv .venv
.venv\Scripts\activate

Install the requirements:
```
pip install -r requirements.txt
```

Usage

1. Place your PDF files in the data/input/ directory.
2. Run the pipeline:
```
python src/run_pipeline.py
```
This will execute the following steps:
- Crop the PDFs (output in data/cropped/)
- Align the images (output in data/aligned/)
- Perform OCR and grade extraction (output in data/output/)
3. Check the results in the data/output/ directory.

Debug images for each step can be found in the respective subdirectories of data/debug/.
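Before running the pipeline, it can help to confirm that the input directory actually contains PDFs. The following is a small optional sketch, not part of the repository; the helper name `find_input_pdfs` is hypothetical, and only the data/input/ path comes from the instructions above.

```python
from pathlib import Path
from typing import List


def find_input_pdfs(input_dir: str = "data/input") -> List[Path]:
    """Return the PDF files directly inside input_dir (suffix match ignores case)."""
    folder = Path(input_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    pdfs = find_input_pdfs()
    if not pdfs:
        print("No PDF files found in data/input/")
    for pdf in pdfs:
        print(pdf.name)
```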

Configuration

Adjust the settings in config/config.ini to customize the pipeline behavior.
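As an illustration of how an INI file like this is typically read in Python, here is a sketch using the standard-library `configparser` module. The section names, keys, and default values below are hypothetical; check config/config.ini for the settings the pipeline actually uses.

```python
import configparser

# Hypothetical contents; the real config/config.ini may use other sections and keys.
EXAMPLE_INI = """
[paths]
input_dir = data/input
output_dir = data/output

[detection]
fill_threshold = 0.3
"""


def load_settings(text: str) -> dict:
    """Parse INI text and return selected settings, using fallbacks for missing keys."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "input_dir": parser.get("paths", "input_dir", fallback="data/input"),
        "output_dir": parser.get("paths", "output_dir", fallback="data/output"),
        "fill_threshold": parser.getfloat("detection", "fill_threshold", fallback=0.5),
    }


settings = load_settings(EXAMPLE_INI)
print(settings["fill_threshold"])  # 0.3
```

To read from a file instead of a string, `parser.read("config/config.ini")` can replace the `read_string` call.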

Requirements

See requirements.txt for the list of Python packages required.
