Implements sequential and parallel integral image computation in C++ and Python, utilizing CUDA for parallel computation on GPU

AlessioBugetti/integral-image-processing

CUDA-Based Integral Image Computation

Overview

This project implements integral image computation for grayscale images using CUDA. The computation runs in parallel on the GPU and is benchmarked against a sequential CPU implementation.

The GPU computation alternates two kernels, a row-wise scan and a transpose, in the following order:

  1. Row-wise Scan
  2. Transpose
  3. Row-wise Scan
  4. Transpose

Two versions of the scan kernel are provided: a naive implementation and an optimized one.
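The four steps above can be sketched on the CPU in plain Python to show why the sequence produces an integral image: the first scan accumulates along rows, and the transpose lets the same row-wise scan accumulate along columns before the final transpose restores the original orientation. The function names below are hypothetical and are not taken from the repository's kernels.

```python
def row_scan(matrix):
    """Inclusive prefix sum along each row."""
    out = []
    for row in matrix:
        running, scanned = 0, []
        for value in row:
            running += value
            scanned.append(running)
        out.append(scanned)
    return out


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def integral_image_pipeline(image):
    """Row-wise scan, transpose, row-wise scan, transpose."""
    step1 = row_scan(image)      # sums along each row
    step2 = transpose(step1)     # columns become rows
    step3 = row_scan(step2)      # sums along the original columns
    return transpose(step3)      # back to the original orientation


image = [[1, 2, 3],
         [4, 5, 6]]
print(integral_image_pipeline(image))  # [[1, 3, 6], [5, 12, 21]]
```

Reusing one scan kernel for both directions is the design point here: only a row-wise scan and a transpose need to be written and tuned.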

Integral Image

An integral image, also known as a summed-area table, is a representation that allows for fast computation of the sum of values in a rectangular subset of an image.

For example:

[Figure: Integral Image Example]
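To illustrate the "fast computation" claim, the following sketch shows how the sum over any rectangle can be read from an integral image with at most four lookups, regardless of the rectangle's size. It is a plain-Python illustration with hypothetical function names, not code from this repository.

```python
from itertools import accumulate


def integral_image(image):
    """Build the summed-area table: prefix sums along rows, then along columns."""
    rows = [list(accumulate(row)) for row in image]
    cols = [list(accumulate(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def rect_sum(table, top, left, bottom, right):
    """Sum of image[top..bottom][left..right] (inclusive bounds)."""
    total = table[bottom][right]
    if top > 0:
        total -= table[top - 1][right]       # remove the strip above
    if left > 0:
        total -= table[bottom][left - 1]     # remove the strip to the left
    if top > 0 and left > 0:
        total += table[top - 1][left - 1]    # add back the doubly removed corner
    return total


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
print(rect_sum(table, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```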

Formula

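Given an input image $i(x, y)$, the integral image $I(x, y)$ is the sum of all pixels above and to the left of $(x, y)$, inclusive:

$$ I(x,y) = \sum_{x' \le x} \sum_{y' \le y} i(x',y') $$

It can be computed in a single pass with the recurrence:

$$ I(x,y) = i(x,y) + I(x-1,y)+I(x,y-1)-I(x-1,y-1) $$

where $I(x, y)$ is taken to be $0$ whenever $x < 0$ or $y < 0$.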
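A minimal sequential sketch of this recurrence is shown below, assuming that any term with an index outside the image contributes 0. It is a plain-Python illustration with a hypothetical function name; the repository's CPU baseline may be written differently. Note that the code indexes as `[y][x]` (row first), while the formula writes $(x, y)$.

```python
def integral_image_sequential(image):
    """Apply I(x,y) = i(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1) in raster order."""
    height, width = len(image), len(image[0])
    integral = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            left = integral[y][x - 1] if x > 0 else 0
            above = integral[y - 1][x] if y > 0 else 0
            # The top-left region is counted in both `left` and `above`,
            # so it is subtracted once.
            diagonal = integral[y - 1][x - 1] if x > 0 and y > 0 else 0
            integral[y][x] = image[y][x] + left + above - diagonal
    return integral


print(integral_image_sequential([[1, 2], [3, 4]]))  # [[1, 3], [4, 10]]
```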

Repository Structure

.
├── python/
│   ├── pycuda_test.py   # Python script using pyCUDA for invoking CUDA kernels and managing the workflow
│   └── numba_test.py    # Python script using Numba for invoking CUDA kernels and managing the workflow
├── c++/
│   ├── main.cu          # CUDA source file containing benchmarking logic
│   └── kernel.cu        # CUDA kernel definitions
└── integralimage        # Script for compiling the project and running benchmarks

Prerequisites

  • CUDA-capable NVIDIA GPU
  • CUDA Toolkit
  • C++ compiler
  • CMake
  • Python 3.x (for Python interface)
  • Python Libraries:
    • numpy
    • pycuda
    • numba

Installation

  1. Clone the repository:
git clone https://github.com/AlessioBugetti/integral-image-processing.git
cd integral-image-processing
  2. Install Python dependencies:
pip install -r python/requirements.txt
  3. Ensure the CUDA environment is set up:
    • Install NVIDIA drivers.
    • Install the CUDA Toolkit.
    • Verify with nvcc --version.

Usage

C++

./integralimage build
./integralimage run

Python

python pycuda_test.py

or

python numba_test.py

CUDA Kernels

Included Kernels:

  • SumRows: Naively computes the row-wise scan (prefix sum) of a matrix
  • SinglePassRowWiseScan: Optimized computation of the row-wise scan (prefix sum) of a matrix
  • Transpose: Transposes a matrix using block-level tiling with shared memory
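The idea behind block-level tiling in the Transpose kernel can be sketched on the CPU: the matrix is processed one small tile at a time, each tile is staged in a temporary buffer, and the buffer is written out with its indices swapped. In the sketch below the buffer stands in for CUDA shared memory and each loop iteration over a tile stands in for one thread block. The tile size and function name are assumptions for illustration and are not taken from `kernel.cu`.

```python
TILE = 2  # hypothetical tile edge, kept small so the example is easy to trace


def tiled_transpose(matrix, tile=TILE):
    """Transpose `matrix` tile by tile, staging each tile in a small buffer."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * rows for _ in range(cols)]
    for tile_y in range(0, rows, tile):        # one "thread block" per tile
        for tile_x in range(0, cols, tile):
            # Stage the tile; edge tiles are clipped to the matrix bounds.
            buffer = [[matrix[y][x]
                       for x in range(tile_x, min(tile_x + tile, cols))]
                      for y in range(tile_y, min(tile_y + tile, rows))]
            # Write the staged values back with row and column swapped.
            for dy, buffer_row in enumerate(buffer):
                for dx, value in enumerate(buffer_row):
                    out[tile_x + dx][tile_y + dy] = value
    return out


print(tiled_transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

On a GPU, the usual motivation for staging through shared memory is that both the read from the input and the write to the output can then access global memory in contiguous rows.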

Performance

The implementation includes benchmarking capabilities that measure:

  • Sequential CPU execution time
  • CUDA execution time for the naive implementation of the integral image computation
  • CUDA execution time for the optimized implementation of the integral image computation
  • Speedup ratios compared to the CPU implementation for both the naive and optimized implementations

Measurements are averaged over multiple iterations to ensure reliable results.
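The measurements listed above follow a common benchmarking pattern: time several runs, take the mean, and divide the CPU time by the GPU time to get the speedup. The sketch below shows that pattern in plain Python. The helper names and the iteration count are hypothetical, and the benchmarking code in `main.cu` may differ.

```python
import time


def average_time(func, *args, iterations=10):
    """Mean wall-clock seconds per call over `iterations` runs."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(*args)
    return (time.perf_counter() - start) / iterations


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-in workload; in the project this would be the CPU or GPU computation.
data = list(range(100_000))
cpu_seconds = average_time(sum, data, iterations=5)
print(f"mean time per run: {cpu_seconds:.6f} s")
print(f"speedup of 0.5 s over 2.0 s: {speedup(2.0, 0.5):.1f}x")
```

When timing GPU code this way, the device should be synchronized before the timer is stopped, because kernel launches return before the kernel has finished.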

License

This project is licensed under the GPL-3.0-only License. See the LICENSE file for more details.

Author

Alessio Bugetti - alessiobugetti98@gmail.com
