- Overview
- Repository Structure
- Prerequisites
- Installation
- Usage
- CUDA Kernels
- Performance
- License
- Author
## Overview

This project implements integral image computation for grayscale images using CUDA. It leverages GPU parallel processing to achieve high performance.

The parallel computation on the GPU alternates two kernels in the following order (a minimal sketch is given after the list):
- Row-wise Scan
- Transpose
- Row-wise Scan
- Transpose
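This ordering works because the first pass produces horizontal prefix sums, the transpose turns columns into rows so the second pass accumulates them vertically, and the final transpose restores the original orientation. The sketch below illustrates the idea with deliberately naive kernels; the names, signatures, and launch parameters are assumptions made for this README, not the project's actual implementation.

```cuda
// Illustrative sketch of the scan/transpose pipeline. Kernel names and
// launch parameters are hypothetical and kept deliberately simple.
#include <cuda_runtime.h>

// One thread per row: sequential inclusive prefix sum along that row.
__global__ void RowScanNaive(const unsigned int* in, unsigned int* out,
                             int width, int height)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= height)
        return;
    unsigned int acc = 0;
    for (int col = 0; col < width; ++col) {
        acc += in[row * width + col];
        out[row * width + col] = acc;
    }
}

// One thread per element: writes the element into its transposed position.
__global__ void TransposeNaive(const unsigned int* in, unsigned int* out,
                               int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height)
        out[col * height + row] = in[row * width + col];
}

// Host-side ordering: row-wise scan, transpose, row-wise scan, transpose.
void IntegralImage(const unsigned int* d_in, unsigned int* d_buf,
                   unsigned int* d_out, int width, int height)
{
    dim3 block2d(16, 16);
    dim3 grid1((width + 15) / 16, (height + 15) / 16);
    dim3 grid2((height + 15) / 16, (width + 15) / 16);

    RowScanNaive<<<(height + 255) / 256, 256>>>(d_in, d_buf, width, height);
    TransposeNaive<<<grid1, block2d>>>(d_buf, d_out, width, height);
    RowScanNaive<<<(width + 255) / 256, 256>>>(d_out, d_buf, height, width);
    TransposeNaive<<<grid2, block2d>>>(d_buf, d_out, height, width);
}
```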
For the scan step, two kernel versions are provided: a naive implementation and an optimized one.
An integral image, also known as a summed-area table, is a representation that allows for fast computation of the sum of values in a rectangular subset of an image.
For example, given an input image, each entry of its integral image holds the sum of all pixels above and to the left of that position, inclusive, as shown in the worked example below.
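A small worked example (the values are chosen here purely for illustration):

```
Input image        Integral image
1 2 3               1  3  6
4 5 6               5 12 21
7 8 9              12 27 45
```

Any rectangular sum can then be recovered with four lookups by inclusion-exclusion. A minimal helper, written here only to illustrate the lookup pattern (it is not part of the project's sources), might look like:

```cuda
// Sum of the rectangle spanning (x0, y0) to (x1, y1), inclusive, using four
// lookups into a row-major integral image of the given width.
// Illustrative sketch only; not part of the project's code.
unsigned int RectSum(const unsigned int* integral, int width,
                     int x0, int y0, int x1, int y1)
{
    unsigned int total = integral[y1 * width + x1];
    if (x0 > 0)
        total -= integral[y1 * width + (x0 - 1)];
    if (y0 > 0)
        total -= integral[(y0 - 1) * width + x1];
    if (x0 > 0 && y0 > 0)
        total += integral[(y0 - 1) * width + (x0 - 1)];
    return total;
}
```

With the matrices above, the bottom-right 2x2 block sums to 45 - 12 - 6 + 1 = 28, which matches 5 + 6 + 8 + 9.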
## Repository Structure

```
.
├── python/
│   ├── pycuda_test.py   # Python script using PyCUDA for invoking CUDA kernels and managing the workflow
│   └── numba_test.py    # Python script using Numba for invoking CUDA kernels and managing the workflow
├── c++/
│   ├── main.cu          # CUDA source file containing benchmarking logic
│   └── kernel.cu        # CUDA kernel definitions
└── integralimage        # Script for compiling the project and running benchmarks
```
## Prerequisites

- CUDA-capable NVIDIA GPU
- CUDA Toolkit
- C++ compiler
- CMake
- Python 3.x (for Python interface)
- Python Libraries:
- numpy
- pycuda
- numba
## Installation

- Clone the repository:

  ```
  git clone https://github.com/AlessioBugetti/integral-image-processing.git
  cd integral-image-processing
  ```

- Install the Python dependencies:

  ```
  pip install -r python/requirements.txt
  ```

- Ensure the CUDA environment is set up:
  - Install the NVIDIA drivers.
  - Install the CUDA Toolkit.
  - Verify the installation with:

    ```
    nvcc --version
    ```
## Usage

Build the project:

```
./integralimage build
```

Run the benchmarks:

```
./integralimage run
```

To use the Python interface, run either:

```
python pycuda_test.py
```

or

```
python numba_test.py
```
## CUDA Kernels

- `SumRows`: naive computation of the row-wise scan (prefix sum) of a matrix
- `SinglePassRowWiseScan`: optimized computation of the row-wise scan (prefix sum) of a matrix
- `Transpose`: transposes a matrix using block-level tiling with shared memory (a generic sketch of this pattern follows)
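The shared-memory tiled transpose is a well-known CUDA pattern. The sketch below illustrates it under assumed tile dimensions and launch configuration (`dim3 block(TILE_DIM, BLOCK_ROWS)`, `dim3 grid((width + TILE_DIM - 1) / TILE_DIM, (height + TILE_DIM - 1) / TILE_DIM)`); the project's actual `Transpose` kernel may differ in these details.

```cuda
// Generic tiled transpose sketch: each block stages a TILE_DIM x TILE_DIM
// tile in shared memory so that both the global loads and the global stores
// are coalesced. Tile sizes are assumptions, not the project's values.
#define TILE_DIM 32
#define BLOCK_ROWS 8

__global__ void TransposeTiled(const unsigned int* in, unsigned int* out,
                               int width, int height)
{
    // Extra column avoids shared-memory bank conflicts on transposed reads.
    __shared__ unsigned int tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;

    // Coalesced loads of the tile from the input matrix.
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < width && (y + j) < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * width + x];

    __syncthreads();

    // Swap block indices and write the transposed tile with coalesced stores.
    x = blockIdx.y * TILE_DIM + threadIdx.x;
    y = blockIdx.x * TILE_DIM + threadIdx.y;
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < height && (y + j) < width)
            out[(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
}
```

Padding the shared-memory tile to `TILE_DIM + 1` columns avoids bank conflicts when the tile is read back column-wise during the store phase.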
## Performance

The implementation includes benchmarking capabilities that measure:

- Sequential CPU execution time
- CUDA execution time of the naive integral image implementation
- CUDA execution time of the optimized integral image implementation
- Speedup of both CUDA implementations relative to the CPU implementation

Measurements are averaged over multiple iterations to ensure reliable results.
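GPU timings of this kind are commonly collected with CUDA events. The helper below is a sketch of that approach, with the function name and structure assumed for this README rather than taken from the project's benchmarking code:

```cuda
#include <cuda_runtime.h>

// Average GPU time (in milliseconds) of launch() over `iterations` runs,
// measured with CUDA events. Illustrative sketch only.
template <typename Launch>
float AverageGpuTimeMs(Launch launch, int iterations)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iterations; ++i)
        launch();                     // enqueue the kernel(s) under test
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);       // wait until all enqueued work is done

    float elapsedMs = 0.0f;
    cudaEventElapsedTime(&elapsedMs, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return elapsedMs / iterations;    // average time per iteration
}
```

A CPU baseline can be timed analogously with `std::chrono`, and the speedup is the ratio of the two averaged times.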
## License

This project is licensed under the GPL-3.0-only license. See the LICENSE file for details.
## Author

Alessio Bugetti - alessiobugetti98@gmail.com