This repository contains the implementation and evaluation of various motion detection algorithms, including traditional background subtraction techniques and state-of-the-art deep learning models. The project assesses the strengths and limitations of these approaches on benchmark datasets for motion detection tasks. For more details, refer to the full report included in this repository.
Evaluation uses benchmark datasets containing video sequences with varying environments, lighting conditions, and levels of clutter. Ground truth annotations label each region as foreground (moving objects) or background. For more information about the datasets and their characteristics, refer to the report.
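Because each sequence pairs frames with binary ground-truth masks, detections can be scored with standard region metrics. The snippet below is a minimal sketch of a per-frame foreground F-measure; it is not taken from the repository, and the function name and the convention of treating nonzero pixels as foreground are illustrative assumptions.

```python
import numpy as np

def foreground_f_measure(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """F-measure between a predicted binary mask and a ground-truth mask,
    treating nonzero pixels as foreground."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```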
The following traditional algorithms for motion detection are implemented:
- Frame Differencing
  - Reference Frame Differencing (RFD)
  - Adjacent Frame Differencing (AFD)
  - Median Frame Differencing (MFD)
  - Median Frame Differencing with Morphology (MFD-M)
- Mixture of Gaussians (MOG)
These approaches are detailed in the report, which includes descriptions of the algorithms and their implementations.
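As a concrete reference, here is a minimal OpenCV sketch of the differencing and MOG families listed above. It is not the repository's implementation: the input path `input.mp4`, the threshold, and the buffer length are illustrative assumptions, and OpenCV's built-in `createBackgroundSubtractorMOG2` stands in for the project's MOG variant.

```python
import collections

import cv2
import numpy as np

VIDEO_PATH = "input.mp4"   # hypothetical input; substitute a dataset sequence
THRESHOLD = 25             # difference threshold (tunable per sequence)
MEDIAN_BUFFER = 25         # frames kept for the median background model

cap = cv2.VideoCapture(VIDEO_PATH)
ok, first = cap.read()
reference = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)   # RFD reference frame
previous = reference.copy()                           # AFD previous frame
buffer = collections.deque(maxlen=MEDIAN_BUFFER)      # MFD frame buffer
mog = cv2.createBackgroundSubtractorMOG2()            # MOG-family subtractor
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # RFD: difference against a fixed reference frame.
    _, rfd = cv2.threshold(cv2.absdiff(gray, reference),
                           THRESHOLD, 255, cv2.THRESH_BINARY)

    # AFD: difference against the immediately preceding frame.
    _, afd = cv2.threshold(cv2.absdiff(gray, previous),
                           THRESHOLD, 255, cv2.THRESH_BINARY)
    previous = gray

    # MFD: difference against the per-pixel median of recent frames.
    buffer.append(gray)
    median_bg = np.median(np.stack(buffer), axis=0).astype(np.uint8)
    _, mfd = cv2.threshold(cv2.absdiff(gray, median_bg),
                           THRESHOLD, 255, cv2.THRESH_BINARY)

    # MFD-M: clean the MFD mask with morphological opening then closing.
    mfd_m = cv2.morphologyEx(mfd, cv2.MORPH_OPEN, kernel)
    mfd_m = cv2.morphologyEx(mfd_m, cv2.MORPH_CLOSE, kernel)

    # MOG: adaptive per-pixel mixture-of-Gaussians background model.
    mog_mask = mog.apply(frame)

cap.release()
```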
TransCD is a transformer-based architecture for scene change detection. It combines a CNN backbone, a Siamese Vision Transformer (SViT), and a prediction head to generate binary change maps from bi-temporal image pairs.
For an in-depth explanation of this model, see the report.
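To make the Siamese, bi-temporal idea concrete, here is a schematic PyTorch sketch rather than the authors' code: a single strided convolution stands in for the CNN backbone, a shared transformer encoder plays the role of the SViT, and a linear head decodes a per-patch change map. All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SiameseChangeDetector(nn.Module):
    """Schematic TransCD-style model: a shared backbone tokenizes each image,
    a shared transformer encoder relates the tokens, and a small head
    predicts a per-pixel change map from the feature difference."""

    def __init__(self, dim=256, depth=4, heads=8, patch=16):
        super().__init__()
        self.patch = patch
        # Backbone stand-in: one strided conv mapping 16x16 patches to tokens.
        self.backbone = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Prediction head: per-token change logits, unshuffled back to pixels.
        self.head = nn.Linear(dim, patch * patch)

    def encode(self, x):
        tokens = self.backbone(x)                   # (B, dim, H/p, W/p)
        b, d, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tokens), (h, w)

    def forward(self, img_t0, img_t1):
        f0, (h, w) = self.encode(img_t0)            # shared weights (Siamese)
        f1, _ = self.encode(img_t1)
        logits = self.head(f0 - f1)                 # (B, N, p*p)
        b = logits.shape[0]
        logits = logits.view(b, h, w, self.patch, self.patch)
        logits = logits.permute(0, 1, 3, 2, 4).reshape(
            b, 1, h * self.patch, w * self.patch)
        return torch.sigmoid(logits)                # change-map probabilities

model = SiameseChangeDetector()
t0 = torch.randn(1, 3, 224, 224)
t1 = torch.randn(1, 3, 224, 224)
change_map = model(t0, t1)                          # (1, 1, 224, 224)
```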
MU-Net1 is a U-Net-like architecture for motion detection that uses ResNet-18 as its backbone. Encoder-decoder modules with skip connections allow it to segment moving objects accurately.
For more details about MU-Net1, refer to the report.
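The encoder-decoder wiring can be sketched as follows, assuming torchvision's ResNet-18 as the encoder; the decoder block design and channel counts here are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetUNet(nn.Module):
    """Schematic MU-Net1-style segmenter: ResNet-18 stages as the encoder,
    with decoder blocks that upsample and fuse skip connections."""

    def __init__(self):
        super().__init__()
        r = resnet18(weights=None)  # pretrained weights could be loaded instead
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)  # /2, 64 ch
        self.pool = r.maxpool                              # /4
        self.enc1, self.enc2 = r.layer1, r.layer2          # 64, 128 ch
        self.enc3, self.enc4 = r.layer3, r.layer4          # 256, 512 ch
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec3 = self._block(512 + 256, 256)
        self.dec2 = self._block(256 + 128, 128)
        self.dec1 = self._block(128 + 64, 64)
        self.dec0 = self._block(64 + 64, 64)
        self.out = nn.Conv2d(64, 1, kernel_size=1)         # motion-mask logits

    @staticmethod
    def _block(in_ch, out_ch):
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        s0 = self.stem(x)                                  # /2
        s1 = self.enc1(self.pool(s0))                      # /4
        s2 = self.enc2(s1)                                 # /8
        s3 = self.enc3(s2)                                 # /16
        s4 = self.enc4(s3)                                 # /32
        d3 = self.dec3(torch.cat([self.up(s4), s3], dim=1))  # /16, skip s3
        d2 = self.dec2(torch.cat([self.up(d3), s2], dim=1))  # /8,  skip s2
        d1 = self.dec1(torch.cat([self.up(d2), s1], dim=1))  # /4,  skip s1
        d0 = self.dec0(torch.cat([self.up(d1), s0], dim=1))  # /2,  skip s0
        return self.out(self.up(d0))                       # full resolution

mask_logits = ResNetUNet()(torch.randn(1, 3, 256, 256))    # (1, 1, 256, 256)
```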
Sample outputs from traditional and deep learning methods were qualitatively evaluated. The comparison below highlights their performance on challenging scenarios.
- Traditional Methods:
  - RFD and AFD are computationally efficient but struggle with dynamic backgrounds and noise.
  - MFD improves robustness using temporal information, while MFD-M refines masks using morphological operations.
  - MOG adapts well to gradual scene changes but is computationally intensive.
- Deep Learning Models:
  - TransCD and MU-Net1 outperform traditional methods in complex scenes with occlusions and varying lighting conditions.
  - However, these models require significant computational resources for real-time deployment.
For additional discussion and evaluation procedures, consult the report.