This project implements a deep learning model for gesture recognition, enabling users to control a smart TV without using a remote. The gestures are captured through a webcam and correspond to specific TV commands.
## Table of Contents

- Introduction
- Problem Statement
- Project Structure
- Models
  - Conv3D Models
  - CNN + RNN Models
  - Transfer Learning Using MobileNet
- Results
- Conclusion
- Usage Instructions
## Introduction

This group project aims to develop a deep learning model capable of recognizing five distinct gestures. These gestures are then mapped to TV commands to enhance the user experience.
## Problem Statement

The goal is to create a smart TV feature that recognizes five gestures performed by the user, allowing them to control the TV hands-free. The gestures and their corresponding commands are listed here (a mapping sketch follows the list):
- Thumbs up: Increase the volume
- Thumbs down: Decrease the volume
- Left swipe: Jump backward 10 seconds
- Right swipe: Jump forward 10 seconds
- Stop: Pause the movie
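Once a gesture class is predicted, it can be mapped to a TV action. A minimal sketch of such a mapping with illustrative command names (the actual TV control interface is outside the scope of this project):

```python
# Hypothetical mapping from recognized gesture to TV command.
# Command names are illustrative; a real integration would call the TV's API.
GESTURE_COMMANDS = {
    'Thumbs up': 'volume_up',
    'Thumbs down': 'volume_down',
    'Left swipe': 'seek_backward_10s',
    'Right swipe': 'seek_forward_10s',
    'Stop': 'pause',
}
```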
## Project Structure

Data Preprocessing:
- Custom video frame extraction and preprocessing.
- Resizing frames and normalizing pixel values for model compatibility (see the preprocessing sketch below).
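A minimal sketch of the preprocessing step, assuming each video is stored as a folder of frame images and using OpenCV for image I/O (function and parameter names are illustrative, not the project's exact code):

```python
import os
import cv2           # assumed image library; any equivalent works
import numpy as np

def load_video_frames(video_dir, num_frames=10, size=(120, 120)):
    """Load, resize, and normalize a fixed number of frames from one video folder."""
    all_frames = sorted(os.listdir(video_dir))
    # Sample `num_frames` indices evenly across the available frames.
    indices = np.linspace(0, len(all_frames) - 1, num_frames).astype(int)
    frames = []
    for i in indices:
        img = cv2.imread(os.path.join(video_dir, all_frames[i]))
        img = cv2.resize(img, size)                    # resize for model compatibility
        frames.append(img.astype(np.float32) / 255.0)  # normalize pixels to [0, 1]
    return np.stack(frames)  # shape: (num_frames, height, width, 3)
```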
Data Generator:
- Designed to handle large datasets efficiently.
- Supports batch processing and dynamic augmentation for training (a generator sketch follows this list).
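A minimal batch-generator sketch, reusing the hypothetical `load_video_frames` helper and imports above and assuming `samples` is a list of `(video_dir, class_index)` pairs:

```python
def batch_generator(samples, batch_size, num_frames=10, size=(120, 120), num_classes=5):
    """Yield (videos, one-hot labels) batches indefinitely, reshuffling each epoch."""
    while True:
        np.random.shuffle(samples)
        for start in range(0, len(samples) - batch_size + 1, batch_size):
            batch = samples[start:start + batch_size]
            x = np.stack([load_video_frames(d, num_frames, size) for d, _ in batch])
            y = np.zeros((batch_size, num_classes), dtype=np.float32)
            for row, (_, label) in enumerate(batch):
                y[row, label] = 1.0  # one-hot encode the gesture class
            yield x, y
```

Augmentation (e.g., random crops or brightness shifts) would be applied per frame inside the loop; it is omitted here for brevity.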
Model Architectures:
- Conv3D: Captures spatial and temporal features from video input.
- CNN + RNN: Sequential processing of video frames using LSTMs combined with convolutional layers.
- Transfer Learning: Leverages MobileNet for feature extraction from video frames. Illustrative sketches of all three architectures follow this list.
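An illustrative Conv3D architecture in Keras; the layer sizes below are examples chosen for the 120x120, 10-frame configuration, not the project's exact model:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv3D, MaxPooling3D, Flatten, Dense, Dropout

def build_conv3d(num_frames=10, height=120, width=120, num_classes=5):
    """3D convolutions learn spatial and temporal features jointly."""
    model = Sequential([
        Conv3D(16, (3, 3, 3), activation='relu',
               input_shape=(num_frames, height, width, 3)),
        MaxPooling3D((1, 2, 2)),  # pool spatially, keep temporal resolution
        Conv3D(32, (3, 3, 3), activation='relu'),
        MaxPooling3D((2, 2, 2)),  # now pool across time as well
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),  # one output per gesture
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```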
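An illustrative CNN + RNN variant, where a `TimeDistributed` CNN extracts per-frame features and an LSTM models the frame sequence (again a sketch under the same assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Flatten, LSTM, Dense)

def build_cnn_rnn(num_frames=10, height=120, width=120, num_classes=5):
    """2D convolutions per frame, then an LSTM over the resulting sequence."""
    model = Sequential([
        TimeDistributed(Conv2D(16, (3, 3), activation='relu'),
                        input_shape=(num_frames, height, width, 3)),
        TimeDistributed(MaxPooling2D((2, 2))),
        TimeDistributed(Conv2D(32, (3, 3), activation='relu')),
        TimeDistributed(MaxPooling2D((2, 2))),
        TimeDistributed(Flatten()),  # one feature vector per frame
        LSTM(64),                    # aggregate temporal information
        Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```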
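A transfer-learning sketch with a frozen MobileNet as the per-frame feature extractor. Note that MobileNet's pretrained weights expect one of its standard input sizes (e.g., 128x128), so frames are assumed to be resized accordingly:

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, GlobalAveragePooling2D, GRU, Dense

def build_mobilenet_rnn(num_frames=18, height=128, width=128, num_classes=5):
    """Frozen ImageNet MobileNet features per frame, pooled and fed to a GRU."""
    base = MobileNet(weights='imagenet', include_top=False,
                     input_shape=(height, width, 3))
    base.trainable = False  # reuse pretrained features; train only the new head
    model = Sequential([
        TimeDistributed(base, input_shape=(num_frames, height, width, 3)),
        TimeDistributed(GlobalAveragePooling2D()),  # one vector per frame
        GRU(64),
        Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```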
Evaluation and Visualization:
- Accuracy and loss metrics for each model.
- Confusion matrix and classification reports for gesture predictions (an evaluation sketch follows).
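A minimal evaluation sketch with scikit-learn, assuming a trained `model` and a held-out list `val_samples` used with the generator above (all names come from the hypothetical sketches):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Pull one large validation batch and compare predictions to ground truth.
x_val, y_val = next(batch_generator(val_samples, batch_size=100))
y_pred = np.argmax(model.predict(x_val), axis=1)
y_true = np.argmax(y_val, axis=1)

labels = ['Thumbs up', 'Thumbs down', 'Left swipe', 'Right swipe', 'Stop']
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=labels))
```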
## Models

Seven model configurations were trained (a training sketch follows the table):

| Model | Epochs | Batch size | Resolution | Frames per video |
|-------|--------|------------|------------|------------------|
| 1     | 15     | 64         | 120x120    | 10               |
| 2     | 20     | 20         | 50x50      | 6                |
| 3     | 20     | 30         | 50x50      | 10               |
| 4     | 25     | 50         | 120x120    | 10               |
| 5     | 25     | 50         | 70x70      | 18               |
| 6     | 25     | 50         | 70x70      | 18               |
| 7     | 15     | 5          | 120x120    | 18               |
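As an illustration, training one configuration (Model 1: 15 epochs, batch size 64, 120x120, 10 frames) with the sketches above might look like this; `train_samples` and `val_samples` are hypothetical:

```python
train_gen = batch_generator(train_samples, batch_size=64, num_frames=10, size=(120, 120))
val_gen = batch_generator(val_samples, batch_size=64, num_frames=10, size=(120, 120))

model = build_conv3d(num_frames=10, height=120, width=120)
model.fit(train_gen,
          steps_per_epoch=len(train_samples) // 64,
          validation_data=val_gen,
          validation_steps=len(val_samples) // 64,
          epochs=15)
```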
## Results

Each model's performance is evaluated on a validation set, and the results are compared to determine the most efficient approach for real-time gesture recognition.
## Conclusion

The project demonstrates that deep learning techniques can effectively recognize gestures from video data, offering an intuitive way to interact with smart TVs.