Sign language gesture recognition using a recurrent neural network (RNN) with Mediapipe hand tracking.
This project is for academic purposes. Thanks to Google's Mediapipe team :)
Training data is created on a desktop from input videos using Mediapipe Multi Hand Tracking. Gesture recognition is then done by training an RNN on the hand landmark features extracted for each frame.
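As a rough illustration (21 landmarks with x, y, z coordinates per frame are assumed here; the exact feature layout depends on the Mediapipe graph), each video becomes a sequence of per-frame landmark vectors that the RNN consumes one time step at a time:

```python
import numpy as np

# Assumed sizes: 21 hand landmarks, 3 coordinates each (x, y, z).
NUM_LANDMARKS = 21
NUM_COORDS = 3
FEATURES_PER_FRAME = NUM_LANDMARKS * NUM_COORDS  # 63 features per frame

# A video with, say, 40 tracked frames becomes a (40, 63) sequence.
num_frames = 40
sequence = np.zeros((num_frames, FEATURES_PER_FRAME), dtype=np.float32)
print(sequence.shape)  # (40, 63)
```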
CUSTOMIZE:
- Use video input instead of the webcam on the desktop so the model can be trained with video data
- Preprocess the hand landmarks of every frame for each word and save them into one txt file per video (see the sketch below)
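The exact txt format produced by the modified calculators is not specified in this README; a minimal sketch of the idea, assuming one whitespace-separated line of flattened landmark coordinates per frame, might look like:

```python
import numpy as np

def save_landmarks_txt(per_frame_landmarks, out_path):
    """Write one line per frame; each line is the flattened x y z ... coordinates.

    per_frame_landmarks: list of (num_landmarks, 3) arrays, one entry per frame.
    """
    with open(out_path, "w") as f:
        for frame in per_frame_landmarks:
            flat = np.asarray(frame, dtype=np.float32).reshape(-1)
            f.write(" ".join(f"{v:.6f}" for v in flat) + "\n")

# Example: two frames of 21 dummy landmarks each, written to a hypothetical file.
dummy = [np.random.rand(21, 3) for _ in range(2)]
save_landmarks_txt(dummy, "IMG_0000.txt")
```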
- Install Mediapipe
git clone https://github.com/google/mediapipe.git
See the rest of the installation documentation here.
- Change end_loop_calculator.h file
cd ~/mediapipe/mediapipe/calculators/core
rm end_loop_calculator.h
Then replace it with our new end_loop_calculator.h file from the modified_mediapipe folder.
- Change demo_run_graph_main.cc file
cd ~/mediapipe/mediapipe/examples/desktop
rm demo_run_graph_main.cc
Then replace it with our new demo_run_graph_main.cc file from the modified_mediapipe folder.
- Change landmarks_to_render_data_calculator.cc file
cd ~/mediapipe/mediapipe/calculators/util
rm landmarks_to_render_data_calculator.cc
Then replace it with our new landmarks_to_render_data_calculator.cc file from the modified_mediapipe folder.
Put the training videos for each sign language word in one folder per word. Copy the build.py file to your mediapipe directory.
- Usage
To generate the mp4 and txt files with Mediapipe automatically, run
python build.py --input_data_path=[INPUT_PATH] --output_data_path=[OUTPUT_PATH]
inside the mediapipe directory.
IMPORTANT: Name each folder carefully, as the folder name becomes the label for its video data. (DO NOT use spaces or '_' in the folder name, e.g. Apple_pie (X))
For example:
input_video
├── Apple
│ ├── IMG_2733.MOV
│ ├── IMG_2734.MOV
│ ├── IMG_2735.MOV
│ └── IMG_2736.MOV
└── Happy
├── IMG_2472.MOV
├── IMG_2473.MOV
├── IMG_2474.MOV
└── IMG_2475.MOV
...
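build.py itself is not reproduced here; as a sketch of how such a layout maps folder names to labels (the naming rule comes from the note above, everything else is assumed), the input directory could be walked like this:

```python
from pathlib import Path

def collect_videos(input_data_path):
    """Return a list of (label, video_path) pairs; the folder name is the label."""
    samples = []
    for word_dir in sorted(Path(input_data_path).iterdir()):
        if not word_dir.is_dir():
            continue
        label = word_dir.name
        # The label must not contain spaces or underscores (see the note above).
        if " " in label or "_" in label:
            raise ValueError(f"Invalid folder name for a label: {label!r}")
        for video in sorted(word_dir.glob("*.MOV")):
            samples.append((label, video))
    return samples

# Example usage with the layout shown above:
# collect_videos("input_video")  ->  [("Apple", .../IMG_2733.MOV), ("Happy", ...), ...]
```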
The output path should initially be an empty directory; when the build is complete, the mp4 and txt folders are written to that path.
Created folder example:
output_data
├── Absolute
│ └── Apple
│ ├── IMG_2733.txt
│ ├── IMG_2734.txt
│ ├── IMG_2735.txt
│ └── IMG_2736.txt
│   ...
├── Relative
│ └── Apple
│ ├── IMG_2733.txt
│ ├── IMG_2734.txt
│ ├── IMG_2735.txt
│ └── IMG_2736.txt
│ ...
└── _Apple
├── IMG_2733.mp4
├── IMG_2734.mp4
├── IMG_2735.mp4
└── IMG_2736.mp4
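How train.py parses these txt files is not documented here; a minimal loading sketch, assuming one whitespace-separated line of landmark coordinates per frame, could look like:

```python
import numpy as np
from pathlib import Path

def load_dataset(landmark_root):
    """Load (sequence, label) pairs from output_data/Relative or output_data/Absolute."""
    sequences, labels = [], []
    for word_dir in sorted(Path(landmark_root).iterdir()):
        if not word_dir.is_dir():
            continue
        for txt_file in sorted(word_dir.glob("*.txt")):
            # One row per frame -> array of shape (num_frames, num_features).
            seq = np.loadtxt(txt_file, dtype=np.float32, ndmin=2)
            sequences.append(seq)
            labels.append(word_dir.name)
    return sequences, labels

# Example: sequences, labels = load_dataset("output_data/Relative")
```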
Your contribution is welcome here!
- Train
python train.py --input_train_path=[INPUT_TRAIN_PATH]
INPUT_TRAIN_PATH is the path to the output folder from the previous step (either the Relative or the Absolute landmark folder). The trained model is saved as 'model.h5' in the current directory.
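train.py is not reproduced in this README; since the trained model is saved as model.h5, a Keras-based recurrent classifier over the landmark sequences is a reasonable guess. The following is only a minimal sketch, with layer sizes, sequence length, and class count assumed:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_FRAMES = 100      # assumed maximum sequence length
NUM_FEATURES = 63     # assumed 21 landmarks * 3 coordinates
NUM_CLASSES = 2       # e.g. Apple, Happy

# Pad variable-length sequences to a fixed length for batched training.
# `sequences` and integer `y` labels would come from a loader like the one above.
# X = pad_sequences(sequences, maxlen=MAX_FRAMES, dtype="float32", padding="post")

model = models.Sequential([
    layers.Masking(mask_value=0.0, input_shape=(MAX_FRAMES, NUM_FEATURES)),
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X, y, epochs=10, batch_size=8)
model.save("model.h5")   # matches the file name mentioned above
```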
Watch this video for the overall workflow. More details.
Future work
- Import the model into Xcode
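One possible sketch of that step (the Core ML route via coremltools is an assumption, not something this repository provides yet) is converting model.h5 into a Core ML model that Xcode can import:

```python
import coremltools as ct
import tensorflow as tf

# Load the Keras model produced by train.py (file name from the Train step above).
keras_model = tf.keras.models.load_model("model.h5")

# Convert to Core ML; the "neuralnetwork" backend and the output file name are
# illustrative choices, not something prescribed by this repository.
mlmodel = ct.convert(keras_model, convert_to="neuralnetwork")
mlmodel.save("SignLanguageRNN.mlmodel")
```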