This project develops a model for American Sign Language (ASL) recognition using LSTM layers and a cosine similarity loss function. The WLASL-2000 Resized dataset consists of ASL gestures captured on video, and the goal is to accurately predict the English word corresponding to each gesture.
The project is organized into the following main directories:

- `dataset/`: Contains the raw video data and the preprocessed dataset.
  - `videos/`: Raw video files.
  - `WLASL_v0.3.json`: JSON file containing dataset information.
  - `new_preprocessed-data/`: Preprocessed landmark data split into train, validation, and test sets.
  - `missing.txt`: Lists the video files that are missing from the dataset.
- `model_checkpoints/`: Directory for storing model checkpoints during training.
- `model/`: Directory for saving the trained model.
- `landmarks/`: Directory for storing processed landmarks.
- `CSE 4554 Project.ipynb`: Notebook with the project implementation.
- `.gitignore`: Specifies files and directories excluded from version control.
```
/ML-Project
├── dataset/
│   ├── videos/
│   ├── WLASL_v0.3.json
│   ├── new_preprocessed-data/
│   │   ├── train/
│   │   ├── validation/
│   │   └── test/
│   └── missing.txt
├── model_checkpoints/
├── model/
├── landmarks/
├── CSE_4554_Project.ipynb
└── .gitignore
```
The implementation consists of several key components:

- Data Preprocessing:
  - Extracting landmarks from the ASL videos.
- Folder Structure Setup:
  - Setting up the folder structure as outlined above and downloading the dataset from the link provided below.
- Data Augmentation & Padding:
  - Applying various data augmentation techniques to improve the model's robustness.
  - Padding each video sequence to a uniform length of 76 frames for consistent input dimensions during training.
- Model Architecture:
  - Using a deep LSTM model for ASL recognition.
  - Incorporating masking and dropout layers for regularization.
- Label Encoding:
  - Using FastText to encode labels as word vectors for a richer label representation.
- Training and Evaluation:
  - Training the model with optimized learning rates and callbacks.
  - Evaluating the model's performance on a validation set.
- Model Interpretation:
  - Visualizing confusion matrices for model interpretation.
  - Generating ROC curves for multi-class classification.
- Saving the Trained Model:
  - Saving the trained model for future use.
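The 76-frame padding step above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the target length of 76 comes from the project, while the 126 landmark features per frame used in the example are purely an assumption.

```python
import numpy as np

TARGET_FRAMES = 76  # uniform sequence length used during training

def pad_sequence(landmarks: np.ndarray, target_frames: int = TARGET_FRAMES) -> np.ndarray:
    """Zero-pad or truncate a (frames, features) landmark array
    so it has exactly `target_frames` frames."""
    frames, features = landmarks.shape
    if frames >= target_frames:
        return landmarks[:target_frames]
    padding = np.zeros((target_frames - frames, features), dtype=landmarks.dtype)
    return np.vstack([landmarks, padding])

# Example: a 50-frame clip with a hypothetical 126 landmark features per frame
clip = np.random.rand(50, 126)
padded = pad_sequence(clip)
print(padded.shape)  # (76, 126)
```

Zero-padding pairs naturally with the masking layer mentioned in the model architecture, which lets the LSTM skip the all-zero frames.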
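The cosine similarity loss used for training can be expressed in plain NumPy. This is a sketch of the underlying idea rather than the notebook's implementation; deep learning frameworks such as Keras provide an equivalent built-in loss.

```python
import numpy as np

def cosine_similarity_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """1 - cos(angle between target and predicted vectors):
    0 when they point the same way, up to 2 when opposite."""
    cos = np.dot(y_true, y_pred) / (np.linalg.norm(y_true) * np.linalg.norm(y_pred))
    return 1.0 - cos

v = np.array([1.0, 0.0])
print(cosine_similarity_loss(v, v))                     # 0.0 (identical direction)
print(cosine_similarity_loss(v, np.array([0.0, 1.0])))  # 1.0 (orthogonal)
```

Because only the angle between vectors matters, this loss rewards predictions that point in the same direction as the FastText label vector regardless of magnitude.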
- Ensure you have a Python 3.x environment set up on your machine.
- Download the dataset and set up the folder structure as shown above.
- Execute the provided notebook.
- The notebook will automatically install the required dependencies and proceed with the following steps.
- The notebook includes code for processing the data. This step involves preparing and organizing the dataset for training.
- The notebook will train the machine learning model using the processed data.
- During this step, the model learns patterns and relationships in the data.
- The trained model will be evaluated to assess its performance.
- Standard evaluation metrics, such as accuracy and loss, will be plotted to gauge the overall effectiveness of the model.
- A confusion matrix will be generated to provide a detailed breakdown of the model's performance across different classes. This matrix is valuable for understanding the distribution of correct and incorrect predictions.
- The Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) will be computed. These metrics are especially useful for binary and multiclass classification tasks, offering insights into the model's ability to discriminate between classes.
- The ROC curve illustrates the trade-off between true positive rate and false positive rate across various thresholds.
- The AUC represents the area under the ROC curve, with a higher AUC indicating better model discrimination.
- These metrics provide a comprehensive view of the model's discriminatory power and can be particularly insightful in scenarios where a balanced assessment of true positives and false positives is crucial.
- After successful training and evaluation, the notebook will save the trained model.
- This saved model can be later used for making predictions on new data.
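Since labels are encoded as FastText word vectors, the saved model's output vector can be mapped back to a word by a nearest-neighbour search in embedding space. A minimal sketch of that idea follows; the tiny three-word embedding table is purely illustrative (the real project uses FastText vectors for each WLASL gloss).

```python
import numpy as np

# Hypothetical label embeddings (word -> vector), standing in for
# the FastText vectors used in the project.
label_vectors = {
    "book":  np.array([0.9, 0.1, 0.0]),
    "drink": np.array([0.1, 0.8, 0.3]),
    "go":    np.array([0.0, 0.2, 0.9]),
}

def predict_word(model_output: np.ndarray) -> str:
    """Return the label whose embedding has the highest cosine
    similarity with the model's output vector."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(label_vectors, key=lambda w: cos(model_output, label_vectors[w]))

print(predict_word(np.array([0.85, 0.2, 0.05])))  # "book"
```

This mirrors the training objective: the model is optimized for cosine similarity, so the same measure is used to pick the closest word at prediction time.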
The WLASL-2000 Resized dataset used in this project can be found here.
This project is licensed under the MIT License.
- Shanta Maria
  - GitHub: maria-iut1234
- Nafisa Maliyat
  - GitHub: NafisaMaliyat-iut