This project explores the use of Convolutional Recurrent Neural Networks to translate a curated subset of American Sign Language (ASL) gestures directly into text through a live webcam feed.
Using the WLASL video dataset, alongside MediaPipe for precise hand-landmark and pose detection, members will learn how to transform videos of gestures into meaningful features suitable for machine learning. Early milestones include classifying single frames of hand gestures before moving on to recognizing gestures whose hand positions change across multiple frames, using Long Short-Term Memory (LSTM) layers. The project culminates in a Streamlit application that visualizes hand movements and skeletal structure on screen in real time and displays translations from the trained gesture recognition model.
Final deliverable: a Streamlit dashboard that provides real-time sign language translation, visualizing the skeletal structure and displaying both the ASL glosses and the English translation of multiple gestures in sequence.
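To make the pipeline concrete, here is a minimal PyTorch sketch of the kind of sequence classifier described above: an LSTM that reads per-frame hand-landmark features and predicts a gesture class. The input size (21 hand landmarks × 3 coordinates), hidden size, and number of classes are placeholder assumptions rather than the project's final architecture.

```python
import torch
import torch.nn as nn

class LSTM_Model(nn.Module):
    """Sketch of a gesture classifier over sequences of landmark features."""
    def __init__(self, input_size=63, hidden_size=128, num_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, num_frames, input_size) sequence of per-frame landmark features
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])  # classify from the final hidden state
```

Classifying from the final hidden state is just one common choice; pooling over all time steps or stacking multiple LSTM layers are reasonable alternatives.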
- Week 1 Slides
- Week 1 Notebook - Intro to MediaPipe
- Download image:
!wget -q -O image.jpg https://storage.googleapis.com/mediapipe-tasks/hand_landmarker/woman_hands.jpg
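As a rough illustration of what the Week 1 notebook covers, a minimal sketch of detecting hand landmarks in the downloaded image with MediaPipe's Hands solution could look like this (the parameter values are assumptions):

```python
import cv2
import mediapipe as mp

image = cv2.imread("image.jpg")
mp_hands = mp.solutions.hands
with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    # MediaPipe expects RGB input; OpenCV loads images as BGR
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks:
    for hand in results.multi_hand_landmarks:
        # each detected hand has 21 landmarks with normalized (x, y, z) coordinates
        for landmark in hand.landmark:
            print(landmark.x, landmark.y, landmark.z)
```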
Start by uploading your notebook to Google Colab. You can also access the notebook through Colab via the following link:
At the top left of your screen, click "File > Save a copy in Drive" or click the "Copy to Drive" button so that you can save your changes.
To enable the GPU on Google Colab, select the following settings from the menu bar at the top:
# colab: enable GPU:
# Runtime > Change Runtime Type > T4 GPU > Save
# note: you may need to restart your notebook session (Runtime > Restart Session)

For running on Google Colab, note that you won't be able to use a web camera for real-time classification. However, you can use Colab to train your model using the provided GPU option, then download the saved model weights to your local machine and run inference on your computer's CPU. See the PyTorch documentation, or feel free to ask us if you need assistance with saving and loading model weights.
import torch

# define device and the LSTM_Model class exactly as used during training
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTM_Model()
model.to(device)
model.train()
# ... train your model ...
torch.save(model.state_dict(), "model_weights.pth")
# <download from Colab to your local setup>
# load model weights using the same class definition used for training (LSTM_Model)
device = torch.device("cpu")  # local inference runs on your CPU
model = LSTM_Model()
model.load_state_dict(torch.load("model_weights.pth", map_location=device))
model.to(device)
model.eval()
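Once the weights are loaded locally, real-time inference can follow a simple webcam loop. The sketch below is only illustrative: the 30-frame window and the 63-dimensional feature layout (21 landmarks × 3 coordinates, one hand) are assumptions that must match whatever your LSTM_Model was actually trained on.

```python
import cv2
import mediapipe as mp
import torch

SEQ_LEN = 30  # assumed number of frames per gesture window
mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)  # default local webcam
frames = []                # rolling buffer of per-frame landmark features
with mp_hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            # flatten 21 (x, y, z) landmarks into a 63-dim feature vector
            frames.append([c for p in lm for c in (p.x, p.y, p.z)])
        if len(frames) == SEQ_LEN:
            x = torch.tensor([frames], dtype=torch.float32)  # (1, SEQ_LEN, 63)
            with torch.no_grad():
                pred = model(x).argmax(dim=-1)  # model = the LSTM_Model loaded above
            print(pred.item())
            frames = []
        cv2.imshow("webcam", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```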