
DanceBits API: ML-powered automated choreography video segmentation

A backend that implements a multimodal AI model in PyTorch and serves it with FastAPI and Docker to automatically identify and label dance moves in videos, powering an interactive learning platform. It covers video preprocessing, pose estimation, audio processing, and the multimodal segmentation model itself.

Features

  • Advanced pose estimation and motion feature extraction using MediaPipe
  • Audio feature extraction for enhanced move detection
  • Real-time multimodal dance move segmentation model
  • User-friendly learning interface with customizable speeds and segment sizes
  • Side-by-side webcam/video option with recording functionality
  • Similarity score calculation for comparing dance performances

Requirements

  • Python 3.11 or higher
  • FFmpeg
  • CUDA-compatible GPU (optional, for faster inference)
  • Docker (optional, for containerized deployment)

Installation

  1. Clone this repository:

    git clone https://github.com/your-username/dance-bits-api.git
    cd dance-bits-api
  2. Create a Conda environment and activate it:

    conda create --name dance-bits-api python=3.11
    conda activate dance-bits-api
  3. Install the required packages:

    pip install -r requirements.txt

    Note: You can also install via Conda, but some packages may not be available:

    conda install --file requirements.txt

Local Deployment

Running the Model Locally

  1. Set up environment variables in a .env file:

    WANDB_API_KEY=your_key
    WANDB_ORG=your_org
    WANDB_PROJECT=your_project
    WANDB_MODEL_NAME=your_model
    WANDB_MODEL_VERSION=your_version
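
    The application reads these values from the environment at startup. As a minimal stdlib sketch (the project itself may use a library such as python-dotenv), a .env file of this shape can be loaded like so:

    ```python
    import os

    def load_dotenv_file(path=".env"):
        """Minimal .env loader: read KEY=VALUE lines into os.environ.

        Sketch only; skips blank lines and comments, and does not
        override variables that are already set.
        """
        loaded = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                loaded[key.strip()] = value.strip()
                os.environ.setdefault(key.strip(), value.strip())
        return loaded
    ```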
  2. Install FFmpeg (required for video processing):

    • On Ubuntu/Debian:
      sudo apt-get update
      sudo apt-get install ffmpeg libsm6 libxext6
    • On macOS:
      brew install ffmpeg
    • On Windows: download a build from the FFmpeg website and add it to your PATH
  3. Start the FastAPI server:

    uvicorn app.main:app --reload --host 0.0.0.0 --port 8080
  4. Access the API at http://localhost:8080 (FastAPI's interactive docs are served at http://localhost:8080/docs)

Testing the API

  1. Test video segmentation:

    curl -X POST "http://localhost:8080/predict/" \
      -H "accept: application/json" \
      -H "Content-Type: multipart/form-data" \
      -F "video=@path/to/your/dance_video.mp4" \
      -F "min_segmentation_prob=0.5"
  2. Test video comparison:

    curl -X POST "http://localhost:8080/compare/" \
      -H "accept: application/json" \
      -H "Content-Type: multipart/form-data" \
      -F "user_video=@path/to/user_video.mp4" \
      -F "teacher_video=@path/to/teacher_video.mp4"
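
The same /predict/ call can be made from Python. A hedged sketch using the third-party requests library (field names taken from the curl example above; the `post` argument is injectable only so the helper can be tested without a running server):

```python
def predict_segments(video_path, min_prob=0.5,
                     url="http://localhost:8080/predict/", post=None):
    """POST a video to the /predict/ endpoint and return the parsed JSON.

    Sketch only; `post` defaults to requests.post but can be swapped
    out (e.g. with a stub) for testing.
    """
    if post is None:
        import requests  # third-party: pip install requests
        post = requests.post
    with open(video_path, "rb") as f:
        resp = post(url,
                    files={"video": f},
                    data={"min_segmentation_prob": str(min_prob)})
    resp.raise_for_status()
    return resp.json()
```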

Troubleshooting

  1. Model Loading Issues:

    • Ensure all environment variables are set correctly
    • Check if the model weights are downloaded properly
    • Verify CUDA availability if using GPU
  2. Video Processing Issues:

    • Verify FFmpeg installation: ffmpeg -version
    • Check video format compatibility (MP4, AVI, MOV supported)
    • Ensure sufficient disk space for temporary files
  3. Memory Issues:

    • Reduce video resolution if experiencing OOM errors
    • Consider using CPU inference if GPU memory is limited
    • Monitor system resources during processing
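
Two of the checks above (FFmpeg on PATH, free disk for temporary files) can be automated before processing. A small stdlib-only sketch (the `preflight_check` helper and its threshold are illustrative, not part of the API):

```python
import shutil
import tempfile

def preflight_check(tmp_dir=None, min_free_gb=1.0):
    """Check common failure points before processing a video:
    FFmpeg reachable on PATH and enough free disk for temp files.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    ffmpeg = shutil.which("ffmpeg")
    free_gb = shutil.disk_usage(tmp_dir).free / 1e9
    return {
        "ffmpeg_found": ffmpeg is not None,
        "ffmpeg_path": ffmpeg,
        "free_gb": round(free_gb, 2),
        "disk_ok": free_gb >= min_free_gb,
    }
```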

Docker Deployment

  1. Build the Docker image:

    docker build -t dancebits-api .
  2. Run the container:

    docker run -d --name dancebits-api \
      -p 8080:8080 \
      -e WANDB_API_KEY=your_key \
      -e WANDB_ORG=your_org \
      -e WANDB_PROJECT=your_project \
      -e WANDB_MODEL_NAME=your_model \
      -e WANDB_MODEL_VERSION=your_version \
      dancebits-api

Environment Variables

The following environment variables are required for the application:

  • WANDB_API_KEY: Weights & Biases API key
  • WANDB_ORG: Weights & Biases organization name
  • WANDB_PROJECT: Weights & Biases project name
  • WANDB_MODEL_NAME: Name of the model to use
  • WANDB_MODEL_VERSION: Version of the model to use

API Endpoints

Predict Dance Segments

POST /predict/

Segments a dance video into individual moves.

Parameters:

  • video: Video file (MP4, AVI, or MOV)
  • min_segmentation_prob: Minimum probability threshold for segmentation (default: 0.5)

Response:

{
    "segmented_probs": [...],
    "segmented_percentages": [...]
}
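
One way a client might consume this response: if `segmented_percentages` holds boundary positions as percentages of the video length, they can be mapped to (start, end) times. This is an interpretation sketch, not the server's own code:

```python
def boundaries_to_segments(percentages, duration_s):
    """Convert boundary positions (percent of video length) into
    (start_s, end_s) segment pairs.

    Assumes `percentages` are sorted values in [0, 100]; implicit
    boundaries at 0% and 100% are added.
    """
    points = [0.0] + sorted(percentages) + [100.0]
    return [
        (round(a / 100 * duration_s, 2), round(b / 100 * duration_s, 2))
        for a, b in zip(points, points[1:])
        if b > a
    ]
```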

Compare Videos

POST /compare/

Calculates similarity score between two dance videos.

Parameters:

  • user_video: User's dance video file
  • teacher_video: Teacher's reference video file

Response:

{
    "similarity_score": float
}

Technical Details

Video Processing Pipeline

  1. Frame Extraction: Videos are processed frame by frame using OpenCV
  2. Pose Estimation: MediaPipe Pose is used to extract 35 bone vectors per frame
  3. Audio Processing:
    • Audio is extracted from video using MoviePy
    • Mel spectrogram is generated using Librosa
    • Tempo analysis for beat detection
  4. Model Inference:
    • Processes both visual (pose) and audio features
    • Returns frame-by-frame segmentation probabilities
  5. Post-processing:
    • Smoothing of segmentation probabilities
    • Dynamic adjustment based on beat detection
    • Segment identification based on probability thresholds
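
The smoothing step in the post-processing stage can be pictured as a simple centered moving average over the per-frame probabilities. A pure-Python sketch; the service itself may use a different filter:

```python
def smooth_probs(probs, window=5):
    """Centered moving-average smoothing of per-frame segmentation
    probabilities. The window is clipped at the ends of the sequence,
    so the output has the same length as the input.
    """
    half = window // 2
    out = []
    for i in range(len(probs)):
        lo, hi = max(0, i - half), min(len(probs), i + half + 1)
        out.append(sum(probs[lo:hi]) / (hi - lo))
    return out
```

Thresholding the smoothed curve (e.g. at `min_segmentation_prob`) then yields candidate segment boundaries without single-frame spikes.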

Performance Considerations

  • The API supports both CPU and GPU inference
  • Video processing is optimized for real-time performance
  • Temporary files are automatically cleaned up after processing
  • CORS is enabled for all origins by default; restrict this for production deployments

Contributing

We welcome contributions! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
