Spotify Genre Classification

Term Project for Data Science II (Group 8)

📄 The full project report is available in report.pdf.

A comparative analysis of six machine learning models for predicting the genre of songs on Spotify based on their audio features and metadata.

Group Members

Anvita Yerramsetty
Austin Bell
Carter Prince
Robera Abajobir
Sanghyun An
Tyler Varma

Project Overview

This project aims to classify music tracks into 24 distinct genres using 14 audio features extracted from the Spotify API. We implemented a complete data science pipeline including data cleaning, hybrid balancing, feature scaling, hyperparameter tuning, and comparative analysis.
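As a rough illustration of the hybrid balancing step mentioned above, the sketch below combines random undersampling of large classes with random oversampling of small ones so that every class reaches the same size. This is an assumption about what "hybrid" means here; the function name `hybrid_balance`, the `target` parameter, and the toy data are hypothetical, and the project's preprocess.py may use a different strategy.

```python
import random
from collections import defaultdict


def hybrid_balance(rows, labels, target, seed=0):
    """Resample so every class ends up with exactly `target` rows.

    Classes with at least `target` rows are undersampled without replacement;
    smaller classes keep all their rows and gain random duplicates.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    out_rows, out_labels = [], []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) >= target:
            chosen = rng.sample(members, target)
        else:
            extra = rng.choices(members, k=target - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy data: three genres with very different track counts (made-up values).
rows = [[i] for i in range(16)]
labels = ["Pop"] * 10 + ["Jazz"] * 4 + ["Opera"] * 2
balanced_rows, balanced_labels = hybrid_balance(rows, labels, target=5)
```

Balancing of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.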

The models evaluated are:

Logistic Regression
K-Nearest Neighbors (KNN)
Gaussian Naive Bayes
Random Forest Classifier
Gradient Boosting (XGBoost)
Multilayer Perceptron (Neural Network)

Directory Structure

.
├── data/                   # Contains processed CSV files and metadata
├── output/                 # JSON files containing results from each model run
├── preprocess.py           # Script to clean, balance, and scale the raw data
├── analyze.py              # Script to generate the results table and leaderboard
├── generate_figures.py     # Script to create all visualizations for the report
├── report.tex              # Final LaTeX report source code
├── report.pdf              # Final report PDF
├── requirements.txt        # Python dependencies
└── [model_scripts]         # Individual training scripts and notebooks (e.g., XGBoost.ipynb, mlp.py)

How to Reproduce Results

1. Prerequisites

Ensure you have Python 3.8+ installed. Install the required dependencies:

pip install pandas numpy scikit-learn xgboost torch matplotlib seaborn

2. Data Preprocessing

The preprocess.py script turns the raw data (SpotifyFeatures.csv) into training and testing sets. It handles cleaning, encoding, balancing, scaling, and splitting.

python preprocess.py

Output: Generates X_train.csv, y_train.csv, X_test.csv, y_test.csv in the data/ folder.
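One detail worth illustrating for the scaling step is that the scaler statistics should come from the training split only and then be reused on the test split. The sketch below assumes z-score standardization; whether preprocess.py uses this or another scaler is not stated here, and the helper names and toy arrays are hypothetical.

```python
import numpy as np


def fit_standardizer(X_train):
    """Return per-feature mean and standard deviation from the training split."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std


def standardize(X, mean, std):
    """Apply previously fitted statistics to any split."""
    return (X - mean) / std


# Toy feature matrices (made-up values): two features, three training rows.
X_train = np.array([[0.0, 100.0], [2.0, 120.0], [4.0, 140.0]])
X_test = np.array([[2.0, 160.0]])

mean, std = fit_standardizer(X_train)
X_train_scaled = standardize(X_train, mean, std)
X_test_scaled = standardize(X_test, mean, std)  # reuses training statistics
```

Fitting on the training data alone keeps information about the test set from leaking into the model inputs.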

3. Model Training

Each model has its own script or notebook. Running these will perform hyperparameter tuning and save the results to a standardized JSON file in the output/ directory.

Example:

python mlp.py
python gaussian_nb.py
# ... etc

Note: The output/ directory already contains the results from our final runs, so re-training is optional.
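The tune-then-save pattern described in this step can be sketched as follows. This is a minimal, library-free illustration, not the code in the training scripts: the JSON keys (`model`, `best_params`, `metrics`), the helper names, and the `toy_score` function are all hypothetical, and the real scripts likely score each candidate by cross-validation on the training data.

```python
import itertools
import json
import tempfile
from pathlib import Path


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def save_results(model_name, best_params, metrics, out_dir="output"):
    """Write one JSON record per model run (hypothetical schema)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    record = {"model": model_name, "best_params": best_params, "metrics": metrics}
    file_path = out_path / f"{model_name}.json"
    with file_path.open("w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return file_path


def toy_score(params):
    # Stand-in for a validation score; a real script would train and evaluate.
    bonus = 0.02 if params["weights"] == "distance" else 0.0
    return 0.5 - abs(params["n_neighbors"] - 15) / 100 + bonus


param_grid = {"n_neighbors": [5, 15, 25], "weights": ["uniform", "distance"]}
best_params, best_score = grid_search(param_grid, toy_score)

# Write to a temporary folder so the demo never touches the real output/.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_results("knn_demo", best_params, {"val_score": best_score}, out_dir=tmp)
    reloaded = json.loads(saved.read_text(encoding="utf-8"))
```

Keeping every model's record in the same shape is what lets a single analysis script compare them without model-specific parsing.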

4. Analysis & Visualization

To generate the comparative leaderboard and the figures used in the report:

# Generates the leaderboard and efficiency plots
python analyze.py

# Generates specific figures (Class Balance, Correlation, Hyperparameter plots)
python generate_figures.py
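For readers who want to inspect the saved results without running the analysis script, a leaderboard can be assembled from the JSON files along these lines. This is a sketch under assumptions: the top-level `model` and `test_accuracy` keys are hypothetical and may not match the files in output/, and the demo records use placeholder names and numbers, not the project's results.

```python
import json
import tempfile
from pathlib import Path


def build_leaderboard(out_dir, metric="test_accuracy"):
    """Read every JSON file in out_dir and rank models by `metric`, best first."""
    rows = []
    for path in sorted(Path(out_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            record = json.load(fh)
        if metric in record:  # skip files that lack the requested metric
            rows.append((record.get("model", path.stem), record[metric]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Demo with placeholder records written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo = {
        "model_a": {"model": "model_a", "test_accuracy": 0.40},
        "model_b": {"model": "model_b", "test_accuracy": 0.55},
        "model_c": {"model": "model_c"},  # no metric, so it is skipped
    }
    for name, record in demo.items():
        (Path(tmp) / f"{name}.json").write_text(json.dumps(record), encoding="utf-8")
    leaderboard = build_leaderboard(tmp)

for rank, (model, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {model}: {score:.2f}")
```

To use it on real runs, point `build_leaderboard` at the output/ directory and set `metric` to whichever key the result files actually contain.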
