This project builds a mutation-based classifier to predict cancer subtypes (HNSC vs LUSC) from genomic variant data.

DanielBarLev2/computational_genomic

Cancer Subtype Classification 🧬

This project builds a mutation-based classifier to predict cancer subtypes (HNSC vs LUSC) from genomic variant data.

🔧 Project Structure

project_root/
├── data/
│   ├── train_muts_data.csv
│   ├── test_muts_data.csv
│   ├── train_meth_data.csv
│   ├── test_meth_data.csv
│   ├── train_feats.csv
│   ├── test_feats.csv
│   ├── 100_genes.csv
│   └── E_cool_ORF.csv
├── plots/
│   └── mutation_types_ordered.png
├── predictions/
│   ├── results_muts.csv
│   └── results_meth.csv
└── Challenge_comp_geno.ipynb
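
Before opening the notebook, it can help to confirm that the input files listed in the tree are in place. The following is a minimal sketch, not part of the project: the helper name `missing_inputs` is hypothetical, and the paths are copied from the tree above on the assumption that the notebook is run from `project_root/`.

```python
from pathlib import Path

# Relative paths copied from the data/ folder in the project tree.
EXPECTED_INPUTS = [
    "data/train_muts_data.csv",
    "data/test_muts_data.csv",
    "data/train_meth_data.csv",
    "data/test_meth_data.csv",
    "data/train_feats.csv",
    "data/test_feats.csv",
    "data/100_genes.csv",
    "data/E_cool_ORF.csv",
]


def missing_inputs(project_root):
    """Return the expected input files that are absent under project_root.

    Hypothetical helper for illustration; it is not defined in the notebook.
    """
    root = Path(project_root)
    return [rel for rel in EXPECTED_INPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_inputs(".")
    if missing:
        print("Missing input files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected input files are present.")
```

Run it from the project root; an empty result means every listed input file was found.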

📦 Requirements

Python 3.8+

Set up the environment using Conda:

conda env create -f environment.yml
conda activate genomic-classifier

📁 Input Files

train_muts_data.csv: raw mutation records with case IDs and labels
test_muts_data.csv: mutation records for inference
train_feats.csv: precomputed numeric gene features for the training set
test_feats.csv: precomputed numeric gene features for the test set
train_meth_data.csv: raw methylation records with case IDs and labels
test_meth_data.csv: methylation records for inference
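
Because the raw files hold one record per mutation while predictions are made per case, the records usually have to be grouped by case first. The sketch below shows one possible way to do that with the standard library. The column names `case_id` and `gene`, the function name, and the sample rows are all assumptions made for illustration; check the actual header of train_muts_data.csv before reusing it, and note that the notebook may build its features differently.

```python
import csv
import io
from collections import Counter, defaultdict


def mutation_counts_per_case(csv_file, case_col="case_id", gene_col="gene"):
    """Count mutation records per gene for each case.

    case_col and gene_col are hypothetical column names; adjust them to
    match the real header of the mutation file.
    """
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row[case_col]][row[gene_col]] += 1
    return counts


# Tiny made-up sample, not real data from the project.
sample = io.StringIO(
    "case_id,gene,label\n"
    "C1,TP53,HNSC\n"
    "C1,TP53,HNSC\n"
    "C1,NOTCH1,HNSC\n"
    "C2,TP53,LUSC\n"
)
counts = mutation_counts_per_case(sample)
print(dict(counts["C1"]))
```

To use it on the real data, pass an open file handle instead of the in-memory sample, for example `open("data/train_muts_data.csv", newline="")`.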

📤 Output Files

predictions.csv: final case-level predictions for the task 1 test set
predictions2.csv: final case-level predictions for the task 2 test set
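
A case-level prediction file has one row per case. As a rough sketch of what writing such a file could look like, the example below uses the standard `csv` module. The header names `case_id` and `prediction`, the function name, and the example values are assumptions for illustration; the layout the notebook actually writes may differ.

```python
import csv
import io


def write_predictions(out_file, predictions):
    """Write one row per case: the case ID and its predicted subtype.

    The header names (case_id, prediction) are assumed for illustration.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["case_id", "prediction"])
    # Sort by case ID so the output order is reproducible.
    for case_id in sorted(predictions):
        writer.writerow([case_id, predictions[case_id]])


# Made-up predictions, for illustration only.
example = {"C2": "LUSC", "C1": "HNSC"}
buffer = io.StringIO()
write_predictions(buffer, example)
print(buffer.getvalue())
```

To write to disk, replace the in-memory buffer with a file opened in write mode with `newline=""`.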
