A biological database integration project that combines R, SQL, and REST APIs to unify annotations from Ensembl, UniProt, KEGG, and miRBase.
Build a reproducible mini data-integration pipeline that:
- retrieves biological entities from major resources,
- standardizes identifiers,
- stores harmonized records in a relational schema,
- supports downstream querying and interpretation.
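The identifier-standardization step can be sketched in base R. This is a minimal illustration, not the code in BD_assignment.R: the helper name `normalize_ids`, the column names, and the two toy tables are assumptions made for the example. It trims whitespace and drops Ensembl version suffixes so that records from two sources join on the same key.

```r
# Hypothetical helper: normalize identifiers before joining across sources.
normalize_ids <- function(ids) {
  ids <- trimws(ids)
  # Drop an Ensembl version suffix, e.g. "ENSG00000141510.17" -> "ENSG00000141510".
  # Identifiers that do not match the pattern are returned unchanged.
  sub("^(ENS[A-Z]*[0-9]{11})\\.[0-9]+$", "\\1", ids)
}

# Toy input tables (illustrative rows, assumed column names).
ensembl <- data.frame(
  ensembl_gene_id = c("ENSG00000141510.17", " ENSG00000012048 "),
  symbol          = c("TP53", "BRCA1"),
  stringsAsFactors = FALSE
)
uniprot <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048"),
  uniprot_acc     = c("P04637", "P38398"),
  stringsAsFactors = FALSE
)

# Normalize the join key on both sides, then merge into one harmonized table.
ensembl$ensembl_gene_id <- normalize_ids(ensembl$ensembl_gene_id)
uniprot$ensembl_gene_id <- normalize_ids(uniprot$ensembl_gene_id)
harmonized <- merge(ensembl, uniprot, by = "ensembl_gene_id")
print(harmonized)
```

Without the normalization step, neither toy row in `ensembl` would match its counterpart in `uniprot`, which is the usual failure mode when versioned and unversioned identifiers are mixed.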
- BD_assignment.R: R workflow for API retrieval, transformation, and integration
- BD_assignment.sql: relational schema and SQL components
- BD_asignment.pdf: coursework/report documentation
- API-driven biological data ingestion
- relational schema design for bioinformatics entities
- R-to-SQL integration workflows
- reproducible scripting and data transformation logic
- R (httr, jsonlite, biomaRt, RMySQL, clusterProfiler)
- MySQL
- REST APIs (UniProt, KEGG; Ensembl access patterns)
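As a rough sketch of how httr and jsonlite might be combined for REST retrieval, the example below fetches one UniProtKB entry and flattens it into a one-row data frame. The function names are hypothetical, and the JSON field names (`primaryAccession`, `uniProtkbId`, `organism$scientificName`) are assumptions about the UniProt response layout that should be checked against the live API. The final lines parse a trimmed, hand-written payload so the parsing logic can be tried without network access.

```r
library(httr)
library(jsonlite)

# Build the UniProtKB entry URL for one accession (hypothetical helper).
uniprot_entry_url <- function(accession) {
  sprintf("https://rest.uniprot.org/uniprotkb/%s.json", accession)
}

# Flatten a parsed entry into a one-row data frame.
# Field names are assumptions about the UniProt JSON layout.
parse_uniprot_entry <- function(entry) {
  data.frame(
    uniprot_acc = entry$primaryAccession,
    entry_name  = entry$uniProtkbId,
    organism    = entry$organism$scientificName,
    stringsAsFactors = FALSE
  )
}

# Retrieve and parse one entry; stops with an error on a non-2xx response.
fetch_uniprot_entry <- function(accession) {
  resp <- GET(uniprot_entry_url(accession), timeout(30))
  stop_for_status(resp)
  body <- content(resp, as = "text", encoding = "UTF-8")
  parse_uniprot_entry(fromJSON(body, simplifyVector = FALSE))
}

# Offline check with a trimmed, illustrative payload (not a full API response).
sample_json <- '{"primaryAccession":"P04637","uniProtkbId":"P53_HUMAN","organism":{"scientificName":"Homo sapiens"}}'
record <- parse_uniprot_entry(fromJSON(sample_json, simplifyVector = FALSE))
print(record)
```

Keeping URL construction, the HTTP call, and parsing in separate functions makes the parsing step testable on stored responses, which helps reproducibility when an upstream API changes.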
- Configure database credentials in the R script.
- Execute data retrieval and mapping functions.
- Persist mapped outputs into normalized SQL tables.
- Validate integrated tables with SQL queries and summary checks.
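The persist and validate steps above can be sketched with DBI. This is an illustration under stated assumptions, not the schema in BD_assignment.sql: the `gene` and `protein` tables, their columns, and the environment variable names are hypothetical. To stay runnable without a database server, the sketch swaps in an in-memory SQLite connection via RSQLite; the commented-out call shows how the same code would connect to MySQL through RMySQL.

```r
library(DBI)

# Against MySQL, credentials could be read from environment variables
# (hypothetical names) instead of being hard-coded in the script:
# con <- dbConnect(RMySQL::MySQL(),
#                  dbname   = Sys.getenv("BD_DB_NAME"),
#                  host     = Sys.getenv("BD_DB_HOST"),
#                  user     = Sys.getenv("BD_DB_USER"),
#                  password = Sys.getenv("BD_DB_PASSWORD"))
con <- dbConnect(RSQLite::SQLite(), ":memory:")  # stand-in for MySQL

# Hypothetical normalized schema: one row per gene, proteins reference genes.
dbExecute(con, "CREATE TABLE gene (
  ensembl_gene_id VARCHAR(20) PRIMARY KEY,
  symbol          VARCHAR(50) NOT NULL)")
dbExecute(con, "CREATE TABLE protein (
  uniprot_acc     VARCHAR(10) PRIMARY KEY,
  ensembl_gene_id VARCHAR(20) NOT NULL,
  FOREIGN KEY (ensembl_gene_id) REFERENCES gene (ensembl_gene_id))")

# Toy mapped outputs; EGFR is deliberately left without a protein row.
genes <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048", "ENSG00000146648"),
  symbol          = c("TP53", "BRCA1", "EGFR"),
  stringsAsFactors = FALSE
)
proteins <- data.frame(
  uniprot_acc     = c("P04637", "P38398"),
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048"),
  stringsAsFactors = FALSE
)

# Persist: append rows to the existing tables (parents before children).
dbWriteTable(con, "gene", genes, append = TRUE)
dbWriteTable(con, "protein", proteins, append = TRUE)

# Validate: row counts per table and genes that lack a protein mapping.
row_counts <- dbGetQuery(con, "SELECT
  (SELECT COUNT(*) FROM gene)    AS n_gene,
  (SELECT COUNT(*) FROM protein) AS n_protein")
unmapped <- dbGetQuery(con, "SELECT g.ensembl_gene_id, g.symbol
  FROM gene g
  LEFT JOIN protein p ON p.ensembl_gene_id = g.ensembl_gene_id
  WHERE p.uniprot_acc IS NULL")

dbDisconnect(con)
print(row_counts)
print(unmapped)
```

The LEFT JOIN check is one simple way to surface incomplete mappings after loading; comparing the row counts with the sizes of the source data frames catches silent drops during the write.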
This project highlights real-world integration thinking: moving from fragmented biological sources to a queryable, analysis-ready data model.