This repository accompanies the analysis for “Phytotherapy Discovery through Ontology-Guided Machine Learning.”
It is intended to be used with the upstream phytotherapy ontology maintained in RomanoLab/poppy (the source of truth for the ontology itself):
- Upstream ontology + primary project context: https://github.com/RomanoLab/poppy
This repository also includes a modified CTAPred pipeline used to predict protein–phytochemical interactions, supporting downstream mechanism inference and repurposing analyses.
- Background
- Repository Layout
- Quick Start
- Command-line Reference
- Workflow Details
- Extending the Toolkit
- Contributing
- Citation
Researchers in natural-product drug discovery often work with ontologies linking plants → chemicals → targets → therapeutic roles. In this project, the ontology is maintained upstream in RomanoLab/poppy, while this repository contains analysis companion materials and the CTAPred workflow used in the ontology-guided ML portion of the study.
At a high level, the workflow supports:
- Starting from an ontology-derived phytochemical set (from `poppy`)
- Predicting likely protein targets for phytochemicals (CTAPred)
- Using predictions for downstream analysis (e.g., candidate prioritization, mechanism inference)
.
├── README.md                    # you are here
├── requirements.txt             # repo-level environment dependencies (if applicable)
└── CTAPred/                     # modified CTAPred pipeline for protein–phytochemical predictions
    ├── predict_targets.py
    ├── SharedFunc.py
    ├── requirements.txt
    ├── QueryList1_smiles.csv
    └── README.md                # CTAPred usage instructions
| File / Dir | Purpose |
| ---------------------- | -----------------------------------------------------------------------------|
| `CTAPred/` | Modified CTAPred pipeline used to predict protein–phytochemical interactions |
| `CTAPred/README.md`    | Canonical usage instructions for running CTAPred in this repo                |
| `requirements.txt`     | Repo-level dependencies used by analysis utilities                           |
git clone https://github.com/RomanoLab/phytotherapy-discovery.git
cd phytotherapy-discovery
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
If you are running CTAPred, install CTAPred’s pinned dependencies:
pip install -r CTAPred/requirements.txt
If you are running other repo-level analysis utilities, you may also install the root requirements:
pip install -r requirements.txt
Follow the instructions in CTAPred/README.md. To see the CLI options:
python CTAPred/predict_targets.py --help
CTAPred/QueryList1_smiles.csv is provided as an example input file containing the phytochemical SMILES strings used for predictions.
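Before running predictions, it can help to check which columns the example file actually contains. The sketch below uses only the Python standard library and makes no assumption about the column names. The helper name `preview_csv` is hypothetical and is not part of CTAPred.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a CSV file.

    Hypothetical helper for illustration; not part of CTAPred.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    example = Path("CTAPred/QueryList1_smiles.csv")
    if example.exists():
        header, rows = preview_csv(example)
        print("columns:", header)
        for row in rows:
            print(row)
    else:
        print(f"{example} not found; run this from the repository root.")
```

Run it from the repository root so the relative path resolves. Treat CTAPred/README.md as the authority on what each column means.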
Because this repository’s executable workflow is centered on CTAPred, the authoritative CLI reference is:
python CTAPred/predict_targets.py --help
For usage examples, input expectations, and output files, see:
CTAPred/README.md
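When the CLI is called from a notebook or a driver script, launching it through `sys.executable` keeps the run inside the active virtual environment. This is a minimal sketch: `run_script` is a hypothetical helper, and the only flag it relies on is the `--help` option shown above.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script, args=()):
    """Run a Python script with the current interpreter and capture its output.

    Hypothetical helper for illustration; not part of CTAPred.
    """
    command = [sys.executable, str(Path(script)), *args]
    return subprocess.run(command, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    script = Path("CTAPred/predict_targets.py")
    if script.exists():
        result = run_script(script, ["--help"])
        print(result.stdout or result.stderr)
    else:
        print(f"{script} not found; run this from the repository root.")
```

Because `check=False` is used, a non-zero exit code does not raise an exception. Inspect `result.returncode` if the caller needs to react to failures.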
A typical end-to-end workflow looks like this:
- Consult or generate ontology artifacts upstream: use RomanoLab/poppy to access the ontology and, if needed, derive a phytochemical set for ML analysis.
- Prepare CTAPred inputs: provide a CSV of phytochemical SMILES (use CTAPred/QueryList1_smiles.csv as a template).
- Run CTAPred predictions: execute CTAPred/predict_targets.py to generate predicted protein targets and ranked interactions.
- Downstream analysis and integration: use the predictions for prioritization, hypothesis generation, or reintegration into ontology/KG analyses, depending on your study design and tooling.
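The input-preparation and downstream-analysis steps above can be sketched in plain Python. Everything about the file layouts here is an assumption: the column names `compound_id`, `smiles`, `target`, and `score` are hypothetical, as are the helpers `write_query_csv` and `top_targets`. Check CTAPred/README.md and the example CSV for the formats the pipeline actually reads and writes.

```python
import csv
from collections import defaultdict


def write_query_csv(compounds, path):
    """Write (compound_id, smiles) pairs to a CSV file.

    The column names are assumed for illustration only.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["compound_id", "smiles"])
        for compound_id, smiles in compounds:
            smiles = smiles.strip()
            if smiles:  # skip entries that have no structure string
                writer.writerow([compound_id, smiles])


def top_targets(predictions, k=3):
    """Group prediction records by compound and keep the k highest-scoring targets.

    `predictions` is an iterable of dicts with the assumed keys
    'compound_id', 'target', and 'score' (a higher score is taken to mean
    a stronger prediction).
    """
    by_compound = defaultdict(list)
    for record in predictions:
        by_compound[record["compound_id"]].append(
            (record["target"], float(record["score"]))
        )
    return {
        compound_id: sorted(hits, key=lambda hit: hit[1], reverse=True)[:k]
        for compound_id, hits in by_compound.items()
    }


if __name__ == "__main__":
    # Toy records for demonstration; these are not real CTAPred output.
    toy = [
        {"compound_id": "c1", "target": "T1", "score": "0.40"},
        {"compound_id": "c1", "target": "T2", "score": "0.90"},
        {"compound_id": "c2", "target": "T1", "score": "0.75"},
    ]
    print(top_targets(toy, k=1))
```

Keeping only the top few targets per compound is one simple prioritization rule. A study might instead apply a score threshold or combine predictions with ontology-derived evidence.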
Ideas that keep this repository aligned with its current scope:
- Modularise – split the pipeline into smaller modules and expose a proper Python API.
- Unit tests – add `pytest` and GitHub Actions for continuous integration.
- Visualisation – integrate network diagrams or heatmaps (e.g., via `matplotlib`).
- Docker – package the pipeline for reproducible runs on any machine.
- Fork → create a feature branch → commit + push → open a Pull Request.
- Keep changes scoped and well-described (what/why/how).
- If you change model inputs/outputs, update CTAPred/README.md and this README accordingly.
If you use this repository in research, please cite:
@software{hewryk_ontology_work_2025,
  author    = {Hewryk, Oresta S. I. and Pan, Ian Tong and Romano, Joseph D.},
  title     = {{Ontology Work}: A Python toolkit for phytotherapy ontology enrichment},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/RomanoLab/ontology-work}
}