Prophage/provirus and plasmid geNomad output "Feature" example files #19
sierra-moxon wants to merge 12 commits into main from
Hi @sierra-moxon, I'm not familiar with the KBase CDM, so I need to brush up :-) But looking at these files, here are a few things to keep in mind:
Taking a quick look at https://kbase.github.io/cdm-schema, I think prophages naturally fit as features, i.e. "A feature localized to an interval along a contig." To keep things consistent, I wonder if all viruses and plasmids should be defined as features, with plasmids and non-provirus viruses essentially being a feature over the whole contig? That would allow these predictions to stay flexible in the future, e.g. if geNomad (or another tool) is used that can predict integrated plasmids, or if a given virus gets its prediction refined from full contig to a provirus. It looks like the terms "SO_0001041" and "SO_0000155" would work? Let me know if that makes sense. I can also provide geNomad output examples that have provirus predictions (it looks like there were none in Lauren's data?).
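To make the suggestion above concrete, here is a rough Python sketch of the idea, not the actual CDM schema. The `Feature` fields, the `make_feature` helper, and the 1-based inclusive coordinates are all hypothetical, and it assumes SO_0001041 is used for viral sequences and SO_0000155 for plasmids. A prediction without coordinates becomes a feature over the whole contig, while a provirus keeps its own interval:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed mapping of the two SO terms mentioned above (not confirmed by the CDM).
VIRUS_TERM = "SO_0001041"
PLASMID_TERM = "SO_0000155"


@dataclass(frozen=True)
class Feature:
    """Hypothetical, simplified stand-in for a CDM Feature record."""
    contig_id: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive
    type: str   # ontology term for the feature


def make_feature(
    contig_id: str,
    contig_length: int,
    so_term: str,
    coordinates: Optional[Tuple[int, int]] = None,
) -> Feature:
    """Build a feature for a virus or plasmid prediction.

    With no coordinates, the prediction covers the whole contig
    (plasmids and non-provirus viruses). With coordinates, it covers
    only that interval (e.g. a provirus inside a host contig).
    """
    if coordinates is None:
        start, end = 1, contig_length
    else:
        start, end = coordinates
    if not (1 <= start <= end <= contig_length):
        raise ValueError(
            f"interval {start}-{end} is outside contig of length {contig_length}"
        )
    return Feature(contig_id=contig_id, start=start, end=end, type=so_term)


# Whole-contig virus, and the same kind of record refined to a provirus.
whole_contig_virus = make_feature("contig_1", 50000, VIRUS_TERM)
provirus = make_feature("contig_2", 80000, VIRUS_TERM, coordinates=(1200, 35000))
plasmid = make_feature("contig_3", 9000, PLASMID_TERM)
```

With this shape, refining a virus from full contig to provirus only changes `start` and `end`, so downstream consumers would not need a separate code path for each case.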
I added some constraints around CDM identifiers in the last schema update, so I will update your data accordingly.
@lmlui
@simroux - have a look at some of this test data we assembled with the help of @lmlui. We're wondering how to map plasmids and prophage/provirus geNomad output to the Feature class in the CDM for KBase. This was our first attempt. We also added sample data with questions in the tests/data dir to inform how we did this.