TrackCluster

TrackCluster is an isoform-calling and quantification pipeline for long RNA/cDNA reads.

Overview

A pipeline for reference-based isoform identification and quantification using long reads. It was designed to build a valid transcriptome from long, noisy reads alone. An indicator of an intact 5' end, e.g. the spliced leader in nematode mRNA, can be very helpful to the pipeline.

The main input/output format of this pipeline is "bigg", i.e. the "bigGenePred" format.
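
For orientation, UCSC's bigGenePred is a BED12+8 layout: the twelve standard BED columns followed by name2, cdsStartStat, cdsEndStat, exonFrames, type, geneName, geneName2 and geneType. The sketch below is not part of trackcluster. It assumes a tab-separated file in that layout and prints the record name, strand, exon count and intron intervals for each line; the function name `bigg_introns` is hypothetical.

```shell
# Print name, strand, exon count and intron intervals for each bigGenePred record.
# Assumes tab-separated BED12+8 input read from the given files or from stdin.
bigg_introns() {
    awk -F'\t' -v OFS='\t' '{
        n = split($11, size, ",")       # column 11: blockSizes
        split($12, start, ",")          # column 12: chromStarts, relative to chromStart
        if (size[n] == "") n--          # ignore the empty item after a trailing comma
        introns = ""
        for (i = 1; i < n; i++) {
            istart = $2 + start[i] + size[i]    # end of exon i
            iend   = $2 + start[i + 1]          # start of exon i+1
            introns = introns (i > 1 ? "," : "") istart "-" iend
        }
        # single-exon records have no introns and are reported with "."
        print $4, $6, n, (introns == "" ? "." : introns)
    }' "$@"
}
```

It could be called as `bigg_introns reads.bed | head` to inspect the splice junctions of the first few records.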

Requirements

Python: developed on 3.9, tested on 3.6 and above (or 2.7.10+); should work with most Python 3 versions
samtools V2.0+, bedtools V2.24+ and minimap2 V2.24+ in your $PATH

# install the external bins with conda
conda install -c bioconda samtools
conda install -c bioconda bedtools
conda install -c bioconda minimap2
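
Before installing, it can help to see which of the external tools are already available. The following is a minimal sketch, not part of trackcluster: the helper name `missing_tools` is hypothetical, and it only checks that each name resolves on `$PATH`, not that the version is recent enough.

```shell
# Print the name of every tool in the argument list that is not found on $PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (prints nothing when all three tools are installed):
# missing_tools samtools bedtools minimap2
```

Only the tools it reports need a `conda install`.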

Installation

# use pip from pypi
pip install trackcluster
# or pip from source code for the latest version
git clone https://github.com/Runsheng/trackcluster.git
pip install ./trackcluster

Recommendations

UCSC Kent source tree (for generating binary track), used only in bigg2b.py

Scripts

All scripts can be run directly from shell after pip installation.

trackrun.py: the main script for a trackcluster run
bam2bigg.py: convert the mapped reads in a bam file to bigg track format
gff2bigg.py: convert the isoform annotation in gff3 to bigg format
bigg2b.py: convert the bigg track into binary format for better loading in IGV/UCSC
biggmutant.py: change the value of one column in a bigglist

Walkthrough

# test if all dependencies are installed
trackrun.py test --install

# prepare the reference annotation bed file from gff file
# tested on Ensembl, WormBase and Araport gff
gff2bigg.py -i ensemblxxxx.gff3 -o ref.bed 
# the full WormBase gff contains too much information; extract only the lines that mention WormBase
cat c_elegans.PRJNA13758.WS266.annotations.gff3 |grep WormBase > ws266.gff
gff2bigg.py -i ws266.gff -o ref.bed
# the ref.bed can be sorted to speed up the analysis
bedtools sort -i ref.bed > refs.bed # refs.bed contains the sorted, known transcripts from the gff annotation

# generate the read track from minimap2 bam file
bam2bigg.py -b group1.bam -o group1.bed
bam2bigg.py -b group2.bam -o group2.bed

# merge the bed file and sort
cat group1.bed group2.bed > read.bed
bedtools sort -i read.bed > reads.bed

# Examples for running commands:
trackrun.py clusterj -s reads.bed -r refs.bed -t 40 # run in junction mode, will generate the isoform.bed
trackrun.py count -s reads.bed -r refs.bed -i isoform.bed # generate the csv file for isoform expression
# alternative for cluster
trackrun.py cluster -s reads.bed -r refs.bed -t 40 # run in exon/intron intersection mode (slower), will generate the isoform.bed

# the post analysis could include the classification of novel isoforms
trackrun.py desc --isoform isoform.bed --reference ref.bed # generate the description for each novel isoform
# this part can be run directly on reads, to count the frequency of splicing events in reads, like intron_retention
trackrun.py addgene -r ref.bed -s reads.bed # will generate reads_gene.bed
trackrun.py desc --isoform reads_gene.bed --reference ref.bed # will generate reads_desc.txt and reads_class12.txt
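
The conversion, merge, sort, cluster and count steps of the walkthrough can be strung together for any number of bam files. The sketch below only prints the command sequence so it can be reviewed first. The function name `plan_trackcluster` is hypothetical and not part of trackcluster, it assumes file names without spaces, and every flag is taken from the walkthrough commands.

```shell
# Print (do not run) the walkthrough commands for a sorted reference and a set of bam files.
# Usage: plan_trackcluster <sorted_ref.bed> <threads> <a.bam> [<b.bam> ...]
plan_trackcluster() {
    ref=$1
    threads=$2
    shift 2
    beds=""
    for bam in "$@"; do
        bed="${bam%.bam}.bed"                 # group1.bam -> group1.bed
        echo "bam2bigg.py -b $bam -o $bed"
        beds="$beds $bed"
    done
    echo "cat$beds > read.bed"
    echo "bedtools sort -i read.bed > reads.bed"
    echo "trackrun.py clusterj -s reads.bed -r $ref -t $threads"
    echo "trackrun.py count -s reads.bed -r $ref -i isoform.bed"
}
```

Once the printed plan looks right, `plan_trackcluster refs.bed 40 group1.bam group2.bam | sh` would execute it.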

Citation

Please cite our paper if you use trackcluster in your work.

Li, R., Ren, X., Ding, Q., Bi, Y., Xie, D. and Zhao, Z., 2020. Direct full-length RNA sequencing reveals unexpected transcriptome complexity during Caenorhabditis elegans development. Genome research, 30(2), pp.287-298.
