Alignment / TopHat2 for paired-end reads

Description

This tool aligns Illumina paired-end RNA-seq reads to publicly available genomes. If you would like us to add new reference genomes to Chipster, please contact us. You need to supply the reads in FASTQ files. Supplying a GTF file is optional but recommended, because annotation information improves the alignment process. Chipster provides GTF files for the human, mouse and rat (rn4) genomes.

Parameters

Details

TopHat2 maps Illumina RNA-seq reads to a genome in order to identify exon-exon splice junctions. The alignment process consists of three steps. If annotation is available as a GTF file, TopHat2 first extracts the transcript sequences and uses Bowtie2 to align reads to this virtual transcriptome. Only the reads that do not fully map to the transcriptome are then mapped to the genome. The reads that still remain unmapped are split into shorter segments, which are then aligned to the genome. Segment mappings are used to find potential splice sites. Sequences flanking a splice site are concatenated, and unmapped segments are mapped to them. Segment alignments are then stitched together to form whole read alignments.
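The segment-splitting step can be pictured with a minimal sketch. The function name and the segment length of 25 used in the example are assumptions made for illustration, not values taken from this tool. How TopHat2 itself handles a read whose length is not a multiple of the segment length is not covered here; the sketch simply lets the last segment be shorter.

```python
def split_into_segments(read, segment_length):
    """Split a read sequence into consecutive segments.

    Hypothetical helper for illustration only; the last segment
    is shorter when the read length is not a multiple of
    segment_length.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [read[i:i + segment_length]
            for i in range(0, len(read), segment_length)]


# A 60-base read split with an illustrative segment length of 25.
segments = split_into_segments("ACGT" * 15, 25)
print([len(s) for s in segments])  # [25, 25, 10]
```

Each segment would then be aligned independently, and segments from the same read that map far apart on the genome hint at a splice junction between them.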

The "anchor length" parameter makes TopHat2 report only junctions spanned by reads with at least this many bases on each side of the junction. Note that an individual spliced alignment may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. By default no mismatches are allowed in the anchor, but you can change this.
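The anchor rule can be illustrated with a small sketch. The function is hypothetical and not part of TopHat2. It assumes that each spliced read is summarized by the number of bases it aligns on the left and right side of a single junction, and it ignores the anchor mismatch setting. The anchor length of 8 in the example is an illustrative value.

```python
def junction_is_reported(overhangs, anchor_length):
    """Return True if at least one read spans the junction with
    at least anchor_length bases on each side.

    overhangs: list of (left_bases, right_bases) tuples,
               one tuple per spliced read (hypothetical input format).
    """
    return any(min(left, right) >= anchor_length
               for left, right in overhangs)


# One read has only 3 bases on the left side, but the other read
# has 20 and 30 bases, so the junction is supported.
print(junction_is_reported([(3, 47), (20, 30)], 8))  # True

# With only the short-overhang read, the junction is not supported.
print(junction_is_reported([(3, 47)], 8))  # False
```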

TopHat2 will ignore donor-acceptor pairs which are closer than the minimum intron length or further than the maximum intron length apart.
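As a rough sketch, the intron length filter amounts to a range check on the distance between donor and acceptor. The function name, the coordinate convention and the limits of 70 and 500000 used in the example are assumptions for illustration; use the values you set in the tool parameters.

```python
def keep_donor_acceptor_pair(donor_pos, acceptor_pos, min_intron, max_intron):
    """Return True if the donor-acceptor distance lies within the
    allowed intron length range (both limits inclusive).

    Hypothetical helper: positions are plain integer coordinates and
    the intron length is taken as their absolute difference.
    """
    intron_length = abs(acceptor_pos - donor_pos)
    return min_intron <= intron_length <= max_intron


# Illustrative limits: minimum 70, maximum 500000.
print(keep_donor_acceptor_pair(1000, 1050, 70, 500000))    # False, too close
print(keep_donor_acceptor_pair(1000, 2000, 70, 500000))    # True
print(keep_donor_acceptor_pair(1000, 601000, 70, 500000))  # False, too far
```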

For paired-end reads, TopHat2 processes the two reads separately through the same mapping stages described above. In the final stage, the independently aligned reads are analyzed together to produce paired alignments, taking into account additional factors such as fragment length and orientation. The expected mean inner distance between mate pairs is the fragment length minus the combined length of the two reads. For example, if your fragment size is 300 bp and the read length is 50 bp, the inner distance is 300 - 2 × 50 = 200 bp.
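The inner distance arithmetic can be written as a one-line helper. The function name is hypothetical, and the sketch assumes both mates have the same read length.

```python
def mean_inner_distance(fragment_length, read_length):
    """Inner distance between mates: fragment length minus both reads.

    Assumes both mates have the same length (hypothetical helper).
    """
    return fragment_length - 2 * read_length


# Fragment size 300 bp, read length 50 bp.
print(mean_inner_distance(300, 50))  # 200
```

A negative result means that the two reads overlap, which can happen when the fragments are shorter than twice the read length.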

After running TopHat2, Chipster indexes the BAM file using the SAMtools package. This way the results are ready to be visualized in the genome browser.

Output

This tool returns a BAM file containing the alignment, and an index file (.bai) for it. In addition it produces the following BED files:

Reference

This tool is based on the TopHat package. Please cite the following article:

Kim D, Pertea G, Trapnell C, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology (2013) 14: R36.