Alignment / TopHat for single end reads

Description

This tool aligns Illumina single-end RNA-seq reads to publicly available genomes. You need to supply the reads in a FASTQ file. If you would like us to add new reference genomes to Chipster, please contact us.

Parameters

Details

TopHat first identifies potential exons by mapping the reads to the genome using the Bowtie aligner. Using this initial mapping, it builds a database of possible splice junctions, and then maps the reads against these junctions to confirm them. As many exons are shorter than reads, TopHat splits the reads into smaller segments, which are then mapped independently. The segment alignments are "glued" back together in a final step of the program to produce end-to-end read alignments. TopHat generates its database of possible splice junctions from two sources of evidence:

The "anchor length" is the minimum number of bases a read must have on each side of a junction for TopHat to report that junction. Individual spliced alignments may span a junction with fewer bases than this on one side, but every reported junction is supported by at least one read with at least this many bases on each side. By default, no mismatches are allowed in the anchor region, but you can change this.
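The anchor rule can be illustrated with a small Python sketch. The function name, the tuple representation of reads, and the example anchor length of 8 are assumptions made for illustration; this is not TopHat's own code or data model.

```python
def junction_is_reported(overhangs, min_anchor):
    """Return True if at least one spliced read has min_anchor or more
    bases on BOTH sides of the junction.

    overhangs: list of (left_bases, right_bases) tuples, one per spliced
    read (a hypothetical representation chosen for this sketch).
    """
    return any(left >= min_anchor and right >= min_anchor
               for left, right in overhangs)


# Three reads span the same junction; only the second is anchored on both sides.
reads = [(3, 47), (20, 30), (45, 5)]
print(junction_is_reported(reads, min_anchor=8))      # True
print(junction_is_reported([(3, 47)], min_anchor=8))  # False
```

Note that the poorly anchored reads (3 and 5 bases on one side) can still be aligned across the junction once another read has established it.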

TopHat ignores donor-acceptor pairs that are closer together than the minimum intron length or farther apart than the maximum intron length. With long reads (75 bp or more), "GT-AG", "GC-AG" and "AT-AC" introns can be found ab initio. With shorter reads, TopHat only reports alignments across "GT-AG" introns.
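A minimal Python sketch of these two rules follows. It assumes a forward-strand sequence, 0-based intron coordinates with an exclusive end, and a hypothetical function name; the length limits are passed in explicitly rather than taken from any TopHat default.

```python
CANONICAL_ONLY = {("GT", "AG")}
ALL_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def intron_is_accepted(genome, start, end, read_length,
                       min_intron, max_intron, long_read_threshold=75):
    """Check a candidate intron genome[start:end] (forward strand only).

    The intron must lie within the length limits, and its first and last
    two bases must form a motif allowed for the given read length.
    """
    length = end - start
    if length < min_intron or length > max_intron:
        return False
    donor = genome[start:start + 2]      # first two bases of the intron
    acceptor = genome[end - 2:end]       # last two bases of the intron
    allowed = ALL_MOTIFS if read_length >= long_read_threshold else CANONICAL_ONLY
    return (donor, acceptor) in allowed


# Toy sequence: exon "AAAC", a 10-base intron "GC......AG", exon "CCAA".
genome = "AAAC" + "GC" + "T" * 6 + "AG" + "CCAA"
print(intron_is_accepted(genome, 4, 14, read_length=50, min_intron=5, max_intron=20))  # False
print(intron_is_accepted(genome, 4, 14, read_length=75, min_intron=5, max_intron=20))  # True
```

The same "GC-AG" intron is rejected for the 50 bp read and accepted for the 75 bp read, which mirrors the read-length rule described above.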

TopHat filters out junctions supported by too few alignments. Suppose a junction spanning two exons, A and B, is supported by S reads. Let D be the average depth of coverage of exon A, and assume that it is higher than that of exon B. If S divided by D is less than the minimum isoform fraction, the junction is not reported. A value of zero disables this filter.
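The filter can be written out as a short Python sketch. The function and argument names are hypothetical, and the example fraction of 0.15 is chosen only for illustration. The comparison S / D < fraction is expressed as S < fraction * D, which avoids dividing by zero and makes a fraction of zero accept every junction.

```python
def junction_passes_filter(s_reads, depth_a, depth_b, min_isoform_fraction):
    """Keep a junction unless S / D falls below the minimum isoform fraction.

    s_reads: number of reads supporting the junction (S).
    depth_a, depth_b: average coverage depth of the two flanking exons;
    D is taken to be the higher of the two.
    """
    d = max(depth_a, depth_b)
    # Equivalent to s_reads / d >= min_isoform_fraction when d > 0.
    return s_reads >= min_isoform_fraction * d


print(junction_passes_filter(5, 100, 40, 0.15))   # False: 5 / 100 = 0.05
print(junction_passes_filter(30, 100, 40, 0.15))  # True: 30 / 100 = 0.30
print(junction_passes_filter(5, 100, 40, 0.0))    # True: filter disabled
```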

After running TopHat, Chipster indexes the BAM file using the SAMtools package. This way the results are ready to be visualized in the genome browser.
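Outside Chipster, the same indexing step could be reproduced roughly as follows. This sketch assumes that the samtools executable is on the PATH and that the BAM file is sorted by coordinate; the file name tophat.bam and the helper names are placeholders, not what Chipster uses internally.

```python
import subprocess


def index_command(bam_path):
    """Build the SAMtools command that writes an index (.bai) for bam_path."""
    return ["samtools", "index", bam_path]


def index_bam(bam_path):
    """Run the indexing command; raises CalledProcessError on failure."""
    # Assumes samtools is installed and bam_path is coordinate-sorted.
    subprocess.run(index_command(bam_path), check=True)


print(index_command("tophat.bam"))  # ['samtools', 'index', 'tophat.bam']
```

Calling index_bam("tophat.bam") would then write tophat.bam.bai next to the BAM file, which is the pairing a genome browser expects.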

Output

This tool returns a BAM file containing the alignment, and an index file (.bai) for it. In addition it produces the following BED files:

Reference

This tool is based on the TopHat package. Please cite the following article:

Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 25 (9): 1105-1111.