This tool aligns Illumina paired-end RNA-seq reads to publicly available genomes. You need to supply the reads in two FASTQ files, one per mate, with the reads in the same order in both files. You can also supply a GTF file if you would like TopHat to use existing splice site information. If you would like us to add new reference genomes to Chipster, please contact us.
TopHat2 first identifies potential exons by mapping the reads to the genome using the Bowtie aligner. Using this initial mapping, it builds a database of possible splice junctions, and then maps the reads against these junctions to confirm them. As many exons are shorter than reads, TopHat2 splits the reads into smaller segments, which are then mapped independently. The segment alignments are "glued" back together in a final step of the program to produce end-to-end read alignments. TopHat generates its database of possible splice junctions from three sources of evidence:
You have to give the expected mean inner distance between mate pairs, that is, the fragment size minus the combined length of the two reads. For example, if your fragment size is 300 bp and read length is 50 bp, the inner distance is 300 - 2 x 50 = 200 bp.
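The arithmetic above can be written as a small helper. This is only a sketch: the function name is hypothetical, and it assumes that the fragment size excludes adapters and that both reads have the same length.

```python
def mean_inner_distance(fragment_size: int, read_length: int) -> int:
    """Return the inner distance between mates in bp.

    Assumes fragment_size excludes adapters and both mates are read_length long.
    A negative result means the two reads overlap.
    """
    return fragment_size - 2 * read_length


# The example from the text: 300 bp fragments, 50 bp reads.
print(mean_inner_distance(300, 50))
```

With the values from the example this gives 200, the number to enter as the expected inner distance.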
The "anchor length" means that TopHat2 will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. By default no mismatches are allowed in the anchor, but you can change this.
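The anchor rule can be illustrated with a short sketch. The function name, the input layout (one tuple per spliced alignment, giving the number of read bases on each side of the junction) and the default of 8 bases are assumptions made for illustration; this is not TopHat2's own code.

```python
from collections import defaultdict


def reportable_junctions(spliced_alignments, anchor_length=8):
    """Return the junctions supported by at least one well-anchored read.

    spliced_alignments: iterable of (junction_id, left_overhang, right_overhang),
    a hypothetical layout where the overhangs are the read bases on each side.
    anchor_length: assumed value, used here only for illustration.
    """
    best_anchor = defaultdict(int)
    for junction, left, right in spliced_alignments:
        # A read anchors a junction with its shorter side.
        best_anchor[junction] = max(best_anchor[junction], min(left, right))
    return {j for j, anchor in best_anchor.items() if anchor >= anchor_length}


alignments = [
    ("j1", 3, 47),   # short overhang on one side, allowed for an individual read
    ("j1", 20, 30),  # this read anchors j1 on both sides
    ("j2", 5, 45),   # j2 has no read with enough bases on both sides
]
print(reportable_junctions(alignments))
```

Here j1 is kept because one of its reads has at least 8 bases on each side, even though another read spanning j1 does not. j2 is dropped because no read anchors it on both sides.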
TopHat2 ignores donor-acceptor pairs that are closer together than the minimum intron length or farther apart than the maximum intron length. With long reads (>= 75 bp), "GT-AG", "GC-AG" and "AT-AC" introns can be found ab initio. With shorter reads, TopHat only reports alignments across "GT-AG" introns.
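The two rules above can be sketched as follows. The function names are hypothetical, and the limits of 70 and 500000 bp are assumed values used for illustration; use the values you set in the tool parameters. The intron length is taken simply as the distance between the two positions.

```python
def within_intron_limits(donor_pos, acceptor_pos, min_intron=70, max_intron=500000):
    """True if the donor-acceptor distance lies within the allowed intron range.

    min_intron and max_intron are assumed example values, not read from TopHat2.
    """
    intron_length = abs(acceptor_pos - donor_pos)
    return min_intron <= intron_length <= max_intron


def searchable_motifs(read_length):
    """Intron motifs searched ab initio, following the read-length rule in the text."""
    if read_length >= 75:
        return {"GT-AG", "GC-AG", "AT-AC"}
    return {"GT-AG"}


print(within_intron_limits(1000, 1050))  # 50 bp apart
print(within_intron_limits(1000, 2000))  # 1000 bp apart
print(sorted(searchable_motifs(50)))
```

Under these assumed limits the first pair is ignored as too close and the second is kept, while 50 bp reads are only aligned across "GT-AG" introns.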
TopHat can optionally use existing gene model annotations (splice sites). If a GTF file is supplied (by the user or available from the server), TopHat will use the exon records in this file to build a set of known splice junctions for each gene, and it will attempt to align reads to these junctions even if they would not normally be covered by the initial mapping. When a GTF file is used, you can specify whether TopHat should look for reads across the known splice junctions only.
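To show how exon records translate into known junctions, here is a minimal sketch. It is not TopHat's implementation: the function name is hypothetical, exons are grouped by their transcript_id attribute (an assumption), and a junction is represented as the last base of one exon and the first base of the next, in 1-based GTF coordinates.

```python
import re
from collections import defaultdict


def known_junctions(gtf_lines):
    """Collect (chrom, exon_end, next_exon_start, strand) tuples from exon records."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records are used
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        exons[(chrom, strand, match.group(1))].append((int(fields[3]), int(fields[4])))

    junctions = set()
    for (chrom, strand, _transcript), spans in exons.items():
        spans.sort()
        # Each pair of consecutive exons in a transcript defines one junction.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            junctions.add((chrom, prev_end, next_start, strand))
    return junctions


example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\tCDS\t300\t400\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
]
print(known_junctions(example))
```

The two exon records of the made-up transcript t1 yield a single junction between positions 200 and 300. The CDS record is ignored.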
After running TopHat2, Chipster indexes the BAM file using the SAMtools package. This way the results are ready to be visualized in the genome browser.
This tool is based on the TopHat package. Please cite the following article:
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 25 (9): 1105-1111.