BWA for paired-end reads
Description
Aligns paired-end reads to a selected reference genome using the BWA aln algorithm. The reads have to be supplied in two FASTQ files.
Parameters
- Which pre-indexed genome is used as the reference: Human genome (hg19), Mouse genome (mm9), Rat genome (rn4) or Mouse miRBase17. [Mouse genome]
- Seed length. How many bases of the left, good quality part of the read should be used as the seed region. If the seed length is longer than the reads, seeding is disabled. Corresponds to the command line parameter -l. [32]
- Maximum number of differences in the seed region. Maximum number of differences, such as mismatches or indels, in the seed region. Corresponds to the command line parameter -k. [2]
- Maximum edit distance for the whole read. If the value is an integer of 1 or more, it is used as the maximum edit distance. If the value is between 0 and 1, it defines the fraction of missing alignments given a 2% uniform base error rate, and the maximum edit distance is then chosen automatically for different read lengths. Corresponds to the command line parameter -n. [0.04]
- Quality value format used. Whether the quality values are in the Sanger format (ASCII characters equal to the Phred quality plus 33) or in the Illumina Genome Analyzer Pipeline v1.3 or later format (ASCII characters equal to the Phred quality plus 64). Note that this parameter is taken into account only if you chose to apply the mismatch limit to the seed region. Corresponds to the command line parameter -I. [Sanger]
- Maximum number of gaps. Maximum number of gap openings for one read. Corresponds to the command line parameter -o. [1]
- Maximum number of gap extensions. Maximum number of gap extensions, -1 for disabling long gaps. Corresponds to the command line parameter -e. [-1]
- Gap opening penalty. Corresponds to the command line parameter -O. [11]
- Gap extension penalty. Corresponds to the command line parameter -E. [4]
- Mismatch penalty threshold. BWA will not search for suboptimal hits with a score lower than the alignment score minus this value. Corresponds to the command line parameter -M. [3]
- Disallow gaps in region. Disallow a long deletion within the given number of bp towards the 3' end. Corresponds to the command line parameter -d. [16]
- Disallow an indel within the given number of bp towards the ends. Do not place an indel within the given number of bp of either end of the read. Corresponds to the command line parameter -i. [5]
- Quality trimming threshold. Quality threshold for read trimming down to 35 bp. Corresponds to the command line parameter -q. [0]
- Barcode length. Length of the barcode starting from the 5' end. The barcode of each read will be trimmed before mapping. Corresponds to the command line parameter -B. [0]
- How many valid alignments are reported per read. Maximum number of alignments to report. Corresponds to the command line parameter bwa samse -n. [3]
- Maximum hits to output for paired reads. Maximum number of alignments to output in the XA tag for properly paired reads. If a read has more than the given number of hits, the XA tag will not be written. Corresponds to the command line parameter bwa sampe -n. [3]
- Maximum hits to output for discordant pairs. Maximum number of alignments to output in the XA tag for discordant read pairs, excluding singletons. If a read has more than the given number of hits, the XA tag will not be written. Corresponds to the command line parameter bwa sampe -N. [10]
- Maximum insert size. Maximum insert size for a read pair to be considered properly mapped. This option is only used when there are not enough good alignments to infer the distribution of insert sizes. Corresponds to the command line parameter bwa sampe -a. [500]
- Maximum occurrences for one end. Maximum occurrences of a read for pairing. A read with more occurrences will be treated as a single-end read. Reducing this value speeds up pairing. For reads shorter than 30 bp, a smaller value is recommended to get a sensible speed at the cost of pairing accuracy. Corresponds to the command line parameter bwa sampe -o. [100000]
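
The mapping between these parameters and the BWA command line can be sketched as follows. This is only an illustration, not the tool's actual wrapper code: the function name build_commands, the index prefix and the file names are hypothetical, the default values are the bracketed ones from the list above, and -k as the flag for the seed difference limit is taken from the BWA manual. The sketch only builds the argument lists and does not run BWA.

```python
import shlex

# Defaults for "bwa aln", taken from the bracketed values in the parameter list.
ALN_DEFAULTS = {
    "-l": 32,    # seed length
    "-k": 2,     # maximum differences in the seed (flag name from the BWA manual)
    "-n": 0.04,  # maximum edit distance, or fraction of missing alignments
    "-o": 1,     # maximum number of gap openings
    "-e": -1,    # maximum number of gap extensions (-1 disables long gaps)
    "-O": 11,    # gap opening penalty
    "-E": 4,     # gap extension penalty
    "-M": 3,     # mismatch penalty threshold
    "-d": 16,    # no long deletion within this many bp of the 3' end
    "-i": 5,     # no indel within this many bp of the read ends
    "-q": 0,     # quality trimming threshold
    "-B": 0,     # barcode length
}

# Defaults for "bwa sampe", also from the parameter list.
SAMPE_DEFAULTS = {
    "-a": 500,     # maximum insert size
    "-o": 100000,  # maximum occurrences for one end
    "-n": 3,       # maximum hits in the XA tag for properly paired reads
    "-N": 10,      # maximum hits in the XA tag for discordant pairs
}


def _flags(options):
    """Turn {"-l": 32, ...} into ["-l", "32", ...]."""
    args = []
    for flag, value in options.items():
        args += [flag, str(value)]
    return args


def build_commands(index_prefix, fastq1, fastq2,
                   illumina_quals=False, aln=None, sampe=None):
    """Return three argument lists: two "bwa aln" runs and one "bwa sampe" run.

    "bwa aln" writes its result to standard output; this sketch assumes the
    caller redirects it to <fastq>.sai before running "bwa sampe".
    """
    aln_opts = {**ALN_DEFAULTS, **(aln or {})}
    sampe_opts = {**SAMPE_DEFAULTS, **(sampe or {})}
    aln_args = _flags(aln_opts)
    if illumina_quals:
        aln_args.append("-I")  # qualities are Phred+64 instead of Phred+33
    sai1, sai2 = fastq1 + ".sai", fastq2 + ".sai"
    return [
        ["bwa", "aln"] + aln_args + [index_prefix, fastq1],
        ["bwa", "aln"] + aln_args + [index_prefix, fastq2],
        ["bwa", "sampe"] + _flags(sampe_opts)
        + [index_prefix, sai1, sai2, fastq1, fastq2],
    ]


if __name__ == "__main__":
    # Hypothetical index prefix and file names, for illustration only.
    for command in build_commands("mm9", "reads_1.fq", "reads_2.fq"):
        print(shlex.join(command))
```

Each read file is aligned separately with the same aln settings, and the pairing parameters only take effect in the final sampe step.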
Details
This tool uses the BWA short read aligner to align two sets of
FASTQ-formatted sequences against a FASTA-formatted reference sequence. The first read file should contain
the first read of each pair, and the second read file the corresponding mates in the same order as in the first read file.
A BWA index is automatically calculated for the given reference genome or sequence set. Aligning is performed with
the Burrows-Wheeler Transform based aln algorithm, which allows gaps in the alignments.
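
Because the two read files must list the mates in the same order, it can be worth checking this before aligning. The following is a minimal sketch of such a check, not part of the tool itself. It assumes uncompressed FASTQ files with exactly four lines per record, and read names that are either identical in both files or differ only by a trailing /1 or /2. The function names are hypothetical.

```python
import itertools


def read_names(path):
    """Yield the read name of every record in a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            # The header is the first line of each 4-line record.
            if i % 4 == 0 and line.strip():
                fields = line[1:].split()  # drop the leading "@" and any comment
                yield fields[0] if fields else ""


def base_name(name):
    """Strip a trailing /1 or /2 mate suffix, if present."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def first_mismatch(fastq1, fastq2):
    """Return the 1-based number of the first record whose names disagree.

    A missing mate (files of different length) also counts as a mismatch.
    Returns None when every record pairs up.
    """
    pairs = itertools.zip_longest(read_names(fastq1), read_names(fastq2))
    for number, (name1, name2) in enumerate(pairs, start=1):
        if name1 is None or name2 is None:
            return number
        if base_name(name1) != base_name(name2):
            return number
    return None
```

A return value of None means the files look consistently ordered under the stated assumptions. Any other value points to the first record to inspect.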
Output
The tool returns a sorted and indexed alignment in BAM format.