Translated nucleotide BLAST

Description

This tool runs a translated nucleotide BLAST search (blastx) on the NCBI BLAST server: the nucleotide query sequences are translated into protein and compared against a protein sequence database. The query sequence file can contain up to 10 sequences.

Parameters

  • Database: Selects the target protein database on the NCBI BLAST server. The available databases are: NCBI non-redundant proteins, PDB, UniProt/Swiss-Prot, RefSeq reference proteins, Patented protein sequences, Metagenomic proteins, and Transcriptome Shotgun Assembly proteins. By default, Swiss-Prot is used as the target database.
  • Expectation threshold for saving hits: The E-value threshold sets the statistical significance required for reporting matches against database sequences. The default value of 10 means that 10 such matches are expected to be found merely by chance. Lower thresholds are more stringent, so fewer chance matches are reported.
  • Genetic code: The genetic code used to translate the query into protein sequence. In addition to the standard genetic code, you can choose one of the mitochondrial or other non-standard genetic codes.
  • Word size: The length of the seed that initiates an alignment. BLAST works by finding word matches between the query and database sequences. One may think of this process as finding hot spots that BLAST can then use to initiate extensions that might eventually lead to full alignments. For BLASTP searches, non-exact word matches are also taken into account, based on the similarity between the words. Because the amount of similarity can be varied, one normally uses only word sizes 2 and 3 for these searches.
  • Maximum number of hits to collect per sequence: Limits the number of hit sequences reported for one query sequence. By default, up to 100 hits are reported. If you wish to collect all the hits and not just the best ones, you will in many cases need to increase this value significantly.
  • Output format type: The BLAST results can be presented in many different formats. The classical BLAST report is not optimal for large query sets or for cases where the results will be analyzed with other tools. In addition to the text-based BLAST reports, the results can be presented as a table or as an XML file. You can also produce a FASTA-formatted sequence file containing the matching hit sequence regions, or a list of hit sequence names.
  • Filter low complexity regions: Use the SEG program to filter low-complexity regions in the query sequence.
  • Entrez query to limit search: You can use Entrez query syntax to search a subset of the selected BLAST database. This can be helpful for limiting searches to certain molecule types or sequence lengths, or for excluding organisms.
  • Location on the query sequence: Location of the search region in the query sequence, for example: 23-66.
  • Matrix: The weight matrix assigns a score for aligning pairs of residues and determines the overall alignment score. Experimentation has shown that the BLOSUM62 matrix is among the best for detecting most weak protein similarities. For particularly long and weak alignments, the BLOSUM45 matrix may prove superior. For proteins shorter than 85 residues, the BLOSUM80 matrix may provide better hits.
  • Gap opening penalty: Cost to open a gap. Integer value from 6 to 25. The default value of this parameter depends on the selected scoring matrix. Note that if you set this value, you must also define the gap extension penalty.
  • Gap extension penalty: Cost to extend a gap by one residue. Integer value from 1 to 3. The default value of this parameter depends on the selected scoring matrix. Note that if you set this value, you must also define the gap opening penalty.
  • Output a log file: Collect a log file for the BLAST run.
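
To see how the matrix and the two gap penalties interact, the sketch below scores an already aligned pair of protein sequences. It assumes the usual BLAST convention in which a gap of length k costs the opening penalty plus k times the extension penalty. The penalties 11 and 1 are only illustrative (they are the values commonly paired with BLOSUM62), and the toy substitution function is a made-up stand-in for a real matrix.

```python
def alignment_score(aligned_query, aligned_subject, substitution_score,
                    gap_open=11, gap_extend=1):
    """Raw score of a gapped alignment; '-' marks a gap position."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    previous_gap_side = None  # which sequence the current gap run is in
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            side = "query" if q == "-" else "subject"
            if side != previous_gap_side:
                score -= gap_open      # charged once per gap run
            score -= gap_extend        # charged for every gap position
            previous_gap_side = side
        else:
            score += substitution_score(q, s)
            previous_gap_side = None
    return score


def toy_score(a, b):
    """Hypothetical stand-in for a matrix such as BLOSUM62."""
    return 4 if a == b else -1


print(alignment_score("MKT-AL", "MKTWAL", toy_score))  # 8
print(alignment_score("MK--AL", "MKTWAL", toy_score))  # 3
```

With these values, a single long gap is cheaper than several short ones of the same total length, because the opening penalty is paid only once per gap.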
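
If you choose the table output for further analysis with other tools, a post-processing step could look like the sketch below. It assumes that the table uses the standard 12-column, tab-separated layout of the BLAST+ tabular format; check the column order of your own result file before relying on it. The function name and the sample rows are made up for illustration.

```python
import csv
import io

# Column order of the default BLAST+ tabular output (an assumption for
# the table produced by this tool).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_hits(handle, max_evalue=10.0):
    """Read tabular hits and keep those with E-value <= max_evalue."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


# Made-up rows, for illustration only.
sample = io.StringIO(
    "query1\thitA\t85.0\t60\t9\t0\t1\t180\t5\t64\t1e-30\t120\n"
    "query1\thitB\t31.0\t45\t31\t0\t10\t144\t2\t46\t4.2\t28.1\n"
)
for hit in read_hits(sample, max_evalue=1e-5):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

Filtering afterwards in this way lets you run the search once with a permissive expectation threshold and then apply stricter cut-offs without repeating the search.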