
Usage: tomtom [options] -query <query motifs> -target <target motifs>
Description:
The Tomtom program searches one or more query
motifs against a database of target motifs, and reports for each
query a list of target motifs, ranked by q-value. The q-value
is the minimal false discovery rate at which the observed
similarity would be deemed significant. The output contains
results for each query, in the order that the queries appear in
the input file. With respect to each query, targets are ranked
by q-value.
For a given pair of motifs, the program considers all offsets,
while requiring a minimum number of overlapping positions. For a
given offset, each overlapping position is scored using one of
the column similarity functions listed under -dist below. In order to
compute the scores, Tomtom needs to know the
frequencies of the letters of the sequence alphabet in the
database being searched (the "background" letter
frequencies). By default, the background letter frequencies
included in the MEME input files are used. The scores of columns
that overlap for a given offset are summed. This summed score is
then converted to a p-value. The reported p-value is the minimal
p-value over all possible offsets. For each query motif, the
corresponding p-values are then converted to q-values.
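
The offset scan described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Tomtom's implementation: each motif is assumed to be a list of per-position letter-frequency columns, the column score is the negated Euclidean distance (standing in for one of the -dist choices), and the names column_score and offset_scores are hypothetical. The conversion of summed scores to p-values is left out.

```python
import math

def column_score(q_col, t_col):
    """Negated Euclidean distance between two frequency columns (higher = more similar)."""
    return -math.sqrt(sum((q - t) ** 2 for q, t in zip(q_col, t_col)))

def offset_scores(query, target, min_overlap=5):
    """Return {offset: summed column score} for every admissible offset.

    The offset is the target position aligned with query position 0, so a
    negative offset means the query overhangs the target on the left.
    """
    # Assumption: a query narrower than min_overlap uses its own width instead.
    required = min(min_overlap, len(query))
    scores = {}
    for offset in range(-(len(query) - 1), len(target)):
        start = max(0, offset)                        # first overlapping target position
        end = min(len(target), offset + len(query))   # one past the last overlapping position
        if end - start < required:
            continue  # too few overlapping columns at this offset
        scores[offset] = sum(
            column_score(query[t - offset], target[t]) for t in range(start, end)
        )
    return scores
```

Summed scores at different offsets cover different numbers of columns, which is why each sum is converted to a p-value before the offsets are compared.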
Input:
- <query motifs> - A file containing one or more motifs in MEME format. Each of these motifs will be searched against the target database.
- <target motifs> - A file containing one or more motifs in MEME format.
Output:
Tomtom writes its output to files in a directory named tomtom_out, which it creates if necessary. (You can also cause the output to be written to a different directory; see -o and -oc, below.) The main output file is named tomtom.html and can be viewed with a web browser. A second file, tomtom.txt, contains a simplified, text-only version of the output. (See -text, below, for the text output format.) For each query-target match, two additional files containing LOGO alignments are also written: an encapsulated PostScript file (.eps) and a PNG file (.png). If the convert program is not available, no PNG files will be written.
Only matches for which the q-value is less than the threshold set by -q-thresh (default 0.5) will be shown. The q-value is the estimated false discovery rate if the occurrence is accepted as significant. See Storey JD, Tibshirani R. Statistical significance for genome-wide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.
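
To illustrate how a set of p-values maps to q-values and how a threshold is then applied, the sketch below uses the Benjamini-Hochberg step-up procedure. This is a simplification chosen for brevity (it corresponds to fixing the proportion of true nulls at 1) and is not necessarily the computation Tomtom performs; the names bh_qvalues and report are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

def report(targets, pvalues, q_thresh=0.5):
    """Return (target, q-value) pairs with q-value below q_thresh, ranked by q-value."""
    pairs = zip(targets, bh_qvalues(pvalues))
    return sorted((p for p in pairs if p[1] < q_thresh), key=lambda p: p[1])
```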
Options:
- -o <output dir> - Name of the output directory for all output files. If the output directory already exists, it will not be replaced and the program will exit without doing anything.
- -oc <output dir> - Name of the output directory for all output files. If the output directory already exists, it will be replaced ('clobbered').
- -q-thresh <value> - Only report matches with q-values below the specified threshold. The default value is 0.5.
- -min-overlap <value> - Only report motif matches that overlap by this many positions or more. If a query motif is narrower than the value of -min-overlap, the width of that motif is used as the required minimum overlap for that query. The default value is 5.
- -internal - This option forces the shorter motif to be completely contained in the longer motif.
- -dist <allr|ed|kullback|pearson|sandelin> - The column similarity function used to score overlapping positions. The values correspond to the Pearson correlation coefficient (pearson), average log-likelihood ratio (allr), Kullback-Leibler divergence (kullback), Euclidean distance (ed) and the Sandelin-Wasserman function (sandelin). Detailed descriptions of these functions can be found in the published description of Tomtom.
- -query-pseudo <float> - This option adds the specified pseudocount to each count in each query matrix. The default value is 0.
- -target-pseudo <float> - This option adds the specified pseudocount to each count in each target matrix. The default value is 0.
- -query-url <string> - This option causes the names of query motifs in the output to be inserted into the given C format string in the first two %s elements. It is used to create hot-links to the entry for the motif in the given on-line database.
- -target-url <string> - This option causes the names of target motifs in the output to be inserted into the given C format string in the first two %s elements. It is used to create hot-links to the entry for the motif in the given on-line database.
- -text - This option causes Tomtom to print just a tab-delimited text file to standard output. The output begins with a header, indicated by leading "#" characters. This is followed by a single title line, and then the actual values. The columns are:
  - Query motif name
  - Target motif name
  - Optimal offset: the offset between the query and the target motif
  - q-value
  - Overlap: the number of positions of overlap between the two motifs
  - Query consensus sequence
  - Target consensus sequence
  - Orientation: orientation of the target motif with respect to the query motif
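
A reader for the -text output could look like the sketch below. It assumes exactly the layout described above ("#" header lines, one title line, then eight tab-separated columns in the listed order); the field names in FIELDS and the function name parse_tomtom_text are hypothetical and are not taken from Tomtom itself.

```python
import csv

# Hypothetical field names for the eight columns, in the order listed above.
FIELDS = ["query", "target", "offset", "q_value", "overlap",
          "query_consensus", "target_consensus", "orientation"]

def parse_tomtom_text(stream):
    """Yield one dict per match from tab-delimited -text output.

    stream is any iterable of lines, such as an open file or sys.stdin.
    """
    # Drop the '#' header lines and any blank lines.
    lines = (ln for ln in stream if ln.strip() and not ln.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the single title line
    for row in reader:
        rec = dict(zip(FIELDS, row))
        rec["offset"] = int(rec["offset"])
        rec["q_value"] = float(rec["q_value"])
        rec["overlap"] = int(rec["overlap"])
        yield rec
```

Because the function accepts any iterable of lines, it can consume Tomtom's standard output directly through a pipe or read a saved file.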
Example format strings for -target-url and -query-url:
- For JASPAR:
"<a href='http://jaspar.genereg.net/cgi-bin/jaspar_db.pl?rm=present&db=CORE&ID=%s'>%s</a>"
- For TRANSFAC:
"<a href='http://www.biobase.de/cgi-bin/biobase/transfac/11.1/bin/getTFProf.cgi?%s'>%s</a>"
- For SCPD:
"<a href='http://rulai.cshl.edu/cgi-bin/SCPD/getfactor?%s'>%s</a>"
- For MACISAAC:
"<a href='http://fraenkel.mit.edu/improved_map/v1.tamo'>%s</a>"
- For FLYREG:
"<a href='http://www.danielpollard.com/bergman2004_matrices.html'>%s</a>"
- For DPINTERACT:
"<a href='http://arep.med.harvard.edu/ecoli_matrices/%s.html'>%s</a>"
- For REGTRANSBASE:
"<a href='http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=alignment_browse'>%s</a>"
- For PRODORIC:
"<a href='http://prodoric.tu-bs.de/matrix.php?matrix_acc=%s'>%s</a>"
Bugs: none known.
Authors: Shobhit Gupta (shobhitg@u.washington.edu), Timothy Bailey (tbailey@imb.uq.edu.au), Charles E. Grant (cegrant@gs.washington.edu) and William Noble (noble@gs.washington.edu).