TOMTOM

Usage: tomtom [options] -query <query motifs> -target <target motifs>

Description:

The Tomtom program searches one or more query motifs against a database of target motifs and reports, for each query, a list of target motifs ranked by q-value. The q-value is the minimal false discovery rate at which the observed similarity would be deemed significant. The output contains results for each query, in the order in which the queries appear in the input file.

For a given pair of motifs, the program considers all offsets, while requiring a minimum number of overlapping positions. For a given offset, each overlapping position is scored using one of seven column similarity functions defined below. In order to compute the scores, Tomtom needs to know the frequencies of the letters of the sequence alphabet in the database being searched (the "background" letter frequencies). By default, the background letter frequencies included in the MEME input files are used. The scores of the columns that overlap at a given offset are summed, and this summed score is then converted to a p-value. The reported p-value is the minimal p-value over all possible offsets. For each query motif, the p-values of its targets are then converted to q-values.
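
The offset scan described above can be sketched as follows. This is a minimal illustration, not Tomtom's implementation: the names column_score and offset_scores are hypothetical, negative Euclidean distance stands in for whichever column similarity function is selected, and the sketch stops at the summed scores (it does not perform the conversion to p-values).

```python
import math
from typing import List, Sequence, Tuple

Column = Sequence[float]  # letter frequencies at one motif position


def column_score(a: Column, b: Column) -> float:
    """Illustrative column similarity (an assumption, not Tomtom's code):
    negative Euclidean distance, so identical columns score 0 and
    dissimilar columns score below 0."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def offset_scores(
    query: Sequence[Column],
    target: Sequence[Column],
    min_overlap: int = 1,
) -> List[Tuple[int, int, float]]:
    """Return (offset, overlap, summed_score) for every admissible offset.

    `offset` is the target index aligned with query position 0; it is
    negative when the query overhangs the left end of the target.
    """
    results = []
    for offset in range(-(len(query) - 1), len(target)):
        start = max(0, offset)                        # first overlapping target column
        end = min(len(target), offset + len(query))   # one past the last overlapping column
        overlap = end - start
        if overlap < min_overlap:
            continue  # too few overlapping positions at this offset
        score = sum(column_score(query[t - offset], target[t]) for t in range(start, end))
        results.append((offset, overlap, score))
    return results


if __name__ == "__main__":
    query = [[1, 0, 0, 0], [0, 1, 0, 0]]
    target = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
    for offset, overlap, score in offset_scores(query, target, min_overlap=2):
        print(offset, overlap, round(score, 3))
```

Summed scores from offsets with different overlap lengths are not directly comparable, which is why each one is converted to a p-value before the minimum is taken.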

Input:

Output:

Tomtom writes its output to files in a directory named tomtom_out, which it creates if necessary. (You can also cause the output to be written to a different directory; see -o and -oc, below.) The main output file is named tomtom.html and can be viewed with a web browser. A second file, tomtom.txt, contains a simplified, text-only version of the output. (See -text, below, for the text output format.) For each query-target match, two additional files containing LOGO alignments are also written: an Encapsulated PostScript file (.eps) and a PNG file (.png). If the convert program is not available, no PNG files will be written.
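
As a rough sketch of how a script might drive Tomtom and locate its default output, the helpers below build the command line from the usage string and list the two main output files. The function names and the motif file names are hypothetical; only the -query and -target flags and the tomtom_out, tomtom.html and tomtom.txt names come from this page.

```python
from pathlib import Path
from typing import Dict, List


def build_tomtom_command(query_file: str, target_file: str) -> List[str]:
    """Assemble the argument list for the usage line shown at the top of this page."""
    return ["tomtom", "-query", query_file, "-target", target_file]


def expected_outputs(out_dir: str = "tomtom_out") -> Dict[str, Path]:
    """Paths of the main HTML and text outputs inside the output directory."""
    base = Path(out_dir)
    return {"html": base / "tomtom.html", "text": base / "tomtom.txt"}


def missing_outputs(out_dir: str = "tomtom_out") -> List[Path]:
    """Return the expected output files that do not exist (for example, before a run)."""
    return [path for path in expected_outputs(out_dir).values() if not path.exists()]


if __name__ == "__main__":
    # "query.meme" and "targets.meme" are placeholder file names.
    command = build_tomtom_command("query.meme", "targets.meme")
    print(" ".join(command))
    # To actually run it (requires tomtom on the PATH):
    #   import subprocess
    #   subprocess.run(command, check=True)
    print([str(path) for path in missing_outputs()])
```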

Only matches for which the q-value is less than or equal to the threshold set by --q-thresh (default 0.5) will be shown. The q-value is the estimated false discovery rate if the occurrence is accepted as significant. See Storey JD, Tibshirani R. Statistical significance for genome-wide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.
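
To make the q-value threshold concrete, the sketch below converts a list of p-values to q-values and keeps only the matches at or below a threshold. It uses the Benjamini-Hochberg step-up procedure as a simple stand-in; Tomtom's own q-value estimate follows the reference cited above and may differ. The function names and the target identifiers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg q-values (a stand-in for illustration only).

    For the p-value of rank r among n (smallest first), the q-value is
    the minimum of p * n / r' over all ranks r' >= r, capped at 1.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def filter_matches(
    pvalues_by_target: Dict[str, float], q_thresh: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (target, q-value) pairs with q-value <= q_thresh, ranked by q-value."""
    targets = list(pvalues_by_target)
    qvalues = bh_qvalues([pvalues_by_target[t] for t in targets])
    kept = [(t, q) for t, q in zip(targets, qvalues) if q <= q_thresh]
    return sorted(kept, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up p-values for one query against four hypothetical targets.
    pvals = {"T1": 0.01, "T2": 0.04, "T3": 0.03, "T4": 0.5}
    for target, q in filter_matches(pvals, q_thresh=0.05):
        print(target, round(q, 4))
```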

Options:

Example format strings for target-url and query-url.

Bugs: none known.

Authors: Shobhit Gupta (shobhitg@u.washington.edu), Timothy Bailey (tbailey@imb.uq.edu.au), Charles E. Grant (cegrant@gs.washington.edu) and William Noble (noble@gs.washington.edu).