
Usage:
gomo [options] <scoring file> <go-term database>
Description:
The name GOMO stands for "Gene Ontology for Motifs."
The program searches a set of ranked genes for enriched GO terms
associated with high-ranking genes. The genes can be ranked, for example,
by applying a motif scoring algorithm to their upstream sequences.
An E-value is computed for each GO term, and q-values are provided as well,
following the method of Benjamini and Hochberg (where "q-value" is defined
as the minimal false discovery rate at which a given GO term is
deemed significant).
The program reports all GO terms that receive E-values smaller than a
specified threshold.
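As an illustration of the q-value definition above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is not GOMO's own code: the function name bh_qvalues and the input list of p-values are assumptions made for this example, and how GOMO derives the underlying significance values is not shown here.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg q-values, in the same order as the input p-values."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each q-value
    # is the minimum adjusted value over all equal or larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four GO terms.
example_q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```

Each q-value is the smallest false discovery rate at which the corresponding term would still be called significant, which is why the values never decrease as the p-values increase.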
Input:
<scoring file>
- An XML file that contains, for each motif, the sequences and their scores. The XML file uses the CisML schema.
<go-term database>
- A collection of GO terms mapped to the sequences in the XML file. Databases are provided by the web services and are formatted using a simple TSV format:
"GO-term" "GO-term description" "Sequence identifiers separated by tabs"
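The database layout described above can be read with a few lines of Python. This is a sketch under the assumption that each line holds the GO term, its description, and then zero or more sequence identifiers, all separated by tabs. The function name parse_go_database and the sample identifiers are illustrative only.

```python
def parse_go_database(lines):
    """Map each GO term to a (description, sequence identifiers) pair."""
    database = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected at least 2 tab-separated fields: {line!r}")
        term, description = fields[0], fields[1]
        # Everything after the description is a sequence identifier.
        sequence_ids = [field for field in fields[2:] if field]
        database[term] = (description, sequence_ids)
    return database


# Illustrative input lines; the sequence identifiers are placeholders.
sample_lines = [
    "GO:0006412\ttranslation\tseqA\tseqB\n",
    "GO:0005840\tribosome\tseqA\n",
]
sample_db = parse_go_database(sample_lines)
```

Any iterable of lines works as input, so an open file handle can be passed directly in place of the list.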
Output:
GOMO will create a directory, named gomo_out by default.
Any existing output files in the directory will be overwritten.
The directory will contain:
- An XML file named gomo.xml providing the results.
- A plain text file named gomo.txt providing the results.
- An HTML file named gomo.html providing the results.
The default output directory can be overridden using the --o or --oc options, which are described below.
The --text option will limit output to plain text sent to the standard output.
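The following sketch shows one way a script might assemble a GOMO command line and list the files expected in the output directory. It uses only the options and file names documented here. The helper names gomo_command and expected_outputs, and the input file names in the example, are hypothetical.

```python
from pathlib import Path


def gomo_command(scoring_file, go_database, out_dir=None, overwrite=False, text=False):
    """Build the argument list for a gomo run without executing it."""
    args = ["gomo"]
    if text:
        # --text sends plain text to standard output instead of writing files.
        args.append("--text")
    elif out_dir is not None:
        # --oc overwrites an existing directory; --o does not.
        args += ["--oc" if overwrite else "--o", str(out_dir)]
    args += [str(scoring_file), str(go_database)]
    return args


def expected_outputs(out_dir="gomo_out"):
    """Return the paths of the three result files inside the output directory."""
    return [Path(out_dir) / name for name in ("gomo.xml", "gomo.txt", "gomo.html")]


# Hypothetical file names, for illustration only.
command = gomo_command("scores.xml", "go_db.tsv", out_dir="run1", overwrite=True)
```

The returned list can be passed to subprocess.run on a system where gomo is installed.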
Options:
--gs <n>
- Indicates that gene scores contained in the CisML file should be used for the calculations. The gene p-values are used by default.
--motif <id>
- Use only the motif identified by <id>. This option may be repeated.
--o <dir name>
- Specifies the output directory. If the directory already exists, the contents will not be overwritten.
--oc <dir name>
- Specifies the output directory. If the directory already exists, the contents will be overwritten.
--score-E-thresh <n>
- Threshold on the gene score E-values above which all E-values are set to the maximum, in order to reduce the impact of noise. As a result, all genes with E-values above the threshold obtain the same rank in the rank-sum statistics. The threshold is ignored when gene scores are used (--gs).
--text
- Limits output to plain text sent to standard out. For FIMO, the text output is unsorted, and q-values are not reported. This mode allows the program to search an arbitrarily large database, because results are not stored in memory.
--nostatus
- Suppresses the process information.
--verbosity 1|2|3|4
- Set the verbosity of status reports to standard error. The default level is 2.
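The effect of --score-E-thresh on ranking can be sketched as follows. This is an illustration, not GOMO's implementation: it assumes that "maximal" means the largest E-value present in the input and that tied genes share their average rank. The function names clamp_evalues and average_ranks are made up for this example.

```python
def clamp_evalues(evalues, threshold):
    """Set every E-value above the threshold to the largest E-value in the list."""
    if not evalues:
        return []
    ceiling = max(evalues)  # assumed meaning of "maximal" in this sketch
    return [ceiling if e > threshold else e for e in evalues]


def average_ranks(values):
    """Rank values in ascending order (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Hypothetical gene E-values; those above 1.0 collapse to a single tied rank.
clamped = clamp_evalues([0.001, 5.0, 0.2, 40.0, 12.0], threshold=1.0)
ranks = average_ranks(clamped)  # [1.0, 4.0, 2.0, 4.0, 4.0]
```

The three genes above the threshold all receive rank 4.0, so differences among weakly scoring genes no longer influence the rank-sum statistic.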