Usage:
ama [options] <motif file> <sequence file> <background file>
Description:
The name AMA stands for "Average Motif Affinity".
The program scores a set of DNA sequences given a DNA-binding motif,
treating each position in the sequence as a possible
binding event. The score is calculated by averaging the
likelihood ratio scores for all feasible binding events to
the given sequence and to its reverse strand.
The binding strength at each potential site is defined as
the likelihood ratio of the site under the motif versus under
a zero-order background model provided by the user.
By default, AMA reports the average motif affinity score
and the p-value of that score for each sequence in its input.
P-values are estimated analytically using the given
zero-order background model.
AMA can instead be made to report the maximum of the likelihood ratio score, or the z-score of either the average or maximum likelihood score, using the options below.
If the input file contains more than one motif, the motifs will be processed consecutively.
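The scoring described above can be illustrated with a minimal Python sketch. It is not AMA's actual implementation: the function names, the dictionary-based motif representation, and the handling of sequences shorter than the motif are all assumptions made for illustration. The motif is taken to be a list of per-position probability columns, and the background a zero-order model mapping each base to its frequency.

```python
from typing import Dict, List

# Hypothetical helper names; AMA itself does not expose a Python API.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def site_odds(site: str, motif: List[Dict[str, float]],
              background: Dict[str, float]) -> float:
    """Likelihood ratio of one site: P(site | motif) / P(site | background)."""
    odds = 1.0
    for base, column in zip(site, motif):
        odds *= column[base] / background[base]
    return odds


def all_site_odds(seq: str, motif: List[Dict[str, float]],
                  background: Dict[str, float],
                  both_strands: bool = True) -> List[float]:
    """Odds for every full-width window on the given strand(s)."""
    width = len(motif)
    strands = [seq]
    if both_strands:
        strands.append(reverse_complement(seq))
    return [site_odds(strand[i:i + width], motif, background)
            for strand in strands
            for i in range(len(strand) - width + 1)]


def avg_odds(seq, motif, background, both_strands=True) -> float:
    """Average odds over all feasible sites (0.0 if none: an assumption)."""
    scores = all_site_odds(seq, motif, background, both_strands)
    return sum(scores) / len(scores) if scores else 0.0


def max_odds(seq, motif, background, both_strands=True) -> float:
    """Maximum odds over all feasible sites (0.0 if none: an assumption)."""
    scores = all_site_odds(seq, motif, background, both_strands)
    return max(scores) if scores else 0.0
```

Passing `both_strands=False` corresponds in spirit to scoring only the given strand. The sketch does not attempt the analytical p-value estimation.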
Input:
- <motif file> is a file containing a list of motifs, in MEME format.
- <sequence file> is a collection of sequences in FASTA format.
- <background file> is a 0-order Markov model in background model format, such as produced by fasta-get-markov.
Output:
AMA writes to standard output. The output format is either GFF or CisML.
Options:
--motif <id>
- Use only the motif identified by <id>. This option may be repeated.
--motif-pseudo <float>
- A pseudocount to be added to each count in the motif matrix, after first multiplying by the corresponding background frequency (default=0.1).
--norc
- Do not score the reverse complement DNA strand. Both strands are scored by default.
--scoring avg-odds|max-odds
- Indicates whether the average or the maximum likelihood ratio (odds) score should be calculated (default avg-odds). If max-odds is chosen, no p-value will be printed.
--z-scoring <n>
- Report the z-score of the underlying score instead of the score. The z-score for a sequence is computed by shuffling the sequence the given number of times (n) to estimate the mean and standard deviation of the underlying score. No p-value will be printed.
--o-format gff|cisml
- Output file format (default cisml).
--nostatus
- Suppresses the process information.
--verbosity 1|2|3|4
- Set the verbosity of status reports to standard error. The default level is 2.
--max-seq-length <max>
- Set the maximum length allowed for input sequences. By default the maximum allowed length is 250000000.
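Two of the options above, the motif pseudocount and the shuffle-based z-score, can be sketched in Python as follows. This is an illustration under stated assumptions rather than AMA's own code: the function names are hypothetical, the normalization after adding the pseudocount is assumed, and the use of the population standard deviation and the return value of 0.0 for a zero spread are choices made here, not documented behavior.

```python
import random
import statistics
from typing import Callable, Dict, Optional


def apply_pseudocount(counts: Dict[str, float],
                      background: Dict[str, float],
                      pseudo: float = 0.1) -> Dict[str, float]:
    """Add pseudo * background[b] to each count, then normalize the column.

    The normalization step is an assumption of this sketch.
    """
    adjusted = {b: counts[b] + pseudo * background[b] for b in counts}
    total = sum(adjusted.values())
    return {b: value / total for b, value in adjusted.items()}


def shuffle_z_score(seq: str,
                    score_fn: Callable[[str], float],
                    n_shuffles: int,
                    rng: Optional[random.Random] = None) -> float:
    """Z-score of score_fn(seq) against scores of n shuffled copies of seq."""
    if n_shuffles < 1:
        raise ValueError("n_shuffles must be at least 1")
    rng = rng or random.Random()
    observed = score_fn(seq)
    letters = list(seq)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # in-place permutation of the sequence letters
        shuffled_scores.append(score_fn("".join(letters)))
    mean = statistics.fmean(shuffled_scores)
    # Population standard deviation; whether AMA uses this or the sample
    # estimate is not stated in this document.
    spread = statistics.pstdev(shuffled_scores)
    if spread == 0:
        return 0.0  # assumption: no spread means no meaningful z-score
    return (observed - mean) / spread
```

Here `score_fn` stands for whichever underlying score is selected (average or maximum odds), supplied as any function from a sequence string to a number.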