NGS ChIP-seq / Find common motifs and match to JASPAR
Description
This tool scans a set of genomic regions for consensus sequence motifs,
scores the alignment of each motif against the transcription factor matrices in the JASPAR database, and reports the 10 highest-ranking transcription factors for each motif.
Parameters
- P-value cutoff (0...1) [0.0002]
- E-value cutoff (0...1) [0.01]
- Genome ([BSgenome.Hsapiens.UCSC.hg17, BSgenome.Hsapiens.UCSC.hg18, BSgenome.Hsapiens.UCSC.hg19, BSgenome.Mmusculus.UCSC.mm8, BSgenome.Mmusculus.UCSC.mm9, BSgenome.Rnorvegicus.UCSC.rn4]) [BSgenome.Hsapiens.UCSC.hg18]
Details
For a more thorough description of the technical details please consult the original publications cited in the References section below. Briefly, the analysis proceeds through the following steps:
- - Given a set of genomic regions, the algorithm first performs de novo motif discovery in an unseeded fashion. The starting set of position weight matrices (PWMs) is obtained through a combination of spaced dyads and expectation maximization procedures. Further optimization is achieved using genetic algorithm techniques.
- - In the second phase of the analysis, the PWMs for known transcription factors collected in the JASPAR database are matched to the set of consensus motifs discovered in the previous step.
- - Finally, the ten best-matching transcription factors are gathered for each consensus motif, logo plots are created, and E-values scoring the match strength are calculated.
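The matching and ranking steps above can be illustrated with a minimal Python sketch. It is not the MotIV or STAMP implementation: the column-wise Pearson correlation, the ungapped sliding alignment, and the names column_pearson, best_ungapped_score and top_matches are assumptions chosen for illustration. The sketch ignores reverse complements and does not compute E-values.

```python
import math


def column_pearson(col_a, col_b):
    """Pearson correlation between two PWM columns (probabilities for A, C, G, T)."""
    mean_a = sum(col_a) / len(col_a)
    mean_b = sum(col_b) / len(col_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a uniform column carries no information
    return cov / math.sqrt(var_a * var_b)


def best_ungapped_score(query, target):
    """Slide the shorter PWM along the longer one and keep the best summed column score."""
    short, long_ = (query, target) if len(query) <= len(target) else (target, query)
    best = float("-inf")
    for offset in range(len(long_) - len(short) + 1):
        score = sum(
            column_pearson(col, long_[offset + i]) for i, col in enumerate(short)
        )
        best = max(best, score)
    return best


def top_matches(query, database, n=10):
    """Return the n best-scoring (name, score) pairs, highest score first."""
    scored = [(name, best_ungapped_score(query, pwm)) for name, pwm in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Toy data, made up for illustration only (not real JASPAR matrices).
discovered_motif = [[0.90, 0.03, 0.03, 0.04], [0.05, 0.05, 0.85, 0.05]]
toy_database = {
    "TF_A": [[0.90, 0.03, 0.03, 0.04], [0.05, 0.05, 0.85, 0.05]],
    "TF_B": [[0.03, 0.90, 0.03, 0.04], [0.85, 0.05, 0.05, 0.05]],
}
for name, score in top_matches(discovered_motif, toy_database, n=10):
    print(name, round(score, 3))
```

Each PWM here is a list of columns, one per motif position. The tool itself reports ten matches per motif, which corresponds to n=10 in this sketch.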
Output
The analysis output consists of the following:
- motif-analysis-summary.tsv: A text file that summarizes the analysis with information on
- - number of consensus motifs
- - number of matches per motif
- - distribution of matches
- - the IUPAC nucleotide sequences for the motifs
- - number of occurrences of consensus sequence for each motif
- - the position weight matrices for each motif
- logo-plot-(...).pdf: For each consensus motif the ten best matching transcription factors from the JASPAR database are represented with a logo plot.
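To show how an IUPAC nucleotide sequence relates to a position weight matrix in the summary file, here is a minimal Python sketch. The rule it uses (keep every base whose probability reaches a cutoff, then look up the ambiguity code) and the names iupac_consensus and min_fraction are assumptions for illustration; the rule applied by rGADEM may differ.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of allowed bases.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
BASES = "ACGT"


def iupac_consensus(pwm, min_fraction=0.25):
    """Turn a PWM (list of [A, C, G, T] probability columns) into an IUPAC string.

    min_fraction is a hypothetical cutoff: a base is kept at a position
    when its probability is at least this value.
    """
    letters = []
    for column in pwm:
        kept = {base for base, p in zip(BASES, column) if p >= min_fraction}
        if not kept:
            # Fall back to the single most probable base.
            kept = {BASES[column.index(max(column))]}
        letters.append(IUPAC_CODES[frozenset(kept)])
    return "".join(letters)


# Toy matrix, made up for illustration only.
toy_pwm = [
    [0.90, 0.05, 0.03, 0.02],
    [0.50, 0.00, 0.50, 0.00],
    [0.25, 0.25, 0.25, 0.25],
    [0.00, 0.45, 0.10, 0.45],
]
print(iupac_consensus(toy_pwm))
```

Raising min_fraction gives a more specific consensus with fewer ambiguity codes. Lowering it makes the consensus more permissive.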
References
This tool uses the following Bioconductor packages:
- MotIV
- rGADEM
- BSgenome annotation packages
For more details refer to these publications:
S. Mahony, P.V. Benos, "STAMP: a web tool for exploring DNA-binding motif similarities", Nucl Acids Res (2007) 35:W253-258
S. Mahony, P.E. Auron, P.V. Benos, "DNA familial binding profiles made easy: comparison of various motif alignment and clustering strategies", PLoS Computational Biology (2007) 3(3):e61
L. Li, "GADEM: a genetic algorithm guided formation of spaced dyads coupled with an EM algorithm for motif discovery", J Comput Biol (2009) 16(2):317-29