Promoter analysis / ClusterBuster
Description
Retrieves promoter sequences for the specified genes and finds clusters of putative transcription factor binding sites in them.
Binding sites are identified using the matrices from the
JASPAR library.
Parameters
- Species (human, mouse, rat, drosophila, yeast) [human]
- Promoter size (short, medium, long) [short]
- Cluster score threshold [5]
- Motif score threshold [6]
- Expected distance between motifs in a cluster [35]
- Range for counting nucleotide frequencies [100]
- Pseudocounts [0.375]
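As an illustration of what the pseudocount parameter does, the sketch below adds a pseudocount to every entry of a motif count matrix before converting it to log-odds scores. This is a generic position weight matrix calculation, not ClusterBuster's actual code; the function names, the use of the natural logarithm and the uniform background in the usage note are assumptions made for the example.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, background, pseudocount=0.375):
    """Turn per-position base counts into log-odds scores.

    counts: list of dicts, one per motif position, mapping base -> count.
    background: dict mapping base -> background frequency.
    pseudocount: value added to every matrix entry (hypothetical default
    chosen to match the tool's default of 0.375).
    """
    matrix = []
    for column in counts:
        # The pseudocount is added to all four entries of the column.
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        row = {}
        for b in BASES:
            freq = (column[b] + pseudocount) / total
            row[b] = math.log(freq / background[b])
        matrix.append(row)
    return matrix

def score_site(matrix, site):
    """Sum the log-odds scores of a candidate site of the same length as the motif."""
    return sum(row[base] for row, base in zip(matrix, site.upper()))
```

With a uniform background (0.25 per base), a column observed as ten A's and nothing else still gives C, G and T a finite negative score, because the pseudocount keeps their estimated frequency above zero.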
Details
This tool retrieves upstream sequences for the specified genes and submits them to the ClusterBuster program. It needs access to the
chip-specific annotations, so it will not work if the chip type was not specified
during normalization (for example, of Illumina data). RefSeq IDs are used to retrieve
promoter sequences constructed and annotated by the UCSC Genome Browser staff. The same promoter sequences can be downloaded
as a single FASTA-formatted file from the UCSC Golden Path folder.
The length of the promoter sequences used in the analysis depends on the species and the selected promoter size:
Size      Human     Mouse     Rat       Drosophila  Yeast
Short     1000 bp   1000 bp   1000 bp   1000 bp     500 bp
Medium    2000 bp   2000 bp   2000 bp   2000 bp     1000 bp
Long      5000 bp   5000 bp   5000 bp   5000 bp     2500 bp
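The table can also be expressed as a small lookup, which may be handy when scripting around the tool. The sketch below only restates the values listed above; the names PROMOTER_LENGTH_BP and promoter_length are hypothetical and are not part of the tool itself.

```python
# Promoter length in base pairs, by promoter size and species (values from the table).
PROMOTER_LENGTH_BP = {
    "short":  {"human": 1000, "mouse": 1000, "rat": 1000, "drosophila": 1000, "yeast": 500},
    "medium": {"human": 2000, "mouse": 2000, "rat": 2000, "drosophila": 2000, "yeast": 1000},
    "long":   {"human": 5000, "mouse": 5000, "rat": 5000, "drosophila": 5000, "yeast": 2500},
}

def promoter_length(species="human", size="short"):
    """Return the promoter length in bp; defaults mirror the tool's defaults."""
    try:
        return PROMOTER_LENGTH_BP[size.lower()][species.lower()]
    except KeyError:
        raise ValueError(f"unknown species/size combination: {species!r}, {size!r}") from None
```

For example, promoter_length("yeast", "long") returns 2500, and calling it without arguments returns the default of 1000 bp for a short human promoter.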
Output
A text file detailing the clusters of TFBSs found and their significance. The clusters are marked in the sequences
with capital letters.
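Because the marked regions are distinguished only by letter case, they can be pulled out of a sequence string with a regular expression. The sketch below is a minimal example that assumes the unmarked parts of the sequence are in lower case and that the sequence has already been read from the output file into a single string; the function name find_marked_regions is hypothetical, and the exact layout of the output file is not covered here.

```python
import re

def find_marked_regions(sequence):
    """Return (start, end, text) for each run of capital letters.

    Positions are 0-based and the end position is exclusive.
    """
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"[A-Z]+", sequence)]
```

For the string "acgtACGTTacgGGa" this returns [(4, 9, "ACGTT"), (12, 14, "GG")].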
Reference
Frith et al. (2003) Nucleic Acids Res, 31(13):3666-8