RNA-seq / Differential expression analysis using DESeq
Description
This tool performs differential expression analysis using the DESeq Bioconductor package.
Parameters
- Column describing groups [group]
- Apply normalization (yes, no) [yes]
- Disregard replicates (yes, no) [no]
- Use fitted dispersion values (when higher than original values, always) [when higher than original values]
- Dispersion estimate (parametric, local) [local]
- Multiple testing correction (none, Bonferroni, Holm, Hochberg, BH, BY) [BH]
- P-value cutoff (0-1) [0.05]
- Plot width (200-3200) [600]
- Plot height (200-3200) [600]
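The "Multiple testing correction" and "P-value cutoff" parameters work together: raw p-values are adjusted first, and features whose adjusted p-value falls at or below the cutoff are reported as significant. The sketch below is a small Python re-implementation of the default BH (Benjamini-Hochberg) step-up procedure, for illustration only. The function names adjust_bh and significant are hypothetical and not part of the tool or of DESeq.

```python
def adjust_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, cutoff=0.05):
    """Indices of features whose BH-adjusted p-value is at or below the cutoff."""
    return [i for i, q in enumerate(adjust_bh(pvalues)) if q <= cutoff]
```

For example, the raw p-values [0.01, 0.04, 0.03, 0.005] adjust to approximately [0.02, 0.04, 0.04, 0.02], so all four pass the default cutoff of 0.05 but only the first and last pass a cutoff of 0.03.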
Details
Given an input table of raw counts, the DESeq package performs statistical analysis to identify differentially expressed genes or other genomic features between two experimental conditions.
Note that in its current implementation, the tool only supports single-factor experimental designs. The experimental
conditions to be compared should be defined in the phenodata.tsv file, and the appropriate column selected using
the 'Column describing groups' parameter.
When normalization is enabled, size factors are either calculated from the counts of each sample using DESeq's median-of-ratios method,
or taken from the library sizes given by the user in phenodata.tsv. The former corrects for RNA composition bias
(which can arise, for example, when a small number of genes are very highly expressed in one experimental condition but not in the other).
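The median-of-ratios idea can be sketched in a few lines of Python. This is an illustrative re-implementation, not DESeq's own code: each sample's size factor is the median, over genes, of the ratio between the sample's count and that gene's geometric mean across samples. The function name size_factors and the row-per-gene input layout are assumptions made for this example.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per gene, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped because their
    geometric mean is zero. Assumes at least one gene has no zeros.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        # Log of the gene's geometric mean across all samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median makes the estimate insensitive to a few extreme genes.
    return [math.exp(statistics.median(r)) for r in log_ratios]
```

With counts [[10, 10], [20, 20], [30, 30], [5, 5000]], the total counts per sample are 65 and 5060, yet both size factors come out as 1.0: the single gene that is very highly expressed in the second sample does not move the median. Scaling by total counts would instead shrink every other gene in that sample, which is the composition bias described above.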
A dispersion value is estimated for each gene, and a curve relating dispersion to mean expression is then fitted across genes in either "local" or "parametric" mode.
The local fit is more robust, but users are encouraged to experiment with this setting to optimize results.
Users can choose to replace the original per-gene dispersion values with the fitted ones either always, or only when the fitted value is higher than the original one
(the more conservative option).
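As a rough illustration of the per-gene estimate and the replacement choice, the Python sketch below computes a method-of-moments dispersion for each gene (the variance of the normalized counts in excess of the Poisson expectation, divided by the squared mean) and then combines it with fitted values. The mode names "maximum" and "fit-only" mirror DESeq's sharingMode options; mapping them to this tool's "when higher than original values" and "always" settings is an assumption, and the fitted values are taken as given rather than computed. The function names are hypothetical.

```python
import statistics


def raw_dispersions(counts, size_factors):
    """Per-gene method-of-moments dispersion estimates.

    counts: one row per gene, one raw count per sample (at least two samples).
    """
    mean_inv_sf = statistics.mean(1 / s for s in size_factors)
    result = []
    for row in counts:
        normalized = [c / s for c, s in zip(row, size_factors)]
        mu = statistics.mean(normalized)
        if mu == 0:
            result.append(0.0)
            continue
        w = statistics.variance(normalized)  # sample variance (n - 1 denominator)
        z = mu * mean_inv_sf                 # variance expected from Poisson noise alone
        # Negative estimates are clipped at zero in this sketch.
        result.append(max((w - z) / mu ** 2, 0.0))
    return result


def share_dispersions(raw, fitted, mode="maximum"):
    """Combine per-gene and fitted dispersions.

    mode="maximum": keep the larger of the two values (conservative).
    mode="fit-only": always use the fitted value.
    """
    if mode == "maximum":
        return [max(r, f) for r, f in zip(raw, fitted)]
    if mode == "fit-only":
        return list(fitted)
    raise ValueError(f"unknown mode: {mode}")
```

Passing the samples of both conditions to raw_dispersions together, as if they were replicates, illustrates the idea behind estimating variability across conditions when replicates are missing: any real difference between the conditions inflates the estimate, which makes the subsequent test more conservative.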
It is highly recommended to have at least two biological replicates for each condition.
If this is not possible, you can run the analysis with replicates
for only one condition, or estimate variability by treating the samples of the two conditions as replicates of each other (the 'Disregard replicates' option).
Statistical testing is performed using a negative binomial test.
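The test described by Anders and Huber conditions on the total count of a gene across both conditions: every way of splitting that total between conditions A and B gets a probability from the two negative binomial distributions, and the p-value is the share of probability held by splits that are at most as likely as the observed one. The Python sketch below illustrates only that final step. It assumes the caller already has a mean and a size (inverse dispersion) for each condition's summed count; in DESeq these are derived from the pooled expression level, the size factors, and the dispersion, which is not reproduced here. The function names are hypothetical.

```python
import math


def nb_logpmf(k, mean, size):
    """Log-probability of count k under a negative binomial distribution
    with the given mean and size (inverse dispersion); mean and size > 0."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mean))
            + k * math.log(mean / (size + mean)))


def nb_exact_test(count_a, count_b, mean_a, mean_b, size_a, size_b):
    """Two-sided p-value for observing (count_a, count_b), conditioning on
    their sum and treating the two summed counts as independent."""
    total = count_a + count_b
    # Joint log-probability of every split (a, total - a).
    log_p = [nb_logpmf(a, mean_a, size_a) + nb_logpmf(total - a, mean_b, size_b)
             for a in range(total + 1)]
    shift = max(log_p)  # rescale before exponentiating to avoid underflow
    probs = [math.exp(lp - shift) for lp in log_p]
    observed = probs[count_a]
    # Sum the probabilities of all splits at most as likely as the observed one;
    # a small relative tolerance absorbs floating-point ties.
    extreme = sum(p for p in probs if p <= observed * (1 + 1e-7))
    return min(1.0, extreme / sum(probs))
```

With equal means of 10 and a size of 10 in both conditions, an even split such as (5, 5) gives a p-value of 1, while a split of (0, 20) gives a p-value well below 0.01.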
Output
The analysis output consists of the following files:
- de-list-deseq.tsv: Table containing the results of the statistical testing, including fold change estimates and p-values.
- de-list-deseq.bed: The BED version of the results table contains genomic coordinates and log2 fold change values.
- ma-plot-significant-deseq.pdf: An MA plot (log2 fold change against mean expression) in which the significantly differentially expressed features are highlighted.
- dispersion-plot.pdf: A plot of the per-gene dispersion estimates as a function of mean count, with the fitted model overlaid.
- p-value-plot-edger.pdf: Plot of the raw and adjusted p-value distributions of the statistical test.
References
This tool uses the DESeq package for statistical analysis. Please read the following article for more detailed information:
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biology 2010, 11:R106.