RNA-seq / Differential expression analysis using DESeq
Description
Differential expression analysis using the exact test of the DESeq Bioconductor package ("nbinomTest").
Please note that this tool is suitable only for two group comparisons.
For multifactor experiments you can use the tool "Differential expression using edgeR for multivariate experiments",
which uses statistical methods based on generalized linear models ("glm edgeR").
Parameters
- Column describing groups [group]
- Apply normalization (yes, no) [yes]
- Dispersion estimation method (parametric, local) [local]
- Use fitted dispersion values (when higher than original values, always) [when higher than original values]
- Multiple testing correction (none, Bonferroni, Holm, Hochberg, BH, BY) [BH]
- P-value cutoff (0-1) [0.05]
- Plot width (200-3200) [600]
- Plot height (200-3200) [600]
Details
This tool takes as input a table of raw counts from the different samples. The count file has to be associated with a phenodata file describing the experimental groups.
These files are best created by the tool "Utilities / Define NGS experiment", which combines the count files of the different samples into one table and creates a phenodata file for it.
When normalization is enabled, size factors are either estimated from the count data (DESeq takes, for each sample, the median ratio of its counts to the per-gene geometric mean across samples), or
calculated from the library size given by the user in the phenodata.tsv file. The former makes it possible to correct for RNA composition bias
(which can arise, for example, when a small number of genes are very highly expressed in one experimental condition but not in the other).
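The data-driven size factor estimate can be illustrated with a small sketch. It follows the median-of-ratios idea described in Anders and Huber (2010); it is not the code of the DESeq package, and the function name size_factors and the toy counts are made up for this illustration. Genes with a zero count in any sample are skipped here, which is a simplifying assumption.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of rows, one per gene; each row holds one raw count per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes that are observed in every sample (assumption of this sketch).
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 is sequenced twice as deeply as sample 1, and the
# last gene is very highly expressed only in sample 2 (composition bias).
counts = [
    [10, 20],
    [20, 40],
    [30, 60],
    [5, 1000],
]
sf = size_factors(counts)
totals = [sum(column) for column in zip(*counts)]
print("size factors:", sf)
print("total counts:", totals)
```

In this toy example the ratio of the two size factors is 2, matching the sequencing depth difference, whereas the ratio of the total counts is dominated by the single highly expressed gene.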
A dispersion value is estimated for each gene through a model fit procedure, which can be performed in a "local" or "parametric" mode.
The former is more robust, but users are encouraged to experiment with the setting to optimize results.
Users can choose to always replace the original dispersion values with the fitted ones, or to do so only when the fitted value is higher than the original one
(the more conservative option).
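The two options of the "Use fitted dispersion values" parameter can be sketched as follows. This is only an illustration of the selection rule described above, not the tool's implementation; the function name choose_dispersion and the example values are hypothetical. In the DESeq package this choice appears to correspond to the sharingMode settings "maximum" and "fit-only", but that mapping is an assumption here.

```python
def choose_dispersion(original, fitted, mode="when higher than original values"):
    """Pick the dispersion value used for each gene (illustrative sketch).

    original: per-gene dispersion estimates
    fitted:   per-gene values read from the fitted dispersion model
    mode:     one of the two options of the tool parameter
    """
    if mode == "always":
        # Always use the fitted value, ignoring the per-gene estimate.
        return list(fitted)
    if mode == "when higher than original values":
        # Conservative option: never go below the per-gene estimate.
        return [max(o, f) for o, f in zip(original, fitted)]
    raise ValueError("unknown mode: " + mode)


# Hypothetical dispersion values for three genes.
original = [0.02, 0.30, 0.10]
fitted = [0.05, 0.12, 0.10]

conservative = choose_dispersion(original, fitted)
always = choose_dispersion(original, fitted, mode="always")
print(conservative)
print(always)
```

For the second gene the conservative option keeps the higher original estimate (0.30), which gives a larger variance and therefore a more cautious test than the fitted value (0.12).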
You need to have biological replicates of each experiment condition in order to estimate dispersion properly.
If you have biological replicates only for one condition, DESeq will estimate dispersion using the replicates of that single
condition. If there are no replicates at all, DESeq will estimate dispersion using the samples from the different conditions as replicates.
Statistical testing is performed using a negative binomial test.
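The raw p-values from the test are then adjusted for multiple testing and compared with the p-value cutoff. The sketch below shows the default Benjamini-Hochberg (BH) adjustment in plain Python. It is a minimal illustration, not the tool's code; the function name bh_adjust and the example p-values are invented, and treating "adjusted p-value <= cutoff" as significant is an assumption.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (illustrative sketch)."""
    m = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        # Scale by m / rank and enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.001, 0.02, 0.03, 0.5]
cutoff = 0.05  # default value of the "P-value cutoff" parameter

padj = bh_adjust(raw)
significant = [p <= cutoff for p in padj]
print(padj)
print(significant)
```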
Output
The analysis output consists of the following files:
- de-list-deseq.tsv: Table containing the results of the statistical testing, including fold change estimates and p-values.
- de-list-deseq.bed: The BED version of the results table contains genomic coordinates and log2 fold change values.
- ma-plot-deseq.pdf: A scatter plot where the significantly differentially expressed genes are highlighted.
- dispersion-plot.pdf: Plot of the dispersion estimates as a function of the count values, with the fitted model overlaid.
- p-value-plot-deseq.pdf: Plot of the raw and adjusted p-value distributions of the statistical test.
References
This tool uses the DESeq package for statistical analysis. Please read the following article for more detailed information:
S Anders and W Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:R106.