Normalisation / Affymetrix

Description

Preprocesses Affymetrix CEL files to produce expression estimates and detection calls for genes.

Parameters

Details

Six normalization methods are available: RMA, GCRMA, Li-Wong, RPA, MAS5 and PLIER. Regardless of the method, the normalized data are always expressed on a base-2 logarithmic scale.
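Because every method reports log2 values, a difference between two expression values translates directly into a fold change: a difference of 1 is a two-fold change, a difference of 2 is four-fold. The two helpers below are hypothetical and only illustrate this arithmetic; they are not part of the tool.

```python
import math


def log2_expression(intensity):
    """Put a raw (linear-scale) intensity on the base-2 log scale."""
    return math.log2(intensity)


def fold_change(log2_a, log2_b):
    """Ratio of two expression values, given their log2 estimates.

    Subtracting log2 values corresponds to dividing the linear intensities,
    so the ratio is recovered by raising 2 to the difference.
    """
    return 2 ** (log2_a - log2_b)


# 800 vs 200 raw units: the log2 values differ by 2, i.e. a four-fold change.
a, b = log2_expression(800), log2_expression(200)
print(a - b, fold_change(a, b))
```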

RMA (Irizarry et al., 2003), GCRMA and Li-Wong (Li and Wong, 2001) give gene expression estimates with favourable statistical properties: the data more closely resemble a normal distribution, and the variances are more homogeneous. Compared with MAS5, these methods have been shown to give greater statistical power in downstream analyses. The variance stabilization parameter has no effect with these methods, because variance stabilization is an inherent part of them.
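RMA consists of three steps: background correction, quantile normalization of probe intensities across arrays, and median polish summarization of the log2 probe values within each probe set. The pure-Python sketch below illustrates only the last two steps, assuming the intensities are already background-corrected; it is not the implementation the tool runs, and it breaks ties by position rather than by rank averaging.

```python
import math
from statistics import mean, median


def quantile_normalize(arrays):
    """Force every array onto the same empirical intensity distribution.

    arrays: list of equal-length lists, one list of probe intensities per array.
    Ties are broken by position, a simplification of the usual rank averaging.
    """
    n = len(arrays[0])
    # Probe indices of each array, ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [mean(a[o[k]] for a, o in zip(arrays, orders)) for k in range(n)]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, max_iter=10, tol=1e-9):
    """Tukey median polish of a probes x arrays matrix of log2 intensities.

    Returns one expression value per array: overall effect + array effect.
    """
    rows, cols = len(matrix), len(matrix[0])
    resid = [row[:] for row in matrix]
    overall, row_eff, col_eff = 0.0, [0.0] * rows, [0.0] * cols
    for _ in range(max_iter):
        moved = 0.0
        # Sweep row (probe) medians out of the residuals.
        for i in range(rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [r - m for r in resid[i]]
            moved += abs(m)
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column (array) medians out of the residuals.
        for j in range(cols):
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
            moved += abs(m)
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
        if moved < tol:  # a full sweep changed nothing: converged
            break
    return [overall + c for c in col_eff]


# Toy example: 3 probes on 2 arrays, treated here as one probe set.
raw = [[120.0, 340.0, 90.0], [150.0, 300.0, 110.0]]  # one list per array
norm = quantile_normalize(raw)
log_probes = [[math.log2(norm[a][p]) for a in range(len(norm))] for p in range(3)]
print(median_polish(log_probes))
```

Median polish is used because medians are robust: a single outlying probe shifts its own row effect rather than the array's expression estimate.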

RPA (Lahti et al., 2011) is a probe-level preprocessing method for Affymetrix arrays. Its model includes the popular RMA algorithm as a special case, and RPA compares favourably with RMA and other popular preprocessing algorithms on the AffyCompII benchmarking site.

MAS5 (Affymetrix, 2001) is the method originally implemented by Affymetrix in their GCOS software. PLIER is a more recent Affymetrix method that gives more precise estimates for low-intensity probes than MAS5. Variance stabilizing normalization (VSN) can be combined with these methods to further improve the precision of low-intensity expression estimates.
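Variance stabilizing normalization (VSN) replaces the plain logarithm with an arsinh ("generalised log") transform. Near zero it is roughly linear, so small, noisy intensities are not stretched apart the way log2 stretches them, while at high intensities it behaves like a logarithm. The sketch below shows only the transform; the offset `a` and scale `b` stand in for the per-array calibration parameters that VSN estimates from the data, and the exact offset and scaling conventions of the VSN software are not reproduced here.

```python
import math


def glog2(x, a=0.0, b=1.0):
    """Arsinh ('generalised log') transform of the kind used by VSN, in log2 units.

    a (offset) and b (scale) are illustrative stand-ins for VSN's fitted
    calibration parameters.
    """
    # asinh(y) ~ ln(2y) for large y, so subtracting ln(2) and dividing by ln(2)
    # makes the result approach log2(y) at high intensities.
    return (math.asinh(a + b * x) - math.log(2)) / math.log(2)


for x in (0.0, 5.0, 50.0, 5000.0):
    # log2 is undefined at 0 and very steep for small x;
    # glog2 is finite there and close to log2 for large x.
    print(x, glog2(x), math.log2(x) if x > 0 else None)
```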

Call values (present, absent, marginal) are always calculated using the MAS5 algorithm regardless of the selected preprocessing algorithm.
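The MAS5 detection call is based on a one-sided Wilcoxon signed-rank test: each probe pair gets a discrimination score R = (PM - MM) / (PM + MM), and the test asks whether the scores of a probe set tend to exceed a small threshold tau. The simplified sketch below assumes the commonly quoted defaults tau = 0.015, alpha1 = 0.04 and alpha2 = 0.06; it always uses the exact test and omits the saturation handling of the Affymetrix implementation, so its p-values need not match the tool's output.

```python
def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Simplified MAS5-style detection call for one probe set.

    pm, mm: perfect-match and mismatch intensities of the probe pairs.
    Returns (call, p_value) with call in {"P", "M", "A"}.
    """
    # Discrimination score of each pair, shifted by the threshold tau.
    d = [(p - m) / (p + m) - tau for p, m in zip(pm, mm)]
    d = [x for x in d if x != 0]  # zero differences carry no sign information
    n = len(d)
    if n == 0:
        return "A", 1.0
    # Rank |d| from 1..n, giving tied values their average rank.
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    # Doubled ranks are integers even when ties produce half-integer ranks.
    doubled = [round(2 * r) for r in ranks]
    observed = sum(r for r, x in zip(doubled, d) if x > 0)
    # Exact null distribution: each rank is independently + or - with prob. 1/2.
    counts = {0: 1}
    for r in doubled:
        nxt = dict(counts)  # this rank gets a negative sign: sum unchanged
        for s, c in counts.items():
            nxt[s + r] = nxt.get(s + r, 0) + c  # this rank gets a positive sign
        counts = nxt
    # One-sided p-value: H1 is that the discrimination scores exceed tau.
    p_value = sum(c for s, c in counts.items() if s >= observed) / 2 ** n
    if p_value < alpha1:
        return "P", p_value
    if p_value < alpha2:
        return "M", p_value
    return "A", p_value


# 11 probe pairs with PM clearly above MM: every score exceeds tau.
print(detection_call([1000 + 10 * i for i in range(11)], [100] * 11))
```

Because the call depends only on PM and MM probe intensities, it can be computed the same way whichever preprocessing method produced the expression estimates.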

A known problem with Affymetrix expression arrays is that a sizeable fraction of the probes are annotated to the wrong genes. In alternative CDF environments, the probes have been re-annotated to the correct genes according to current knowledge. To preprocess Affymetrix chips using these re-annotations, select one of the custom chiptypes. The re-annotations are from the University of Michigan.
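Conceptually, an alternative CDF changes which probes are grouped into which probe set before summarization. The sketch below shows only that regrouping step for one array; the probe identifiers, gene identifiers and the mapping itself are hypothetical, and real CDF files also describe probe positions and other layout information that is omitted here.

```python
from collections import defaultdict


def regroup_probes(intensities, probe_to_gene):
    """Build probe sets from an alternative probe-to-gene mapping.

    intensities:   dict probe_id -> intensity for one array
    probe_to_gene: dict probe_id -> gene_id (hypothetical re-annotation);
                   probes missing from the mapping are discarded.
    """
    probesets = defaultdict(list)
    for probe, value in intensities.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            probesets[gene].append(value)
    return dict(probesets)


# p3 no longer maps to any gene, so it is dropped from summarization.
intensities = {"p1": 100.0, "p2": 200.0, "p3": 50.0, "p4": 80.0}
mapping = {"p1": "GENE_A", "p2": "GENE_A", "p4": "GENE_B"}
print(regroup_probes(intensities, mapping))
```

Each resulting group would then be summarized (for example by median polish) exactly as an original probe set would be.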

Output

A tab-delimited text file containing gene names, expression estimates and call values ("flags"). This file is suitable for all further analyses.
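The output can be read with any tab-delimited parser. The sketch below assumes, hypothetically, that the first column holds the identifiers, that expression columns are prefixed `chip.` and that call columns are prefixed `flag.`; check the header of an actual output file before relying on these names.

```python
import csv
import io


def read_normalized(text):
    """Split tab-delimited normalisation output into expression values and calls.

    Assumes (hypothetically) identifiers in the first column, expression
    columns prefixed 'chip.' and call columns prefixed 'flag.'.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    expr_cols = [i for i, h in enumerate(header) if h.startswith("chip.")]
    flag_cols = [i for i, h in enumerate(header) if h.startswith("flag.")]
    expression, calls = {}, {}
    for row in reader:
        gene = row[0]
        expression[gene] = [float(row[i]) for i in expr_cols]
        calls[gene] = [row[i] for i in flag_cols]
    return expression, calls


sample = (
    "probeset\tchip.s1\tchip.s2\tflag.s1\tflag.s2\n"
    "1000_at\t7.5\t8.1\tP\tM\n"
    "1001_at\t3.2\t3.0\tA\tA\n"
)
expression, calls = read_normalized(sample)
# Keep only genes called present on every array.
present = [g for g, c in calls.items() if all(f == "P" for f in c)]
print(present)
```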

References

Affymetrix. Affymetrix Microarray Suite User Guide, version 5. Affymetrix, Santa Clara, CA, 2001.

R.A. Irizarry, B. Hobbs, F. Collin, Y.D. Beazer-Barclay, K.J. Antonellis, U. Scherf, and T.P. Speed. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2):249-264, 2003.

C. Li and W.H. Wong. Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences USA 98:31-36, 2001.

L. Lahti, L.L. Elo, T. Aittokallio, and S. Kaski. Probabilistic Analysis of Probe Reliability in Differential Gene Expression Studies with Short Oligonucleotide Arrays. IEEE/ACM Transactions on Computational Biology and Bioinformatics 8(1):217-225, 2011.