Exercises: Basic pre-processing
Atxn1, different expression measures
Consider the data from experiment E-MEXP-886 that you have read into R during the previous exercise.
- Load the object with the raw expression data into R.
- Generate an object with the corresponding probeset-level intensity values via the RMA method.
- Generate the same kind of object using the MAS5 method.
- Produce an MA-plot for the RMA expression values for the first slide.
- Produce the corresponding MA-plot for the MAS5 values. Compare.
- Produce a scatter plot of RMA vs MAS5 expression values
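The plotting steps above can be sketched in base R. This is only an illustration under stated assumptions: the matrices `expr_a` and `expr_b` are simulated stand-ins for the RMA and MAS5 expression matrices, and `ma_values` is a hypothetical helper, not part of any package. It compares one chip against the gene-wise median across all chips, which is one common choice of reference for a single-slide MA-plot. With the real data, the matrices would typically come from `exprs()` applied to the results of `rma()` and `mas5()` in the affy package; note that `mas5()` returns values on the original scale while `rma()` returns log2 values, so the MAS5 values need a `log2()` transformation before the two are compared.

```r
# Simulated data only: stand-ins for the two expression matrices.
set.seed(42)
n_genes <- 1000
n_chips <- 4
true_level <- rnorm(n_genes, mean = 8, sd = 2)

# Stand-in for log2-scale RMA values: constant, small noise.
expr_a <- matrix(true_level + rnorm(n_genes * n_chips, sd = 0.2),
                 nrow = n_genes)

# Stand-in for log2-transformed MAS5 values: noisier at low intensity
# (an assumption made for illustration, not a result from E-MEXP-886).
noise_sd <- 0.2 + 1.5 / (1 + exp(true_level - 6))
expr_b <- matrix(true_level + rnorm(n_genes * n_chips, sd = noise_sd),
                 nrow = n_genes)

# Hypothetical helper: M and A values of one chip against the
# gene-wise median of all chips (a pseudo-reference chip).
ma_values <- function(x, chip = 1) {
  ref <- apply(x, 1, median)
  data.frame(A = (x[, chip] + ref) / 2,  # average log intensity
             M = x[, chip] - ref)        # log ratio against the reference
}

ma_a <- ma_values(expr_a)
ma_b <- ma_values(expr_b)

op <- par(mfrow = c(1, 3))
plot(ma_a$A, ma_a$M, pch = ".", xlab = "A", ylab = "M",
     main = "MA-plot, measure A, chip 1")
abline(h = 0, col = "red")
plot(ma_b$A, ma_b$M, pch = ".", xlab = "A", ylab = "M",
     main = "MA-plot, measure B, chip 1")
abline(h = 0, col = "red")
plot(expr_a[, 1], expr_b[, 1], pch = ".",
     xlab = "measure A", ylab = "measure B",
     main = "Scatter plot, chip 1")
abline(0, 1, col = "red")  # identity line
par(op)
```

Drawing both MA-plots with the same axis limits makes differences in spread at low intensities easier to judge.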
Bonus: The updated Affymetrix standard for calculating probeset-level expression values, called plier, is implemented in the Bioconductor package plier. Install the package, calculate the plier expression values, and compare with RMA as above.
Plier artefacts?
Plier has been criticized in the literature for introducing strange artefacts in some situations. The Bioconductor package xps contains a vignette showing a (graphical) example of this kind of criticism. Find the vignette and study the example. Do you find the example and argumentation convincing?
Combining different data sets
One potential problem with measures like RMA, which combine both probe-level and array-level information to calculate an expression level, is that the whole calculation must be repeated when new data (chips) become available. The expression values for the old data must also be re-calculated, leading to generally negligible, but potentially worrisome changes in expression levels.
Explore the potential magnitude of the problem by taking one of the existing AffyBatch objects and splitting it into two new batches of similar size. Compute the RMA expression values for the two smaller batches as well as for the complete data.
- Pick one chip from each of the smaller batches, and plot its RMA values against the corresponding values from the large batch. What do you see? Is this a problem?
- For both small batches, calculate the difference between the RMA values in the small batch and the corresponding chips in the large batch, and draw boxplots of the differences. Hint: if the chips in sbatch1 correspond to the first five chips in Lbatch, something like
boxplot(as.data.frame(exprs(sbatch1) - exprs(Lbatch)[ ,1:5]))
will do the trick.
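Why the values of old chips change at all when chips are added can be illustrated without any microarray data. The sketch below is a simplified stand-in, not RMA itself: it applies only quantile normalization (one of the steps of RMA) to simulated log2 intensities and omits background correction and probeset summarization. The names `sbatch1` and `Lbatch` follow the hint above, but here they are plain matrices rather than RMA results; `sbatch2` and `quantile_normalize` are hypothetical names introduced for illustration. With real data, the three objects would instead be obtained by running `rma()` on the two subsets and on the full AffyBatch.

```r
# Simulated data only: 10 "chips" with chip-specific intensity shifts.
set.seed(1)
n_probes <- 2000
n_chips  <- 10
raw <- matrix(rnorm(n_probes * n_chips, mean = 8, sd = 2),
              nrow = n_probes,
              dimnames = list(NULL, paste0("chip", 1:n_chips)))
raw <- sweep(raw, 2, rnorm(n_chips, sd = 0.3), "+")

# Hypothetical helper: quantile normalization. Every column is forced
# onto the same distribution, namely the mean of the sorted columns.
# That target depends on which chips are in the batch.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

Lbatch  <- quantile_normalize(raw)           # all 10 chips together
sbatch1 <- quantile_normalize(raw[, 1:5])    # first half only
sbatch2 <- quantile_normalize(raw[, 6:10])   # second half only

# Differences between small-batch and large-batch values, per chip.
diff1 <- sbatch1 - Lbatch[, 1:5]
diff2 <- sbatch2 - Lbatch[, 6:10]

boxplot(as.data.frame(cbind(diff1, diff2)), las = 2,
        ylab = "small batch - large batch")
abline(h = 0, lty = 2)
```

Because the target distribution is an average over the chips in the batch, the same chip receives slightly different values in the small and in the large batch. The size of the effect in this simulation says nothing about its size for real RMA values, which is what the exercise asks you to examine.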