How to compare raw and background-corrected microarray data
Comparing raw and background-corrected data after RMA
How to create a plot of raw versus background-corrected intensities using ggplot? |
---|
We cannot use the output of the rma() method, because rma() performs three further steps in addition to background correction (quantile normalization, log2 transformation and summarization of the probes into probe sets). The bg.correct() method performs only the background correction; its method argument specifies which background correction method to use. We want the one used by RMA, and we apply it to the PM intensities only, since RMA uses only the PM intensities.

    library(affy)
    # background-correct the raw data, then keep the PM intensities only
    bgcorr = pm(bg.correct(data,method="rma"))

First we need to get the data in the correct format for ggplot: a data frame with three columns, holding the raw log intensities, the background-corrected log intensities and the sample names.

We will work with PM intensities only:

    pmexp = pm(data)

The dim() method returns the number of rows (probes) and columns (samples) of the matrix containing the PM intensities. The number of rows corresponds to the number of PM intensities (probes) available for each sample (in this example that's 251078). The sampleNames vector must repeat each sample name once per probe, so that it has the same length as the intensity vectors. The logs and corrlogs vectors will contain the log PM intensities of all six samples stacked into a single column. Up to this point we have extracted raw PM intensities; they are not yet log-transformed. The log transformation is done with the log2() method.

We create three empty vectors that will serve as the three columns of the data frame, and fill them sample by sample:

    sampleNames = vector()
    logs = vector()
    corrlogs = vector()
    for (i in 1:6)
    {
      # repeat the name of sample i once per probe
      sampleNames = c(sampleNames,rep(ph@data[i,1],dim(pmexp)[1]))
      # append the log2 raw and background-corrected intensities of sample i
      logs = c(logs,log2(pmexp[,i]))
      corrlogs = c(corrlogs,log2(bgcorr[,i]))
    }

If you have 3 groups of 3 replicates, the code is identical except for the loop range, which becomes for (i in 1:9). More generally, for (i in 1:ncol(pmexp)) covers every sample in the data set.

Then we combine sample names and log intensities into one data frame:

    corrData = data.frame(logInt=logs,bgcorr_logInt=corrlogs,sampleName=sampleNames)

Now we can create the plot. We use the geom_abline() method to add a red diagonal to the plot, and facet_grid() to draw one panel per sample:

    library(ggplot2)
    dataScatter = ggplot(corrData, aes(logInt,bgcorr_logInt))
    dataScatter + geom_point() + geom_abline(intercept=0,slope=1,colour='red') + facet_grid(.~sampleName) |
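The loop above grows three vectors one sample at a time, which can become slow with many arrays. As a possible alternative, the sketch below builds the same three columns in one step with base R. It is only an illustration: the two small matrices are simulated stand-ins for pm(data) and the background-corrected PM matrix, the constant offset of 20 is a placeholder rather than a real RMA correction, and the sample names S1 to S3 are made up.

```r
# Simulated stand-ins for pmexp and bgcorr (hypothetical values, 5 probes x 3 samples)
set.seed(1)
pmexp <- matrix(runif(15, min = 50, max = 5000), nrow = 5,
                dimnames = list(NULL, c("S1", "S2", "S3")))
bgcorr <- pmexp - 20   # placeholder "correction", NOT the RMA background model

# as.vector() unrolls a matrix column by column, so all probes of sample 1
# come first, then sample 2, and so on: the same order the loop produces.
corrData <- data.frame(
  logInt        = as.vector(log2(pmexp)),
  bgcorr_logInt = as.vector(log2(bgcorr)),
  sampleName    = rep(colnames(pmexp), each = nrow(pmexp))
)

str(corrData)
```

With real data, the column names of the PM matrix should carry the sample names, which may differ from the names stored in ph; check that they are the labels you want in the plot facets before relying on them.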
If there were no difference between raw and background-corrected data, all data points would lie on the diagonal. This is of course not the case. Only low intensities are strongly affected by the background subtraction: this is expected, since the background intensity is a small value. On the log scale used in the plot, subtracting a small value from a small value has a much bigger impact than subtracting the same small value from a high value.
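To make this last point concrete, the following sketch subtracts the same constant from a low and a high intensity and compares the resulting shift on the log2 scale. The numbers (a background of 50, intensities of 100 and 10000) are invented for illustration, and plain subtraction is a simplification: the RMA background correction is model-based rather than a constant subtraction.

```r
background <- 50                     # hypothetical background level
raw <- c(low = 100, high = 10000)    # hypothetical raw intensities

# Shift on the log2 scale caused by subtracting the background
shift <- log2(raw) - log2(raw - background)
print(round(shift, 3))
```

The low intensity drops by one full log2 unit (a factor of two), while the high intensity moves by less than 0.01 log2 units, which is why the points leave the diagonal only at the low end of the plot.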
Comparing raw and background-corrected data after GCRMA
How to create a plot of raw versus background-corrected intensities after GCRMA? |
---|
We cannot use the output of the gcrma() method, because gcrma() performs three further steps in addition to background correction (quantile normalization, log2 transformation and summarization of the probes into probe sets). The bg.adjust.gcrma() method allows us to perform only the GCRMA background correction. It returns an AffyBatch, so we extract the PM intensities with pm(), as we did for RMA:

    library(gcrma)
    bgcorr = pm(bg.adjust.gcrma(data))

The rest of the plot is created in exactly the same manner as described above for the RMA method. |