How to create histograms of microarray data
Go to parent Analyze your own microarray data in R/Bioconductor
The code on this page works for both affy and oligo
How to plot the histograms of each array ? |
---|
Histograms are created via the hist() method which can be applied directly on an AffyBatch/FeatureSet object.
for (i in 1:6) { name = paste("histogram",i,".jpg",sep="") jpeg(name) hist(data[,i],lwd=2,which='pm',ylab='Density',xlab='Log2 intensities',main=ph@data$sample[i]) dev.off() } The which argument allows you to specify if you want to use:
Since the use of MM probes is highly controversial, we will work with perfect match probes only. The lwd argument defines the line width, xlab and ylab define the titles of the axes and main defines the title of the histogram. |
You might want to fit all histograms on the same figure:
The code for this is as follows:
op = par(mfrow = c(2,3)) for(i in 1:6){hist(data[,i],lwd=2,which='pm',ylab='Density',xlab='Log2 intensities',main=ph@data$sample[i])}
You can further play with the appearance of the plot.
How to create one plot containing the histograms of all samples, wt and mutant replicates in a different color ? |
---|
You are going to crate one histogram. You can put each replicate in a different color using the col argument:
color=c('green','green','green','red','red','red') hist(data[,1:6],lwd=2,which='pm',col=color,ylab='Density',xlab='Log2 intensities',main='Histogram of raw data') |
Although the hist() method is very useful and easy to use, you might want to generate the histogram with ggplot(), since this gives you more freedom to make more changes to the plot. Check out the ggplot documentation.
How to create a plot containing the histograms of all samples (each sample in a different color) using ggplot ? |
---|
First we need to get the data in the correct format for ggplot: a data frame with log intensities in one column and sample names in the second column.
We'll only use the PM intensities pmexp = pm(data) We create two empty vectors that will serve as the two columns of the data frame:
The dim() method returns the number of rows (probes) and columns (samples) of the matrix containing the PM intensities. The number of rows corresponds to the number of PM intensities (probes) that is available for each sample (in this example that's 251078). The sampleNames vector needs the same number of rows consisting of sample names. The logs vector will contain the log PM intensities of all six samples stacked into a single column. Up to this point we have extracted raw PM intensities, they are not yet log-transformed! You can do the log transformation using the log2() method. sampleNames = vector() logs = vector() for (i in 1:6) { sampleNames = c(sampleNames,rep(ph@data[i,1],dim(pmexp)[1])) logs = c(logs,log2(pmexp[,i])) } If you have 3 groups of 3 replicates, the code is as follows: sampleNames = vector() logs = vector() for (i in 1:9) { sampleNames = c(sampleNames,rep(ph@data[i,1],dim(pmexp)[1])) logs = c(logs,log2(pmexp[,i])) } After we have filled the vectors with the data we need, we combine sample names and log intensities into a single data frame: logData = data.frame(logInt=logs,sampleName=sampleNames) Now we can create the plot: dataHist2 = ggplot(logData, aes(logInt, colour = sampleName)) dataHist2 + geom_density() |