How to compare microarray data grouped according to two variables
Go to parent Analyze your own microarray data in R/Bioconductor
As an example we will compare 4 groups of plants:
1. healthy control plants
2. plants infected with a pathogen
3. plants inhabited by beneficial microorganisms
4. plants inhabited by beneficial microorganisms and infected with a pathogen
It means that we have two grouping variables: pathogen and beneficial MO. All four combinations of pathogen and beneficial MO are observed, so this is a two-factor design.
It is especially important with a multi-factor design to decide what are the comparisons of interest. We want to know which genes respond to:
- which genes respond to the pathogen infection in healthy plants: compare 2 and 1
- which genes respond to the benifical MO in healthy plants: compare 3 and 1
- which genes respond to the pathogen infection in beneficial MO inhabited plants: compare 4 and 3
How to compare four groups defined by two grouping variables ? |
---|
The easiest way to analyze these data uses limma in a similar way to that in which one-factor designs were analyzed. The most basic approach is to fit a model with one factor representing the four groups and then to extract the comparisons of interest as contrasts:
ph@data[ ,2] = c("con","con","con","inh","inh","inh","pat","pat","pat","patinh","patinh","patinh") colnames(ph@data)[2]="Treatment" groups = ph@data$Treatment f = factor(groups,levels=c("con","inh","pat","patinh")) design = model.matrix(~ 0 + f) colnames(design) = levels(f) data.fit = lmFit(data.rma, design) This calculates the mean log expression in each group. The contrasts of interest can be extracted by creating a contrast matrix: cont.matrix = makeContrasts(inhvscon=inh-con,patvscon=pat-con,patinhvsinh=patinh-inh,levels=design) data.contr = contrasts.fit(data.fit,cont.matrix) data.fit.eb = eBayes(data.contr) |
This approach for analyzing two-factor deigns is recommended for most users, because the contrasts that are being tested are defined explicitly. It will work for simple comparisons.
However, if you want to see if the genes that respond to pathogen infection are different in BMO inhabited and control plants, you need to use an interaction term in the design matrix. To do this, you use a model formula.
Model formulas in R offer lots of extras for setting up the design matrix. However, they also require a high level of statistical understanding in order to use reliably, and they are not well described in the main R documentation.
How to compare four groups defined by two grouping variables using an interaction term ? |
---|
ph@data[ ,2] = c("noPat","notPat","noPat","Pat","Pat","Pat","noPat","noPat","noPat","Pat","Pat","Pat") colnames(ph@data)[2]="Pathogen" ph@data[ ,3] = c("noTr","noTr","noTr","noTr","noTr","noTr","Tr","Tr","Tr","Tr","Tr","Tr") colnames(ph@data)[3]="Treatment" In this example, results were messy when I did not choose the names of the tratments in alphabetical order. So always try to do this to be sure of the outcome of the fit. groupsP = ph@data$Pathogen groupsT = ph@data$Treatment ft = factor(groupsT,levels=c("noTr","Tr")) fp = factor(groupsP,levels=c("noPat","Pat")) Then you need to create a design matrix, a matrix of values of the grouping variable. ANOVA needs such a matrix to know which samples belong to which group. Since limma performs a two-way ANOVA, it needs such a design matrix. You can create it using the model.matrix() method. factored.design = model.matrix(~fp*ft) data.fit = lmFit(data.rma,factored.design) Lmfit() will fit a linear model to the data. It will create a data frame called data.fit containing 4 columns:
If you want to do the other comparisons e.g. pathogen infected BMO inhabited plants versus healthy BMO plants, you have to define a contrast matrix. Performing the two-factor ANOVA is now done using eBayes(). data.fit.eb = eBayes(data.fit) data.fit.eb$p.value[,1:20] |
The two approaches considered are equivalent and should yield identical results for corresponding comparisons.
More info on creating design matrices can be found in the limma user guide.