How to compare microarray data grouped according to two variables

Go to parent Analyze your own microarray data in R/Bioconductor

As an example we will compare 4 groups of plants:
1. healthy control plants
2. plants infected with a pathogen
3. plants inhabited by beneficial microorganisms
4. plants inhabited by beneficial microorganisms and infected with a pathogen
It means that we have two grouping variables: pathogen and beneficial MO. All four combinations of pathogen and beneficial MO are observed, so this is a two-factor design.

It is especially important with a multi-factor design to decide what are the comparisons of interest. We want to know which genes respond to:

which genes respond to the pathogen infection in healthy plants: compare 2 and 1
which genes respond to the benifical MO in healthy plants: compare 3 and 1
which genes respond to the pathogen infection in beneficial MO inhabited plants: compare 4 and 3

How to compare four groups defined by two grouping variables ?

The easiest way to analyze these data uses limma in a similar way to that in which one-factor designs were analyzed. The most basic approach is to fit a model with one factor representing the four groups and then to extract the comparisons of interest as contrasts:

ph@data[ ,2] = c("con","con","con","inh","inh","inh","pat","pat","pat","patinh","patinh","patinh")
colnames(ph@data)[2]="Treatment"
groups = ph@data$Treatment
f = factor(groups,levels=c("con","inh","pat","patinh"))
design = model.matrix(~ 0 + f)
colnames(design) = levels(f)
data.fit = lmFit(data.rma, design)

This calculates the mean log expression in each group. The contrasts of interest can be extracted by creating a contrast matrix:

cont.matrix = makeContrasts(inhvscon=inh-con,patvscon=pat-con,patinhvsinh=patinh-inh,levels=design)
data.contr = contrasts.fit(data.fit,cont.matrix)
data.fit.eb = eBayes(data.contr)

This approach for analyzing two-factor deigns is recommended for most users, because the contrasts that are being tested are defined explicitly. It will work for simple comparisons.

However, if you want to see if the genes that respond to pathogen infection are different in BMO inhabited and control plants, you need to use an interaction term in the design matrix. To do this, you use a model formula.
Model formulas in R offer lots of extras for setting up the design matrix. However, they also require a high level of statistical understanding in order to use reliably, and they are not well described in the main R documentation.

How to compare four groups defined by two grouping variables using an interaction term ?
ph@data[ ,2] = c("noPat","notPat","noPat","Pat","Pat","Pat","noPat","noPat","noPat","Pat","Pat","Pat") colnames(ph@data)[2]="Pathogen" ph@data[ ,3] = c("noTr","noTr","noTr","noTr","noTr","noTr","Tr","Tr","Tr","Tr","Tr","Tr") colnames(ph@data)[3]="Treatment" In this example, results were messy when I did not choose the names of the tratments in alphabetical order. So always try to do this to be sure of the outcome of the fit. Of course, now you need to factorize both grouping variables: groupsP = ph@data$Pathogen groupsT = ph@data$Treatment ft = factor(groupsT,levels=c("noTr","Tr")) fp = factor(groupsP,levels=c("noPat","Pat")) Then you need to create a design matrix, a matrix of values of the grouping variable. ANOVA needs such a matrix to know which samples belong to which group. Since limma performs a two-way ANOVA, it needs such a design matrix. You can create it using the model.matrix() method. The argument of the model.matrix method is a model formula. The tilde (~) in the argument specifies the right hand side of a model equation. An interaction is defined by an asterisk () factored.design = model.matrix(~fpft) data.fit = lmFit(data.rma,factored.design) Lmfit() will fit a linear model to the data. It will create a data frame called data.fit containing 4 columns: the first column contains the intercept of the linear model: it's the mean log expression in control plants the second column compares pathogen infected plants with control plants the third column compares BMO inhabited plants with control plants the fourth column contains the interaction showing you if the genes that respond to the pathogen are influenced by the BMO If you want to do the other comparisons e.g. pathogen infected BMO inhabited plants versus healthy BMO plants, you have to define a contrast matrix. Performing the two-factor ANOVA is now done using eBayes(). data.fit.eb = eBayes(data.fit) data.fit.eb$p.value[,1:20]

The two approaches considered are equivalent and should yield identical results for corresponding comparisons.

More info on creating design matrices can be found in the limma user guide.

How to compare microarray data grouped according to two variables

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Resources

Toolbox