How to compare microarray data grouped according to two variables

As an example we will compare 4 groups of plants:
1. healthy control plants
2. plants infected with a pathogen
3. plants inhabited by beneficial microorganisms
4. plants inhabited by beneficial microorganisms and infected with a pathogen
This means that we have two grouping variables: pathogen infection and beneficial microorganisms (beneficial MO). All four combinations of pathogen and beneficial MO are observed, so this is a two-factor (2 x 2 factorial) design.
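As a minimal sketch, the sample annotation for such an experiment could be encoded in R as shown below. The data frame `targets`, its column names, the group labels and the choice of three replicates per group are assumptions made for illustration; they do not come from a real data set.

```r
# Hypothetical sample annotation: 3 replicates per group (assumed, not real data)
targets <- data.frame(
  sample   = paste0("S", 1:12),
  pathogen = factor(rep(c("no", "yes", "no", "yes"), each = 3),
                    levels = c("no", "yes")),
  bmo      = factor(rep(c("no", "no", "yes", "yes"), each = 3),
                    levels = c("no", "yes"))
)

# Cross-tabulate the two grouping variables: every cell must be non-empty
# for all four combinations to be observed
counts <- table(pathogen = targets$pathogen, bmo = targets$bmo)
print(counts)

# Combine both variables into a single factor with four levels,
# ordered like groups 1 to 4 in the list above
group_labels <- c("control", "pathogen", "bmo", "bmo_pathogen")
targets$group <- factor(
  group_labels[1 + (targets$pathogen == "yes") + 2 * (targets$bmo == "yes")],
  levels = group_labels
)
print(table(targets$group))
```

Keeping both the two separate variables and the combined `group` factor is convenient, because either one can be used to build a design matrix.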

With a multi-factor design it is especially important to decide which comparisons are of interest. We want to know:

  • which genes respond to pathogen infection in plants without beneficial MO: compare 2 and 1
  • which genes respond to the beneficial MO in uninfected plants: compare 3 and 1
  • which genes respond to pathogen infection in plants inhabited by beneficial MO: compare 4 and 3

This approach to analyzing two-factor designs, in which the four combinations are treated as separate groups and compared pairwise, is recommended for most users because the contrasts being tested are defined explicitly. It works well for simple comparisons.
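The explicit-contrast approach could look roughly like the following sketch, which uses limma's `lmFit`, `makeContrasts`, `contrasts.fit`, `eBayes` and `topTable`. The expression matrix `expr` is simulated random data standing in for real normalized log-expression values, and the group labels, contrast names and three replicates per group are assumptions for illustration only.

```r
library(limma)  # Bioconductor package; assumed to be installed

# Hypothetical layout: 4 groups x 3 replicates, in the order of groups 1 to 4
group <- factor(
  rep(c("control", "pathogen", "bmo", "bmo_pathogen"), each = 3),
  levels = c("control", "pathogen", "bmo", "bmo_pathogen")
)

# Simulated log-expression values (assumption: stands in for real data)
set.seed(1)
expr <- matrix(rnorm(100 * 12), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("S", 1:12)))

# Group-means design: one column per group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Spell out the three comparisons of interest explicitly
contrast_matrix <- makeContrasts(
  pathogen_without_bmo = pathogen - control,   # compare 2 and 1
  bmo_without_pathogen = bmo - control,        # compare 3 and 1
  pathogen_with_bmo    = bmo_pathogen - bmo,   # compare 4 and 3
  levels = design
)

# Fit the linear model per gene, then estimate and moderate the contrasts
fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, contrast_matrix))

# Top-ranked genes for the first comparison
top <- topTable(fit2, coef = "pathogen_without_bmo", number = 5)
print(top)
```

Because the data here are random noise, no gene is expected to be truly differentially expressed; the point of the sketch is only the structure of the design and contrast matrices.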

However, if you want to see whether the genes that respond to pathogen infection differ between plants inhabited by beneficial MO and control plants, you need to use an interaction term in the design matrix. To do this, you use a model formula.
Model formulas in R offer many options for setting up the design matrix. However, using them reliably requires a good understanding of statistics, and they are not well described in the main R documentation.
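As an illustration, a design matrix with an interaction term can be built with base R's `model.matrix`. The factor names `pathogen` and `bmo`, their levels and the three replicates per group are assumptions; R's default treatment contrasts are used, with "no" as the reference level of both factors.

```r
# Hypothetical factors: 3 replicates per group, in the order of groups 1 to 4
pathogen <- factor(rep(c("no", "yes", "no", "yes"), each = 3),
                   levels = c("no", "yes"))
bmo      <- factor(rep(c("no", "no", "yes", "yes"), each = 3),
                   levels = c("no", "yes"))

# pathogen * bmo is shorthand for pathogen + bmo + pathogen:bmo
design_factorial <- model.matrix(~ pathogen * bmo)
print(colnames(design_factorial))

# Meaning of the coefficients under treatment contrasts:
#   (Intercept)         mean of the control group (group 1)
#   pathogenyes         pathogen effect without beneficial MO (2 vs 1)
#   bmoyes              beneficial MO effect without pathogen (3 vs 1)
#   pathogenyes:bmoyes  interaction: (4 vs 3) minus (2 vs 1)
print(unique(design_factorial))
```

The interaction coefficient is the one that answers whether the pathogen response differs between plants with and without beneficial MO. Note that in this parameterization the `pathogenyes` and `bmoyes` coefficients are effects at the reference level of the other factor, not effects averaged over both levels.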

The two approaches are equivalent and should yield identical results for corresponding comparisons.
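The equivalence can be checked numerically for a single gene with base R alone. The sketch below fits both parameterizations by ordinary least squares with `lm.fit` on simulated values (an assumption standing in for real data) and compares the estimates; it does not use limma, and the variable names are hypothetical.

```r
# Hypothetical layout: 3 replicates per group, in the order of groups 1 to 4
pathogen <- factor(rep(c("no", "yes", "no", "yes"), each = 3),
                   levels = c("no", "yes"))
bmo      <- factor(rep(c("no", "no", "yes", "yes"), each = 3),
                   levels = c("no", "yes"))
group    <- factor(
  rep(c("control", "pathogen", "bmo", "bmo_pathogen"), each = 3),
  levels = c("control", "pathogen", "bmo", "bmo_pathogen")
)

set.seed(42)
y <- rnorm(12)  # simulated expression values for one gene (not real data)

# Approach 1: estimate the four group means, then form explicit contrasts
design_means <- model.matrix(~ 0 + group)
means <- unname(lm.fit(design_means, y)$coefficients)  # groups 1, 2, 3, 4
from_contrasts <- c(
  means[2] - means[1],                          # pathogen without beneficial MO
  means[3] - means[1],                          # beneficial MO without pathogen
  (means[4] - means[3]) - (means[2] - means[1]) # interaction
)

# Approach 2: model formula with an interaction term (drop the intercept)
design_factorial <- model.matrix(~ pathogen * bmo)
from_formula <- unname(lm.fit(design_factorial, y)$coefficients[-1])

# The corresponding estimates agree up to numerical rounding
print(cbind(from_contrasts, from_formula))
print(all.equal(from_contrasts, from_formula))
```

The check also shows that the interaction can be written as a difference of differences between group means, which is why the two approaches give the same answer for corresponding comparisons.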

More info on creating design matrices can be found in the limma user guide.