Introduction to R/Bioconductor for analysis of microarray data
Summary
This training should enable you to
- Install R and Bioconductor from their respective websites
- Read gene expression microarray data and phenotypic information into R
- Perform simple quality control on expression and phenotype data
- Filter probesets for quality and information content
- Identify differentially expressed probesets using linear models and permutation tests
- Export annotated gene lists to HTML and CSV files
- Visualize and cluster expression data using heatmaps
- Combine the previous steps to form simple workflows
- Use the R help system and the R and Bioconductor websites to find information
Required software and libraries
To work through this training as you did during the session, you will need R and Bioconductor installed on your computer (both can be installed from their respective websites), as well as a number of add-on packages.
To install most of the required add-on packages on your machine, save the linked file (RBiocOct2010_install.R) and execute it as an R script (source it) from within your R console. The instructor will show you these basic steps. Afterwards, to test your installation, you can 'source' a second script (RBiocOct2010_test.R), which should generate a number of images and text output if everything went well.
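Sourcing a script from the R console can be sketched as follows. This is only an illustration: it assumes both linked files were saved into the current working directory under the names given above, and the helper `source_if_present` is hypothetical, not part of the course material.

```r
# Hypothetical helper: source a script only if it exists, and say so if not.
source_if_present <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path, " (check your working directory with getwd())")
    return(invisible(FALSE))
  }
  source(path, echo = TRUE)  # echo = TRUE prints each command as it is run
  invisible(TRUE)
}

# Assumed file names, taken from the links above
source_if_present("RBiocOct2010_install.R")
source_if_present("RBiocOct2010_test.R")
```

If a file is reported as missing, `setwd()` can be used to change to the folder where the scripts were saved.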
Training Units
We have broken down the training into different modules covering specific aspects of working with R/Bioconductor and analyzing expression data. Each unit comes with a tutorial, which contains step-by-step instructions for performing the tasks, annotated with useful background information. The commands shown in these tutorials provide the information for doing the actual exercises, which are of course the core of the training.
Day 1
Module 1 Handling R
Basic usage of the R statistical software environment: command line, menus, help, loading and installing add-on packages, generating graphs, reading and storing data.
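A few of these basics can be sketched in base R. The data values and file names below are made up for illustration and are not taken from the course exercises.

```r
# Getting help and installing add-on packages (uncomment to try at the console)
# ?read.csv
# help.search("heatmap")
# install.packages("somePackage")   # "somePackage" is a placeholder name

# A small made-up data frame (values are illustrative only)
samples <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  group  = c("control", "control", "treated", "treated"),
  signal = c(7.1, 6.8, 9.4, 9.9),
  stringsAsFactors = FALSE
)

# Store the data as a CSV file and read it back
csv_file <- file.path(tempdir(), "samples.csv")
write.csv(samples, csv_file, row.names = FALSE)
samples2 <- read.csv(csv_file, stringsAsFactors = FALSE)
str(samples2)

# Draw a graph into a PDF file instead of the screen
pdf_file <- file.path(tempdir(), "signal_by_group.pdf")
pdf(pdf_file)
boxplot(signal ~ group, data = samples2, ylab = "signal")
invisible(dev.off())
```

Writing to `tempdir()` keeps the example from cluttering the working directory; in practice you would choose your own project folder.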
Module 2 Reading and storing expression data
Reading raw and processed expression data into R, plus accompanying phenotypic and annotation data.
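As a rough sketch of the idea, the following base R code reads a processed expression table and a phenotype table from tab-delimited text files and lines up the samples. The files are generated on the spot with made-up values, so file names, sample names and group labels are all assumptions; Bioconductor offers dedicated containers for such data (for example `ExpressionSet` in the Biobase package), which this sketch deliberately does not use.

```r
# Write two small made-up files so the example is self-contained
expr_file  <- file.path(tempdir(), "expression.txt")
pheno_file <- file.path(tempdir(), "phenotype.txt")

set.seed(1)
expr <- matrix(round(rnorm(5 * 4, mean = 8), 2), nrow = 5,
               dimnames = list(paste0("probe_", 1:5), paste0("S", 1:4)))
write.table(expr, expr_file, sep = "\t", quote = FALSE, col.names = NA)

# The phenotype file lists the samples in a different order on purpose
pheno <- data.frame(sample = paste0("S", 4:1),
                    group  = c("treated", "treated", "control", "control"),
                    stringsAsFactors = FALSE)
write.table(pheno, pheno_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read processed expression values: probesets in rows, samples in columns
expr_in <- as.matrix(read.delim(expr_file, row.names = 1, check.names = FALSE))

# Read phenotype data: one row per sample
pheno_in <- read.delim(pheno_file, stringsAsFactors = FALSE)

# Reorder the phenotype rows so that they match the expression columns
pheno_in <- pheno_in[match(colnames(expr_in), pheno_in$sample), ]
stopifnot(identical(pheno_in$sample, colnames(expr_in)))
```

Checking that sample names agree between the two tables is worth doing every time, since a silent mismatch would invalidate any later group comparison.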
Module 3 Basic pre-processing
Methods for turning raw probe-level data into normalized probeset-level data, and the properties of these methods.
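To give a feel for what normalization does, here is a minimal base R sketch of quantile normalization on made-up data. It is not necessarily the method used in the course, it handles ties naively, and it does not cover the summarization of probes into probesets; the function name `quantile_normalize` is hypothetical.

```r
# Hypothetical helper: force every column (array) to share the same distribution
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its column
  sorted <- apply(x, 2, sort)                         # each column sorted in ascending order
  target <- rowMeans(sorted)                          # mean of the k-th smallest values across columns
  out <- apply(ranks, 2, function(r) target[r])       # replace each value by the target at its rank
  dimnames(out) <- dimnames(x)
  out
}

# Made-up log-scale intensities for three arrays
set.seed(42)
raw <- matrix(rnorm(200 * 3, mean = 8), ncol = 3,
              dimnames = list(NULL, c("A", "B", "C")))
raw[, "B"] <- raw[, "B"] + 1   # simulate an array that is brighter overall

norm <- quantile_normalize(raw)

# Before: column B is shifted; after: all columns have identical quantiles
apply(raw, 2, quantile)
apply(norm, 2, quantile)
```

The ordering of values within each array is preserved; only the overall distribution is made identical across arrays.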
Module 4 Testing for differential expression I
Testing for differential expression between two or more groups; comments on multiple testing adjustment.
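The two-group case with multiple testing adjustment can be sketched in base R as below. The data are simulated, the group labels are assumptions, and the plain per-probeset t-test with Benjamini-Hochberg adjustment is used here only as a simple stand-in; the course may well rely on other tests.

```r
# Simulated expression matrix: 100 probesets, 5 control and 5 treated samples
set.seed(7)
n_probes <- 100
group <- factor(rep(c("control", "treated"), each = 5))
expr <- matrix(rnorm(n_probes * 10, mean = 8), nrow = n_probes,
               dimnames = list(paste0("probe_", seq_len(n_probes)), NULL))

# Make the first 10 probesets truly different between the groups
expr[1:10, group == "treated"] <- expr[1:10, group == "treated"] + 3

# One t-test per probeset (row), keeping only the p-value
p_raw <- apply(expr, 1, function(y) t.test(y ~ group)$p.value)

# Adjust for testing many probesets at once (Benjamini-Hochberg)
p_adj <- p.adjust(p_raw, method = "BH")

results <- data.frame(probe = rownames(expr), p_raw = p_raw, p_adj = p_adj,
                      stringsAsFactors = FALSE)
head(results[order(results$p_raw), ])
sum(results$p_adj < 0.05)   # number of probesets called significant
```

Comparing `p_raw` with `p_adj` shows why adjustment matters: with 100 tests, several unchanged probesets are expected to reach a raw p-value below 0.05 by chance alone.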
Module 5 Annotating gene lists
Adding extra information to gene lists.
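In its simplest form, annotation amounts to joining a list of probesets with a lookup table and exporting the result. The sketch below uses base R only; the probeset identifiers, gene symbols and file name are invented for illustration, and in practice the lookup table would come from an annotation resource rather than being typed in.

```r
# A made-up list of significant probesets
hits <- data.frame(probe = c("probe_3", "probe_1", "probe_7"),
                   p_adj = c(0.001, 0.004, 0.03),
                   stringsAsFactors = FALSE)

# A made-up annotation table (gene symbols are placeholders)
annotation <- data.frame(probe  = paste0("probe_", 1:5),
                         symbol = paste0("GENE_", LETTERS[1:5]),
                         stringsAsFactors = FALSE)

# Keep every hit, even if no annotation is available for it (all.x = TRUE)
annotated <- merge(hits, annotation, by = "probe", all.x = TRUE)
annotated <- annotated[order(annotated$p_adj), ]
annotated

# Export the annotated list as CSV
out_file <- file.path(tempdir(), "gene_list.csv")
write.csv(annotated, out_file, row.names = FALSE)
```

Probesets without a matching annotation entry (here `probe_7`) are kept with a missing symbol instead of being dropped silently.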
Module 6 Organizing workflows I
Combining the previous steps into simple workflows for replicable data analysis.
Day 2
Module 7 Quality control
Module 8 Differential expression for more than two groups
Module 9 Clustering and heatmaps
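As a small taste of this module, the following base R sketch clusters samples hierarchically and draws a heatmap. The data are simulated, and the choice of Euclidean distance with average linkage is an assumption made for illustration, not a recommendation from the course.

```r
# Simulated data: 20 probesets, 6 samples; the last three samples differ in 10 probesets
set.seed(3)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("probe_", 1:20), paste0("S", 1:6)))
expr[1:10, 4:6] <- expr[1:10, 4:6] + 4

# Hierarchical clustering of the samples (columns)
sample_dist <- dist(t(expr))                        # Euclidean distance between samples
sample_tree <- hclust(sample_dist, method = "average")
clusters <- cutree(sample_tree, k = 2)              # cut the tree into two groups
clusters

# Heatmap with rows scaled, written to a PDF file
heat_file <- file.path(tempdir(), "heatmap.pdf")
pdf(heat_file)
heatmap(expr, scale = "row")
invisible(dev.off())
```

`heatmap()` reorders rows and columns by its own hierarchical clustering, so the dendrograms in the plot show which samples and probesets behave alike.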
Software installation
Both R and the Bioconductor package bundle are open source software (mostly GPL and LGPL), so installing them is straightforward technically and poses no licensing obstacles.
Instructions for installing R and Bioconductor
Glossary
- Terms that seem important
- could be defined here