Introduction to R/Bioconductor for analysis of microarray data

From BITS wiki

Summary

This training should enable you to

  1. Install R and Bioconductor from their respective websites
  2. Read gene expression microarray data and phenotypic information into R
  3. Perform simple quality control on expression and phenotype data
  4. Filter probesets for quality and information content
  5. Identify differentially expressed probesets using linear models and permutation tests
  6. Export annotated gene lists to HTML and CSV files
  7. Visualize and cluster expression data using heatmaps
  8. Combine the previous steps to form simple workflows
  9. Use the R help system and the R and Bioconductor websites to find information

Required software and libraries

To work through this training as you did during the session, you will need R and Bioconductor installed on your computer (each can be installed from its own website), as well as a number of add-on packages.

To install most of the required add-on packages on your machine, save the linked file (RBiocOct2010_install.R) and execute it as an R script (that is, source it) from within your R console. The instructor will show you these basic steps. Afterwards, to test your installation, you can also source the second script (RBiocOct2010_test.R), which should generate a number of images and text output if all went well.
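
Sourcing a script from the R console can be sketched as follows. This is a minimal example that assumes both files have been saved in the current working directory; the helper name `source_if_present` is hypothetical and not part of the course material.

```r
# Hypothetical helper: run an R script only if it exists at the given path.
# Returns TRUE if the script was sourced, FALSE otherwise.
source_if_present <- function(path) {
  if (!file.exists(path)) {
    message("Script not found: ", path, " (check getwd())")
    return(FALSE)
  }
  source(path)
  TRUE
}

# Assumes both course scripts were saved in the current working directory.
source_if_present("RBiocOct2010_install.R")  # installs the add-on packages
source_if_present("RBiocOct2010_test.R")     # should produce images and text output
```

If a script is reported as not found, `getwd()` shows the directory R is looking in and `setwd()` changes it.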

Training Units

We have broken down the training into different modules covering specific aspects of working with R/Bioconductor and analyzing expression data. Each unit comes with a tutorial, which contains step-by-step instructions for performing the tasks, annotated with useful background information. The commands shown in these tutorials provide the information for doing the actual exercises, which are of course the core of the training.

Day 1

Module 1 Handling R

Basic usage of the R statistical software environment: command line, menus, help, loading and installing add-on packages, generating graphs, reading and storing data.

Module 2 Reading and storing expression data

Reading raw and processed expression data into R, plus accompanying phenotypic and annotation data.

Module 3 Basic pre-processing

Demonstrating some methods and their properties for turning raw probe-level data into normalized probeset-level data.

Module 4 Testing for differential expression I

Testing for differential expression between two or more groups; comments on multiple testing adjustment.
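
As a rough illustration of the two-group case and of multiple testing adjustment, the sketch below runs one t-test per probeset on simulated data and adjusts the p-values with the Benjamini-Hochberg method. It uses only base R (`t.test`, `p.adjust`); the matrix, group labels, and effect size are made up for illustration, and the module itself may use different methods or packages.

```r
set.seed(1)  # make the simulated example reproducible

# Simulated log-expression matrix: 200 probesets (rows) x 10 samples (columns).
n_probes <- 200
group <- factor(rep(c("control", "treated"), each = 5))
expr <- matrix(rnorm(n_probes * length(group)), nrow = n_probes,
               dimnames = list(paste0("probe", seq_len(n_probes)), NULL))

# Assumption for illustration: the first 20 probesets are truly shifted
# upwards by 2 units in the treated samples.
expr[1:20, group == "treated"] <- expr[1:20, group == "treated"] + 2

# One Welch two-sample t-test per probeset.
p_raw <- apply(expr, 1, function(x) t.test(x ~ group)$p.value)

# Adjust for multiple testing: Benjamini-Hochberg controls the false
# discovery rate across all probesets tested.
p_bh <- p.adjust(p_raw, method = "BH")

# Compare how many probesets pass a 0.05 cutoff before and after adjustment.
c(raw = sum(p_raw < 0.05), adjusted = sum(p_bh < 0.05))
```

With many probesets, some raw p-values fall below 0.05 by chance alone, so the adjusted count is never larger than the raw count and is typically smaller.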

Module 5 Annotating gene lists

Adding extra information to gene lists.

Module 6 Organizing workflows I

Combining the previous steps into simple workflows for replicable data analysis.

Day 2

Module 7 Quality control

Module 8 Differential expression for more than two groups

Module 9 Clustering and heatmaps

Software installation

R and the Bioconductor package bundle are both open-source software (mostly under the GPL and LGPL), so installing them is straightforward both technically and legally.

Instructions for installing R and Bioconductor

Glossary

Terms that seem important could be defined here.