Loading public microarray data in R/Bioconductor

From BITS wiki

To load raw CEL files from GEO directly into R, install the GEOquery Bioconductor package and run the following script:

#Loading the GEOquery package (install missing packages first, e.g. with BiocManager::install(c("GEOquery", "affy")))
#The data import also needs the Bioconductor base package (Biobase) and the affy package
library(Biobase)
library(GEOquery)
library(affy)
 
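#Downloading and unpacking the CEL files from GEO. The data set is selected by its GEO Series accession number
#getGEOSuppFiles() saves the supplementary files in a folder named after the accession number
getGEOSuppFiles("GSE6943")
untar("GSE6943/GSE6943_RAW.tar", exdir="data")
#List the gzipped files (names ending in .gz) and decompress each of them
cels <- list.files("data/", pattern = "\\.gz$")
sapply(paste("data", cels, sep="/"), gunzip)
cels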
 
#Define celpath as the path to the folder where R saved the CEL files (adjust it to your own system)
#In this example, one of the files, GSM160097, is corrupted, so remove it from the folder before reading the data
celpath <- "C:/Users/Janick/Documents/data/"
fns <- list.celfiles(path=celpath,full.names=TRUE)
fns
cat("Reading files:\n",paste(fns,collapse="\n"),"\n")
 
#Loading the CEL-files into an AffyBatch object
celfiles <- ReadAffy(celfile.path=celpath)
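
The corrupted file can also be skipped in code instead of being deleted by hand. The following is a minimal sketch and not part of the original script: it assumes the CEL files sit in a folder called data inside the working directory, and drop_samples is a hypothetical helper introduced here for illustration. It filters the file list by GSM accession and hands the remaining paths to ReadAffy through its filenames argument.

```r
# Hypothetical helper (not part of affy or GEOquery): drop every path whose
# file name starts with one of the given GSM accessions.
drop_samples <- function(paths, bad_ids) {
  if (length(bad_ids) == 0) return(paths)
  # The accession must be followed by a non-digit or the end of the name,
  # so that GSM160097 does not also match GSM1600971
  pattern <- paste0("^(", paste(bad_ids, collapse = "|"), ")([^0-9]|$)")
  paths[!grepl(pattern, basename(paths))]
}

# Assumption: the CEL files were extracted to "data" in the working directory
celpath <- "data"
fns <- list.files(celpath, pattern = "\\.cel$", ignore.case = TRUE,
                  full.names = TRUE)
good <- drop_samples(fns, "GSM160097")

# Read only the remaining files; skipped if affy is missing or no files exist
if (length(good) > 0 && requireNamespace("affy", quietly = TRUE)) {
  celfiles <- affy::ReadAffy(filenames = good)
}
```

Passing an explicit file list leaves the downloaded folder untouched, so the download does not have to be repeated if the exclusion list changes later.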