Tutorial: Annotating gene lists

From BITS wiki


Generally, the gene annotation data and the actual expression data are kept fairly separate during the initial analysis: it is enough to carry a unique identifier along. However, once we have a list of candidate genes, we want to interpret our results, which means linking the identifiers to actual biological information, preferably including links to online repositories.

This can be done the hard way, using annotation packages such as hgu95av2.db together with the functions get, revmap, Lkeys and Rkeys. The package annotate describes the process in a series of vignettes. Working through it is instructive, but it is not exactly for the faint of heart.
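
For orientation, a minimal sketch of that lower-level route might look like the following. It assumes the Bioconductor package hgu95av2.db (and with it AnnotationDbi) is installed; the five probe sets are picked arbitrarily from those that have a symbol mapping, and the variable names are our own.

```r
# Minimal sketch of the "hard way": query the hgu95av2.db maps directly.
# Assumes the Bioconductor package hgu95av2.db is installed.
suppressPackageStartupMessages(library(hgu95av2.db))

# hgu95av2SYMBOL maps probe set IDs (left keys) to gene symbols (right keys)
nProbes  <- length(Lkeys(hgu95av2SYMBOL))   # all probe sets on the array
nSymbols <- length(Rkeys(hgu95av2SYMBOL))   # all symbols the map can return
cat("Probe sets:", nProbes, " Symbols:", nSymbols, "\n")

# Arbitrary example: the first five probe sets that have a symbol mapping
probes <- head(mappedkeys(hgu95av2SYMBOL), 5)

# Forward lookup, probe set -> symbol; mget returns a named list
symbols <- unlist(mget(probes, hgu95av2SYMBOL))
print(symbols)

# Single-key lookup with get (take the first value to be safe)
firstSymbol <- get(probes[1], hgu95av2SYMBOL)[1]

# Reverse lookup with revmap, symbol -> every probe set for that gene
backToProbes <- get(firstSymbol, revmap(hgu95av2SYMBOL))
print(backToProbes)
```

Even this small lookup needs several map objects and functions, which is why a higher-level wrapper is welcome.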

Fortunately, there is a much more convenient alternative in the package annaffy, which offers a set of handlers that give access to basic annotation information such as the Entrez Gene ID, Gene Ontology terms and KEGG pathways:

> library(annaffy)
> aaf.handler()
 [1] "Probe"               "Symbol"              "Description"        
 [4] "Chromosome"          "Chromosome Location" "GenBank"            
 [7] "Gene"                "Cytoband"            "UniGene"            
[10] "PubMed"              "Gene Ontology"       "Pathway"   

For our estrogen example, we use a subset of these handlers:

> annCols = aaf.handler()[c(1:3, 7:9, 12)]
> annCols
[1] "Probe"       "Symbol"      "Description" "Gene"        "Cytoband"   
[6] "UniGene"     "Pathway"
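
Positional indices such as c(1:3, 7:9, 12) depend on the order in which your version of annaffy returns its handlers. A slightly more robust alternative, sketched below, selects the handlers by name. The handler names are copied from the output above so that the snippet runs on its own; in a live session you would use aaf.handler() instead.

```r
# Handler names as printed by aaf.handler() above; in a live session use
# handlers <- aaf.handler() after library(annaffy)
handlers <- c("Probe", "Symbol", "Description", "Chromosome",
              "Chromosome Location", "GenBank", "Gene", "Cytoband",
              "UniGene", "PubMed", "Gene Ontology", "Pathway")

# The columns we want, named explicitly instead of by position
wanted <- c("Probe", "Symbol", "Description", "Gene",
            "Cytoband", "UniGene", "Pathway")

# Warn about any requested handler this annaffy version does not offer
unknown <- setdiff(wanted, handlers)
if (length(unknown) > 0) warning("Unknown handlers: ", paste(unknown, collapse = ", "))

# intersect() keeps the order of 'wanted' and drops unknown names
annCols <- intersect(wanted, handlers)
print(annCols)
```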

We now build a basic table of annotation information for the unique Affymetrix probeset IDs of our candidate genes:

annTab  = aafTableAnn(estroTop$ID, "hgu95av2.db", annCols)

Note that this table is intended for export only: printed at the command line, it is not very helpful (try it).

Once the annotation information is in place, we build a second table from the statistical evidence for each gene, e.g. the log fold changes, t-statistics and adjusted p-values, and merge this data table with the annotation table:

estroAnnTab = aafTable(logFC = round(estroTop$logFC, 2), tstat = round(estroTop$t, 1),
                       adjPval = signif(estroTop$adj.P.Val, 1))
estroAnnTab = merge(estroAnnTab, annTab)   # combine statistics and annotation
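
The two rounding functions do different jobs: round(x, 2) keeps a fixed number of decimals, which suits log fold changes, whereas signif(x, 1) keeps one significant digit, so very small adjusted p-values do not collapse to zero. A quick check with a few made-up p-values illustrates the difference:

```r
# Made-up adjusted p-values spanning several orders of magnitude
p <- c(3.2e-07, 0.0456, 0.51)

# Fixed number of decimals: the smallest p-value is rounded away
fixedDecimals <- round(p, 2)    # values 0, 0.05, 0.51

# One significant digit: the order of magnitude survives
oneSigDigit <- signif(p, 1)     # values 3e-07, 0.05, 0.5

print(fixedDecimals)
print(oneSigDigit)
```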

This table can now easily be exported, e.g. to an HTML file:

saveHTML(estroAnnTab, "Estrogen_Pres-Abs.html", title = "Estrogen Present-Absent")
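
As a self-contained check of the export step, the sketch below builds a small annotation table for a few arbitrarily chosen hgu95av2 probe sets and writes it both as HTML and as tab-delimited text. It assumes annaffy and hgu95av2.db are installed and uses annaffy's saveText() for the text version. The output files go to a temporary directory, and their names are placeholders.

```r
# Assumes the Bioconductor packages annaffy and hgu95av2.db are installed
suppressPackageStartupMessages({
  library(annaffy)
  library(hgu95av2.db)
})

# A few probe sets with a symbol mapping (arbitrary example, not candidate genes)
ids <- head(mappedkeys(hgu95av2SYMBOL), 3)

# Annotation table restricted to simple handlers that need only hgu95av2.db
annSmall <- aafTableAnn(ids, "hgu95av2.db", c("Probe", "Symbol", "Description"))

# Placeholder output files in a temporary directory
htmlFile <- file.path(tempdir(), "example_table.html")
textFile <- file.path(tempdir(), "example_table.txt")

saveHTML(annSmall, htmlFile, title = "Example annotation table")
saveText(annSmall, textFile)   # tab-delimited text version of the same table
```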