
Biologically interpreting a list of genes, obtained with any method, is the major aim of gene set analysis, also called gene set enrichment analysis. As an alternative to sifting through the list manually, the researcher looks for the overrepresentation of a set of genes. The genes in such a set can share any property of biological significance, such as belonging to a certain Gene Ontology category or being part of the same chromosome. It is up to the researcher to define meaningful sets to test against the gene list.

Three major approaches exist to test for overrepresented sets.

Contingency tables

Properties

Requires a list of gene ids and a background against which to test the set of genes (e.g. all genes of that species). Typically, the gene id list is the result of a test to which a predefined cut-off is applied (e.g. alpha of 0.05). Broadly speaking, a contingency table method compares the fraction of the gene list that belongs to a gene set (e.g. 10 lipid catabolism genes in a gene list of 250 genes) with the fraction of this gene set in the background (e.g. the genome contains 21 000 genes, of which 560 are lipid catabolism genes). Different statistical tests, e.g. Fisher's exact test, can compare these numbers and report whether the difference is significant.
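The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any of the tools listed on this page: it computes the one-sided (overrepresentation) p-value of Fisher's exact test as a hypergeometric upper tail, using only the standard library. The function name and variable names are made up for this example; the numbers are the ones from the text.

```python
from math import comb

def hypergeom_upper_tail(k, list_size, set_size, background_size):
    """P(X >= k): chance of seeing at least k set members in a gene list
    of list_size genes drawn without replacement from the background."""
    total = comb(background_size, list_size)
    max_overlap = min(list_size, set_size)
    favourable = sum(
        comb(set_size, i) * comb(background_size - set_size, list_size - i)
        for i in range(k, max_overlap + 1)
    )
    return favourable / total

# Numbers from the example in the text
in_list = 10           # lipid catabolism genes in the gene list
list_size = 250        # genes in the gene list
in_background = 560    # lipid catabolism genes in the genome
background = 21000     # genes in the genome

# Ratio of the two fractions being compared
fold = (in_list / list_size) / (in_background / background)
p_value = hypergeom_upper_tail(in_list, list_size, in_background, background)

print(f"fold enrichment: {fold:.2f}")
print(f"one-sided p-value: {p_value:.3f}")
```

In this example the gene set is 1.5 times more frequent in the gene list than in the background; the p-value tells whether an overlap of that size is surprising given the list size. In practice a multiple testing correction is needed when many gene sets are tested.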

Tools

Web interfaces

DAVID
Enrichr
GOstat
GraphiteWeb
GOEAST - for results of microarrays only
Batch Genes
ToppGene
gProfiler
GeneBrowser2
Babelomics
Network2Canvas - very visual
Ingenuity Pathway Analysis - commercial

Code

GOFunction - R package

Network view

BiNGO - Cytoscape plugin
ClueGO - Cytoscape plugin, integrates Gene Ontology (GO) terms as well as KEGG/BioCarta pathways

Issues

This method depends on a predefined threshold for generating the 'significantly differing' gene list on which the analysis is performed. This requirement for an arbitrary, user-defined threshold (for example, genes with a p-value > 0.01 are called not significant) causes a loss of information and influences the final result of the contingency table based gene set analysis.

To demonstrate this, we uploaded a dataset into Ingenuity Pathway Analysis (IPA). From this set we extracted 7 gene lists: genes with a p-value < 10^-8, genes with a p-value < 10^-7, and so on up to genes with a p-value < 10^-2. Correspondingly, the gene lists differed in size, ranging from 126 ids (10^-8 cut-off) to ~1600 ids (10^-2 cut-off). The dataset can be downloaded from the BITS website (dyslipidemia microarray data set).

IPA can detect differential regulation of pathways. When comparing the 7 gene lists, the successive inclusion of genes in the list changed which pathways were reported as regulated. The screenshots below show this behaviour by sorting the pathways by p-value in each of the lists.

There is no good solution to this problem. Comparing different thresholds as done above can, however, shed light on the impact of the arbitrarily chosen cut-off used to generate the gene list.
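Generating the gene lists for such a threshold comparison is easy to script. The sketch below builds one gene list per cut-off from 10^-8 to 10^-2, as in the demonstration above. The gene ids and p-values are hypothetical toy values, not taken from the dyslipidemia data set, and the function name is made up for this example.

```python
def gene_lists_by_cutoff(p_values, exponents=range(-8, -1)):
    """Return {exponent: sorted gene ids with p-value < 10**exponent}."""
    return {
        e: sorted(gene for gene, p in p_values.items() if p < 10.0 ** e)
        for e in exponents
    }

# Hypothetical per-gene p-values (toy data for illustration only)
toy_p_values = {
    "geneA": 3e-9, "geneB": 4e-7, "geneC": 2e-5, "geneD": 6e-4,
    "geneE": 8e-3, "geneF": 0.04, "geneG": 0.5,
}

lists = gene_lists_by_cutoff(toy_p_values)
for exponent, genes in lists.items():
    print(f"p < 10^{exponent}: {len(genes)} genes {genes}")
```

Each resulting list can then be submitted to the contingency table tool of choice, and the ranking of the gene sets compared across cut-offs.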

Using raw expression data

Goeman's global test
Hotelling's T²
ANCOVA
MANOVA

Using gene-level statistics

Properties

These methods take as input all measured genes with their associated statistic, significance result (e.g. p-value) or expression values. In this way, these methods avoid having to set a predefined cut-off, as is the case with contingency tables (see above).
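To make the idea concrete, here is a minimal sketch of one simple gene-level approach: a gene sampling permutation test on the mean statistic of a gene set. It is a deliberately simplified illustration and not the algorithm of GSEA, GAGE, PIANO or any other tool listed below. The scores and gene ids are hypothetical, and a higher score is assumed to mean stronger evidence (e.g. an absolute t-statistic or -log10 of the p-value).

```python
import random

def gene_set_permutation_p(stats, gene_set, n_perm=2000, seed=0):
    """Estimate how often a random gene set of the same size has a mean
    gene-level statistic at least as high as the observed gene set."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genes = list(stats)
    members = [g for g in genes if g in gene_set]
    observed = sum(stats[g] for g in members) / len(members)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        if sum(stats[g] for g in sample) / len(members) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical gene-level scores for 100 genes (toy data)
toy_stats = {f"gene{i}": float(i) for i in range(100)}

top_set = {"gene95", "gene96", "gene97", "gene98", "gene99"}
mixed_set = {"gene10", "gene30", "gene50", "gene70", "gene90"}

p_top = gene_set_permutation_p(toy_stats, top_set)
p_mixed = gene_set_permutation_p(toy_stats, mixed_set)
print(f"top-scoring set: p = {p_top:.4f}")
print(f"mixed set:       p = {p_mixed:.4f}")
```

Every measured gene contributes its score, so no gene is discarded by a cut-off before the gene set is tested.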

Tools

GSEA
GAGE - detection of sets with up- and down-regulated genes simultaneously
PIANO - a metapackage in R that combines many different methods into consensus results - top!
GSVA - Bioconductor package
PGSEA - Bioconductor package
SeqGSEA - Bioconductor, with features for RNA-seq data
PathwayProcessor - web interface, determines differentially regulated pathways

Pathway knowledge databases

You can fetch gene sets from different pathway databases.

References

Väremo et al - http://nar.oxfordjournals.org/content/early/2013/02/26/nar.gkt111
Khatri et al. - Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges - http://dx.doi.org/10.1371/journal.pcbi.1002375
Altman et al. - A systematic comparison of the MetaCyc and KEGG pathway databases - http://www.biomedcentral.com/1471-2105/14/112/abstract
Getting genetics done blog
