Using Genevestigator to select candidate reference genes

From BITS wiki

Since normalization of qPCR data is based on the assumption that the reference targets have the same expression level in all samples it is crucial that the expression of the chosen reference genes is stable.

However, none of the so-called housekeeping genes is universally stably expressed.
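To see why an unstable reference gene is a problem, here is a minimal sketch of normalization against the geometric mean of the reference genes. It assumes 100% amplification efficiency (a factor of 2 per cycle). The Cq values and the function name `normalized_quantity` are made up for illustration and do not come from qbase+.

```python
from statistics import geometric_mean


def normalized_quantity(cq_target, cq_refs, efficiency=2.0):
    """Target quantity divided by the geometric mean of the reference quantities.

    Assumes the same amplification efficiency for every gene (hypothetical
    simplification; real assays have gene-specific efficiencies).
    """
    rq_target = efficiency ** (-cq_target)
    rq_refs = [efficiency ** (-cq) for cq in cq_refs]
    return rq_target / geometric_mean(rq_refs)


# Hypothetical Cq values: the target gene has the same Cq in both samples.
sample_a = normalized_quantity(25.0, [20.0, 22.0])
# In sample B the first reference gene comes up one cycle later.
sample_b = normalized_quantity(25.0, [21.0, 22.0])

# Apparent change of the target caused only by the unstable reference gene.
fold_change = sample_b / sample_a
print(round(fold_change, 2))
```

In this made-up example the target appears to be about 1.41-fold higher in sample B, although its Cq did not change at all: the shift comes entirely from the reference gene.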

Genevestigator, in both the commercial and the free version, contains a tool called RefGenes that lets you identify candidate reference genes with very stable expression in the context you are working in, typically a certain tissue of a certain organism.

Genevestigator is a platform that contains curated public microarray data from thousands of experiments/conditions.

RefGenes allows you to select the conditions that are relevant for you, e.g. mouse liver, human fibroblasts, or Arabidopsis thaliana leaves. In the next step, RefGenes identifies the genes with the most stable expression in the selected conditions.

Starting the RefGenes tool

The Genevestigator user interface

The Genevestigator user interface consists of the following components:

  • Sample Selection panel: to choose the experimental conditions you're interested in (green)
  • Gene Selection panel: to choose the genes you're interested in (blue)
  • Center panel: shows an overview of all available tools (purple). Once you have selected a tool, the panel shows the results of the analysis done by that tool.
  • Home button (red): lets you return to the overview of the tools at any time. The text next to the home button indicates the toolset that you have selected.


Click the RefGenes tool at the bottom.

Using the RefGenes tool to find reference genes

STEP 1: Choose samples from a biological context similar to those in your qPCR experiment

When you select samples for use in the RefGenes tool, you have to focus on microarrays from samples that were collected in conditions similar to those in your qPCR experiment.

Don't make the selection too general, e.g. all human samples: you might end up with genes that are stable in most conditions but not in yours.

Don't make a very specific selection either, e.g. human heart samples from patients taking the same medication as yours. If you want to broaden your study later on with samples from other patients, your reference genes might not be valid anymore.

It is recommended to select reference genes in the same organism and the same / a similar tissue type as the one that you used in your experiments.

STEP 2: Select the gene(s) you want to measure in your qPCR experiment

This step is not essential, but it helps you to see whether your target genes are strongly or weakly expressed in the conditions of interest selected in STEP 1. This allows you to search for candidate reference genes in a similar range of expression.

STEP 3: Find candidate reference genes

The reference genes that are suggested by Genevestigator have the following characteristics:

  • They have the most stable expression levels across all selected samples (a small boxplot)
  • Their overall expression level is similar to that of the target gene(s) of your qPCR experiment


Finding candidate reference genes in the free version of Genevestigator

Now we will work through a more elaborate exercise on finding candidate reference genes. We will do the analysis in the free version of RefGenes, but the analysis in the commercial version is very similar.

Suppose we want to compare the expression stability of the 4 commonly used reference genes for qPCR on mouse liver samples (ACTB, GAPDH, HPRT and TUBB4B) to that of 4 reference genes that are suggested by Genevestigator.

To this end we open the RefGenes tool and select the liver samples of the mouse 430_2 arrays.

Often there are multiple probe sets for the same gene. In the free version you can choose only one probe set per gene, so you have to make a choice. How do you make that choice?
Affymetrix probe set IDs carry meaning: the part after the underscore tells you something about the quality of the probes:

  • _at means that all the probes of the probe set hit one known transcript. This is what you want: probes specifically targeting one transcript of one gene
  • _a_at means that all the probes in the probe set hit alternate transcripts from the same gene. This is still OK: the probes bind to multiple transcripts, but at least the transcripts come from the same gene (splice variants)
  • _x_at means that some of the probes hit transcripts from different genes. This is not what you want: the expression level is based on a combination of the signals of all the probes in a probe set, including the probes that cross-hybridize
  • _s_at means that all the probes in the probe set hit transcripts from different genes. This is definitely not what you want: if the probes bind to multiple genes you have no idea whose expression you have measured on the array
So I always ignore probe sets with an _s_at or _x_at suffix. If you have two specific probe sets for a gene, they should give more or less similar signals. If this is not the case, I base my choice upon the expression level that I expect for that gene based on previous qPCR results.
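The suffix rules above can be sketched as a small filter. This is only an illustration of the rule of thumb: the function names and the probe set IDs are hypothetical, and any suffix other than the four described above is simply dropped.

```python
import re

# Preference order following the text: _at first, then _a_at.
# _x_at and _s_at are listed only so that they are recognised and rejected.
SUFFIX_RANK = {"_at": 0, "_a_at": 1, "_x_at": 2, "_s_at": 3}

# Matches "_at" at the end of the ID, optionally preceded by "_<letter>".
_SUFFIX_PATTERN = re.compile(r"(_[a-z])?_at$")


def probe_set_suffix(probe_set_id):
    """Return the suffix of a probe set ID (e.g. "_a_at"), or None if absent."""
    match = _SUFFIX_PATTERN.search(probe_set_id)
    return match.group(0) if match else None


def specific_probe_sets(probe_set_ids):
    """Keep only _at and _a_at probe sets, with _at probe sets listed first."""
    kept = [p for p in probe_set_ids if probe_set_suffix(p) in ("_at", "_a_at")]
    return sorted(kept, key=lambda p: SUFFIX_RANK[probe_set_suffix(p)])


# Made-up IDs for one gene, not real probe sets from the mouse 430_2 array.
ids = ["1000003_x_at", "1000002_a_at", "1000001_at", "1000004_s_at"]
print(specific_probe_sets(ids))
```

If more than one probe set survives the filter, the choice between them still has to be made by hand, as described above.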

As you can see, each of these 4 commonly used reference genes has a high expression level. Most genes do not have such high expression levels. In most qPCR experiments your genes of interest will have low or medium expression levels, so these reference genes will not be representative of the genes of interest.

Reference genes should ideally have expression levels similar to those of the genes of interest. Therefore, we will select the four most stably expressed genes with a medium expression level (between 8 and 12) according to the RefGenes tool.
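The selection logic can be sketched as follows. Note that this is not the RefGenes algorithm, which is not described here: the sketch uses the standard deviation across samples as a stand-in for stability, and the gene names and expression values are made up.

```python
from statistics import mean, stdev


def rank_candidates(expression, low=8.0, high=12.0, top=4):
    """Return the most stable genes whose mean expression lies in [low, high].

    expression: dict mapping a gene name to its expression values across the
    selected samples. Stability is approximated by the standard deviation
    (an assumption made for this sketch; smaller means more stable).
    """
    candidates = []
    for gene, values in expression.items():
        if low <= mean(values) <= high:
            candidates.append((stdev(values), gene))
    candidates.sort()
    return [gene for _, gene in candidates[:top]]


# Hypothetical expression values for four genes in four liver samples.
expression = {
    "geneA": [10.0, 10.1, 9.9, 10.0],   # medium level, very stable
    "geneB": [9.0, 11.0, 8.0, 12.0],    # medium level, unstable
    "geneC": [14.0, 14.0, 14.1, 13.9],  # stable, but expression too high
    "geneD": [8.5, 8.6, 8.4, 8.7],      # medium level, stable
}
print(rank_candidates(expression, top=2))
```

Here geneC is excluded because its expression level falls outside the medium range, even though it is stable, which mirrors the reason for not relying on the highly expressed classical reference genes.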

Then, we performed qPCR on a representative set of 16 of our liver samples to measure the expression of these 8 candidate reference genes and analyzed the data (See how to select the best reference genes using geNorm in qbase+).

Finding candidate reference genes in the commercial version of Genevestigator

We will do the same exercise as above in the commercial version of Genevestigator. The difference between the free and commercial version of RefGenes is the number of target genes you can select. In the free version you have to select one gene and then gradually add all other genes one at a time. The commercial version allows you to load as many target genes as you want simultaneously. As a consequence, you can select multiple probe sets for the same gene.

All VIB scientists have free access to the commercial version of Genevestigator via their VIB email address. If you don't know your VIB email address, check the Who's Who of VIB.

  • Open a browser and go to the Genevestigator website
  • If this is your first time accessing Genevestigator, create an account by clicking the join now button. You will be redirected to a new window where you enter some personal information, including a valid VIB email address. Click Register and check your email to activate your new account. Then go back to the Genevestigator website
  • Choose the research field you want to investigate, pharma/biomedical or plant biology, by clicking the corresponding button
  • Click Start
  • Use your VIB email address and password to login to Genevestigator.
  • This will automatically open a Genevestigator startup page in your browser. Keep this page open during the analysis. Closing this page will close Genevestigator.
  • Genevestigator is opened automatically

Open the RefGenes tool by clicking its icon in the Further tools section and select the liver samples of the mouse 430_2 arrays as explained in the previous exercise.

The next step of selecting the 4 most stable candidate reference genes with medium expression levels is exactly the same as described above for the free version of RefGenes.

Exercise on selecting reference genes for metacaspases in Arabidopsis thaliana.