Using Genevestigator to select candidate reference genes
Since normalization of qPCR data is based on the assumption that the reference targets have the same expression level in all samples, it is crucial that the expression of the chosen reference genes is stable.
However, none of the so-called housekeeping genes is universally stably expressed.
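To see why stability matters, here is a minimal Python sketch of reference-gene normalization. It assumes the common delta-Cq model with 100% amplification efficiency (a factor of 2 per cycle) and normalization against the geometric mean of the reference genes. The function names and Cq values are invented for illustration and are not taken from Genevestigator or qbase+.

```python
import math

def relative_quantity(cq, cq_calibrator, efficiency=2.0):
    """Convert a Cq value to a quantity relative to a calibrator sample."""
    return efficiency ** (cq_calibrator - cq)

def normalized_expression(target_rq, reference_rqs):
    """Divide the target quantity by the geometric mean of the reference quantities."""
    geo_mean = math.prod(reference_rqs) ** (1.0 / len(reference_rqs))
    return target_rq / geo_mean

# Hypothetical case: the target gene is truly unchanged between control and treated.
target_rq = relative_quantity(cq=25.0, cq_calibrator=25.0)

# A stable reference gene has the same Cq in both samples.
stable_ref = relative_quantity(cq=20.0, cq_calibrator=20.0)
# An unstable reference gene is 2-fold higher in the treated sample (one cycle earlier).
unstable_ref = relative_quantity(cq=19.0, cq_calibrator=20.0)

print(normalized_expression(target_rq, [stable_ref]))    # 1.0
print(normalized_expression(target_rq, [unstable_ref]))  # 0.5
```

With the unstable reference gene, the unchanged target appears to be 2-fold down-regulated: the variation of the reference gene is transferred directly to the normalized result.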
Genevestigator, both the commercial and the free version, contains a tool called RefGenes that allows you to identify candidate reference genes that display very stable expression in the context that you are working in, typically a certain tissue of a certain organism.
Genevestigator is a platform that contains curated public microarray data from thousands of experiments/conditions.
RefGenes allows you to select the conditions that are relevant for you, e.g. mouse liver, human fibroblasts, or Arabidopsis thaliana leaves. In the next step, RefGenes identifies the genes with the most stable expression in the selected conditions.
Starting the RefGenes tool
How to start the RefGenes tool? |
---|
|
The Genevestigator user interface
The Genevestigator user interface consists of the following components:
- Sample Selection panel: to choose the experimental conditions you're interested in (green)
- Gene Selection panel: to choose the genes you're interested in (blue)
- Center panel: shows an overview of all available tools (purple). Once you have selected a tool, the panel shows the results of the analysis done by that tool.
- Home button (red): lets you return to the overview of the tools at any time. The text next to the Home button indicates the toolset that you have selected.
Click the RefGenes tool at the bottom.
Using the RefGenes tool to find reference genes
STEP 1: Choose samples from a biological context similar to those in your qPCR experiment
How to choose the samples you want to analyze? |
---|
Note that you can select multiple tissues.
|
When you select samples for use in the RefGenes tool, you have to focus on microarrays from samples that were collected in conditions similar to those in your qPCR experiment.
Don't make the selection too general, e.g. all human samples: you might end up with genes that are stable in most conditions but not in yours.
Don't make the selection too specific either, e.g. human heart samples from patients taking the same medication as yours. If you later broaden your study with samples from other patients, your reference genes might no longer be valid.
It is recommended to select reference genes in the same organism and the same / a similar tissue type as the one that you used in your experiments.
STEP 2: Select the gene(s) you want to measure in your qPCR experiment
This step is not essential, but it helps you to see whether your target genes are strongly or weakly expressed in the conditions of interest selected in STEP 1. This allows you to search for candidate reference genes in a similar range of expression.
How to choose the genes you want to analyze? |
---|
|
STEP 3: Find candidate reference genes
The reference genes that are suggested by Genevestigator have the following characteristics:
- They have the most stable expression levels across all selected samples (a small boxplot)
- Their overall expression level is similar to that of the target gene(s) of your qPCR experiment
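The two criteria above can be sketched in a few lines of Python. This is not the actual RefGenes algorithm, only an illustration of the idea: it assumes log-scale expression values per gene across the selected samples, and the function name, the `window` parameter and the toy values are all hypothetical.

```python
from statistics import mean, stdev

def rank_candidates(expression, target_gene, window=1.0, top_n=20):
    """Rank genes by SD among those whose mean lies within `window` of the target's mean.

    expression: dict mapping gene name -> list of expression values (one per sample).
    """
    target_mean = mean(expression[target_gene])
    candidates = []
    for gene, values in expression.items():
        if gene == target_gene:
            continue
        gene_mean = mean(values)
        # Criterion 2: overall expression level similar to that of the target gene.
        if abs(gene_mean - target_mean) <= window:
            candidates.append((gene, gene_mean, stdev(values)))
    # Criterion 1: lowest SD first, so the most stable genes come out on top.
    candidates.sort(key=lambda item: item[2])
    return candidates[:top_n]

# Toy data (invented values).
expression = {
    "target": [9.0, 9.4, 8.8, 9.2],
    "gene_a": [9.1, 9.0, 9.1, 9.0],      # similar level, very stable
    "gene_b": [8.0, 10.5, 9.0, 9.8],     # similar level, variable
    "gene_c": [14.0, 14.1, 14.0, 14.1],  # stable, but much more highly expressed
}
for gene, gene_mean, gene_sd in rank_candidates(expression, "target"):
    print(f"{gene}: mean={gene_mean:.2f}, SD={gene_sd:.2f}")
```

In this toy example gene_c is excluded despite its stability, because its expression level is far from that of the target gene.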
How to find the candidate reference genes? |
---|
Click the Run button in the RefGenes tool. RefGenes will show the top 20 most stable genes with similar expression levels:
|
Exercises
Finding candidate reference genes in the free version of Genevestigator
Now we will do a more elaborate exercise on finding candidate reference genes. We will do the analysis in the free version of RefGenes, but the analysis in the commercial version is very similar.
Suppose we want to compare the expression stability of the 4 commonly used reference genes for qPCR on mouse liver samples (ACTB, GAPDH, HPRT and TUBB4B) to that of 4 reference genes that are suggested by Genevestigator.
To this end, we open the RefGenes tool and select the liver samples of the mouse 430_2 arrays.
Check the expression stability of the 4 commonly used reference genes |
---|
When you are using the commercial version, you may enter multiple genes at the same time; in the free version you have to enter them one by one. This means that you have to add the first gene as described above, then add the next gene by clicking the Add button, and so on.
Finally, you end up with an expandable list of the genes you asked for, and you can tick or untick them to control the display of their expression data in the main window. When you tick the 4 commonly used reference genes, you can see how stably they are expressed in the 651 mouse liver samples that are stored in Genevestigator:
As you can see, the expression levels of the commonly used reference genes in the selected mouse liver samples are quite variable, which is also confirmed by their relatively high SD values. |
Often there are multiple probe sets for the same gene. When you use the free version you may only choose one probe set per gene, so you have to make a choice. How do you make that choice?
Affymetrix probe set IDs have a certain meaning: what comes after the underscore tells you something about the quality of the probes:
- _at means that all the probes of the probe set hit one known transcript. This is what you want: probes specifically targeting one transcript of one gene
- _a_at means that all the probes in the probe set hit alternate transcripts from the same gene. This is still OK: the probes bind to multiple transcripts, but at least the transcripts come from the same gene (splice variants)
- _x_at means that some of the probes hit transcripts from different genes. This is not what you want: the expression level is based on the combined signal of all the probes in a probe set, including the probes that cross-hybridize
- _s_at means that all the probes in the probe set hit transcripts from different genes. This is definitely not what you want: if the probes bind to multiple genes you have no idea whose expression you have measured on the array
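If you have a long list of probe sets, the suffix rules above are easy to apply programmatically. The Python sketch below is only an illustration: the function names and category labels are our own, and the probe set IDs are placeholders rather than real identifiers from the mouse 430_2 array.

```python
def probe_set_specificity(probe_set_id):
    """Classify an Affymetrix probe set ID by its suffix, following the list above."""
    # Check the longer suffixes first, because every one of them also ends in "_at".
    if probe_set_id.endswith("_s_at"):
        return "multiple genes (all probes)"
    if probe_set_id.endswith("_x_at"):
        return "multiple genes (some probes)"
    if probe_set_id.endswith("_a_at"):
        return "one gene, alternate transcripts"
    if probe_set_id.endswith("_at"):
        return "one transcript"
    return "unknown"

def keep_specific(probe_set_ids):
    """Keep only the probe sets that target a single gene (_at and _a_at)."""
    wanted = {"one transcript", "one gene, alternate transcripts"}
    return [p for p in probe_set_ids if probe_set_specificity(p) in wanted]

# Placeholder IDs, invented for this example.
ids = ["0000001_at", "0000002_a_at", "0000003_x_at", "0000004_s_at"]
print(keep_specific(ids))
```

When several probe sets remain for one gene, a plain _at probe set is the most specific choice according to the rules above.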
As you can see, each of these 4 commonly used reference genes has a high expression level. Most genes do not have such high expression levels. In most qPCR experiments your genes of interest will have low or medium expression levels, so these reference genes will not be representative of the genes of interest.
Reference genes should ideally have expression levels similar to those of the genes of interest. Therefore, we will select the four most stably expressed genes with a medium expression level (between 8 and 12) according to the RefGenes tool.
Select the 4 most stably expressed candidate reference genes with medium expression levels. |
---|
Select the 4 candidates with the lowest SD:
|
Then, we performed qPCR on a representative set of 16 of our liver samples to measure the expression of these 8 candidate reference genes and analyzed the data (see how to select the best reference genes using geNorm in qbase+).
Finding candidate reference genes in the commercial version of Genevestigator
We will do the same exercise as above in the commercial version of Genevestigator. The difference between the free and commercial versions of RefGenes is the number of target genes you can select. In the free version you have to select one gene and then add all other genes one at a time. The commercial version allows you to load as many target genes as you want simultaneously. As a consequence, you can select multiple probe sets for the same gene.
All VIB scientists have free access to the commercial version of Genevestigator via their VIB email address. If you don't know your VIB email address, check the Who's Who of VIB.
- Open a browser and go to the Genevestigator website
- If this is your first time accessing Genevestigator, create an account by clicking the join now button. You will be redirected to a new window in which you provide some personal information, including a valid VIB email address. Click Register and check your email to activate your new account. Go back to the Genevestigator website
- Choose the research field you want to investigate, pharma/biomedical or plant biology, by clicking the corresponding button
- Click Start
- Use your VIB email address and password to log in to Genevestigator.
- This will automatically open a Genevestigator startup page in your browser. Keep this page open during the analysis. Closing this page will close Genevestigator.
- Genevestigator is opened automatically
Open the RefGenes tool by clicking its icon in the Further tools section and select the liver samples of the mouse 430_2 arrays as explained in the previous exercise.
Check the expression stability of the 4 commonly used reference genes |
---|
We still remove probe sets with an _s or _x suffix, since they do not bind specifically to a single gene:
Finally, you end up with an expandable list of the genes you asked for, and you can tick or untick them to control the display of their expression data in the main window. By default all probe sets are ticked, so you can see how stably the commonly used reference genes are expressed in the 651 mouse liver samples that are stored in Genevestigator:
As you can see, the expression levels of the commonly used reference genes in the selected mouse liver samples are quite variable, which is also confirmed by their relatively high SD values. |
The next step of selecting the 4 most stable candidate reference genes with medium expression levels is exactly the same as described above for the free version of RefGenes.
Create a new gene selection with the 20 candidate reference genes that were found and call it mouse_references. |
---|
Click the New button at the top of the main window to create a new selection.
To change the name of the selection, right-click the name in the Gene Selection panel and select Rename.
|
Identify perturbations where the mouse_references genes show more than 1.5-fold differential expression using the Condition perturbations tool. |
---|
Click the Home button at the top to go back to the tools overview page.
Click the Perturbations tool in the Condition Search tools section
Make a New Sample selection including all mouse 430_2 arrays. Untick all genes except for the first one and filter the long heatmap for at least 1.5-fold differential expression:
You now get a list of mouse samples in which the gene is not stably expressed, so you can check whether any of these samples are related to the samples in your study. Hover your mouse over the name of a sample to see more details about it. You can do this for each of the candidate reference genes and select the ones that best fit your needs. |
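The fold-change filter can also be expressed numerically. The Python sketch below assumes that differential expression is available as log2 ratios (treatment versus control) and that the filter applies in both directions, up and down. Both points are assumptions made for this example, as are the function name and the values.

```python
import math

def perturbations_above_threshold(log2_ratios, min_fold_change=1.5):
    """Return the perturbations whose fold change is at least `min_fold_change`, up or down.

    log2_ratios: dict mapping perturbation name -> log2(treatment / control) for one gene.
    """
    # A 1.5-fold change corresponds to about 0.585 on the log2 scale.
    cutoff = math.log2(min_fold_change)
    return {
        name: ratio
        for name, ratio in log2_ratios.items()
        if abs(ratio) >= cutoff
    }

# Invented log2 ratios for one candidate reference gene.
log2_ratios = {
    "perturbation_1": 0.10,   # about 1.07-fold up: stable
    "perturbation_2": -0.80,  # about 1.74-fold down: not stable
    "perturbation_3": 1.00,   # 2-fold up: not stable
}
flagged = perturbations_above_threshold(log2_ratios)
print(sorted(flagged))  # ['perturbation_2', 'perturbation_3']
```

A candidate reference gene that is flagged in a perturbation resembling your own experimental conditions is a poor choice for your study, even if it is stable across the tissue as a whole.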
Exercise on selecting reference genes for metacaspases in Arabidopsis thaliana.