Checking the influence of the stability of the reference genes on the analysis results


In this example we will analyze data from an artificial expression study containing the following samples:

  • 6 treated samples: treated1, treated2, ... treated6
  • 6 control samples: control1, control2, ... control6

In this study, the expression of the following genes was measured:

  • 4 commonly used reference genes: ACTB, HPRT, GAPDH, and TUBB4. We have seen in the previous exercise that the expression of these reference genes in mouse liver samples is not as stable as generally thought.
  • 3 genes of interest:
    • Low: a gene with low expression levels
    • Medium: a gene with moderate expression levels
    • HighVar: a gene with low and very noisy expression

In general, the lower the expression level, the noisier the qPCR results become.

For each of the genes of interest we have included a run in which a 2-fold difference in expression between control and treated samples was created (Low1, Medium1 and HighVar1) and a run with a 4-fold difference in expression (Low2, Medium2 and HighVar2).

There are three technical replicates per reaction.

In a second experiment we used the reference genes that were obtained via Genevestigator and that proved to be more stably expressed in mouse liver samples than the commonly used references.

The data can be found in the NormGenes folder on the BITS laptops or can be downloaded from our website.

Creating a new experiment

Loading the data

We are going to compare expression in treated versus untreated samples so we need to tell qbase+ which samples are treated and which not. To this end, we have constructed a sample properties file in Excel containing the grouping annotation as a custom property called Treatment.

So as you can see we have 6 treated and 6 untreated samples and we have measured the expression of the 4 commonly used reference genes and 6 genes of interest:


Analyzing the data

Look at the target bar charts.

Now do exactly the same for the second experiment, which has the same genes of interest but other reference genes. This means that you have to return to the Analysis wizard. To do so, click the Launch wizard button at the top of the page:


So as you can see we have 6 treated and 6 untreated samples and we have measured the expression of the 4 new reference genes and 6 genes of interest:


As you can see, the M and CV values of these reference genes are much lower than those of the 4 commonly used reference genes, which indicates that these genes are more stably expressed.


It's not that the commonly used reference genes are bad references. If they were, qbase+ would not display them in green. It's just that the other reference genes are more stable. But this can have a big impact on the results of your analysis...
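To give a feeling for what the M value measures, here is a minimal sketch of the geNorm-style stability measure: for each reference gene, the average over all other reference genes of the standard deviation of their log2 expression ratios across samples. The function name `m_values` and the example numbers are made up for illustration, and qbase+ may differ in its exact implementation.

```python
import math
from statistics import mean, stdev

def m_values(rq):
    """Return a geNorm-style M value per gene.

    rq maps a gene name to its relative quantities, one value per sample,
    with the samples in the same order for every gene.
    """
    genes = list(rq)
    result = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Variation of the expression ratio of gene j versus gene k
            log_ratios = [math.log2(a / b) for a, b in zip(rq[j], rq[k])]
            pairwise_sd.append(stdev(log_ratios))
        # M = average pairwise variation with all other reference genes
        result[j] = mean(pairwise_sd)
    return result

# Hypothetical relative quantities, NOT taken from the exercise data
example = {
    "refA": [1.0, 2.0, 4.0, 8.0],
    "refB": [2.0, 4.0, 8.0, 16.0],  # perfectly proportional to refA
    "refC": [1.0, 4.0, 2.0, 16.0],  # fluctuates relative to the others
}
m = m_values(example)
for gene, value in m.items():
    print(gene, round(value, 3))
```

In this toy data set refC gets the highest M value because its ratio to the other two genes changes from sample to sample, which is exactly what a lower M value rules out.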

Plot the average expression level of each group.
Now we will compare the target bar charts of the second and the first experiment to assess the influence of the stability of the reference targets on the analysis results.

Now you can compare the expression of each gene in the first and in the second experiment.


When we do this for HighVar1, for instance, you see that the average expression levels of both groups are the same in the first and the second experiment (check the scales of the Y-axis!). Both experiments detect the two-fold difference in expression level between the groups. However, the error bars are much larger in the first experiment than in the second. The variability of the reference genes has a strong influence on the errors, and the size of the error bars influences the outcome of the statistical test that determines whether a gene is differentially expressed. The larger the error bars, the less likely it is that the test will say that the groups differ.
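Why the reference genes matter for the error bars can be sketched with a small normalization example. It assumes the usual approach of dividing the relative quantity of the target by the geometric mean of the reference genes in the same sample; all numbers and function names below are hypothetical and are not the exercise data.

```python
import math
from statistics import stdev

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize(target_rq, reference_rqs):
    """Divide each target quantity by the geometric mean of the references.

    target_rq: one relative quantity per sample.
    reference_rqs: one list per reference gene, samples in the same order.
    """
    normalized = []
    for i, target in enumerate(target_rq):
        factor = geometric_mean([ref[i] for ref in reference_rqs])
        normalized.append(target / factor)
    return normalized

def log2_sd(values):
    """Spread of the values on the log2 scale."""
    return stdev([math.log2(v) for v in values])

# Hypothetical values for one group of 6 samples
target = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]
stable_refs = [[1.0, 1.05, 0.95, 1.0, 1.02, 0.98],
               [1.0, 0.97, 1.03, 1.01, 0.99, 1.0]]
noisy_refs = [[1.0, 0.6, 1.6, 1.4, 0.7, 1.3],
              [1.0, 0.7, 1.5, 1.3, 0.6, 1.4]]

sd_stable = log2_sd(normalize(target, stable_refs))
sd_noisy = log2_sd(normalize(target, noisy_refs))
print("spread with stable references:", round(sd_stable, 3))
print("spread with noisy references: ", round(sd_noisy, 3))
```

The target measurements are identical in both cases; only the references differ. The noise of the references is carried over into the normalized values, which is what widens the error bars.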

Remember that the error bars represent 95% confidence intervals:

  • if the error bars of the two groups do not overlap: you can be confident that the difference between the means of the two groups is significant
  • if they do overlap: you know nothing with certainty: the means can be different or they can be the same. Of course, the more they overlap, the smaller the chance that there is a significant difference between the groups.
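The overlap check on the error bars can be sketched as follows. This is a minimal illustration that assumes a t-based 95% confidence interval of the mean for groups of 6 samples; how qbase+ computes its intervals (for example on a log scale) is not covered here, and the sample values are invented.

```python
import math
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t for 5 degrees of freedom (n = 6)
T_CRIT_95_DF5 = 2.571

def ci95(values):
    """95% confidence interval of the mean for a group of exactly 6 values."""
    n = len(values)
    if n != 6:
        raise ValueError("the hard-coded critical value is only valid for n = 6")
    centre = mean(values)
    half_width = T_CRIT_95_DF5 * stdev(values) / math.sqrt(n)
    return (centre - half_width, centre + half_width)

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical normalized expression values
control = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
treated_tight = [2.0, 2.2, 1.8, 2.1, 1.9, 2.0]  # mean 2.0, little noise
treated_noisy = [0.5, 3.5, 1.0, 3.0, 2.5, 1.5]  # mean 2.0, a lot of noise

tight_overlaps = intervals_overlap(ci95(control), ci95(treated_tight))
noisy_overlaps = intervals_overlap(ci95(control), ci95(treated_noisy))
print("tight group overlaps control:", tight_overlaps)
print("noisy group overlaps control:", noisy_overlaps)
```

Both treated groups have the same mean, a two-fold difference from the control, but only the tight group gives an interval that is clearly separated from that of the control.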

Check out the results of HighVar2. Here, you clearly see the influence of the reference genes. Again, the fourfold difference in expression is detected by both experiments but:

  • the least stable reference genes (experiment 1) give large overlapping error bars
  • the most stable reference genes (experiment 2) give smaller, barely overlapping error bars


This means that in experiment 2, a statistical test will probably declare that HighVar2 is differentially expressed while in experiment 1 this will not be the case. We will test this assumption by performing a statistical test.

Statistical analysis of differential expression

As you can see, none of the genes is considered differentially expressed (DE) by the very conservative non-parametric test. Additionally, most genes have the same p-value. That's normal when you don't have many replicates. In our case, we have 6 replicates per group. Non-parametric tests are based on a ranking of the data values and there are not so many ways to rank 6 data points. This is why you see the same p-values for many genes.
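The limited number of possible p-values can be made concrete by enumerating every way to distribute the ranks over two groups of 6. The sketch below computes the exact null distribution of the Mann-Whitney U statistic without ties; it is only an illustration, since qbase+ may use an approximation or apply a multiple testing correction on top of it.

```python
from collections import Counter
from itertools import combinations

def u_distribution(n1, n2):
    """Exact null distribution of the Mann-Whitney U statistic (no ties)."""
    ranks = range(1, n1 + n2 + 1)
    counts = Counter()
    for group1 in combinations(ranks, n1):
        # U of group 1 = its rank sum minus the smallest possible rank sum
        u = sum(group1) - n1 * (n1 + 1) // 2
        counts[u] += 1
    return counts

def two_sided_p(u, counts, n1, n2):
    """Two-sided exact p-value for an observed U."""
    total = sum(counts.values())
    u_low = min(u, n1 * n2 - u)
    tail = sum(c for value, c in counts.items() if value <= u_low)
    return min(1.0, 2 * tail / total)

counts = u_distribution(6, 6)
n_rankings = sum(counts.values())
p_values = sorted({two_sided_p(u, counts, 6, 6) for u in counts})

print("possible rank assignments:", n_rankings)
print("number of distinct p-values:", len(p_values))
print("smallest possible p-value:", round(p_values[0], 4))
```

Whatever the measured values are, the test can only return one of this small set of p-values, so several genes easily end up with exactly the same one.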

As said before, the non-parametric test is very stringent. If the data do come from a normal distribution, the test will generate false negatives. Some of the genes might have been labeled not DE while in fact they are DE, so you might have missed some differential expression. The choice of statistical test with 6 biological replicates depends on what you prefer: false negatives or false positives. Most people will choose false negatives since they don't want to invest time and money in research on a gene that was labeled DE while in fact it is not DE.

Suppose I don't mind false positives but I don't want to miss any potential DE genes. In that case, it's better to go for a t-test. Let's repeat the test, now choosing a parametric t-test.

Still none of the genes is considered DE, but you do see that the p-values of the t-test are lower than those of the Mann-Whitney test.

Now you see that 4 out of the 6 genes are considered DE. This is also what we expected since 3 of our genes of interest have a 4-fold difference in expression level between the two groups. It's understandable that it's hard to detect 2-fold differences in expression, especially when the expression of the gene is somewhat variable, as is the case for Low1 and HighVar1, but a 4-fold difference is a difference that you would like to detect.

Again, the t-test generates lower p-values than the Mann-Whitney test, but realize that choosing the t-test when the data are not normally distributed will generate false positives!