Analyzing gene expression data in qbase+

From BITS wiki


Once runs are imported, you can start analyzing the data. Data consist of Cq values for all the wells.

Specifying the aim of the experiment

On the Aim page you tell the software what type of analysis you want to do. Different types of analyses require different parameters, parameter settings and different calculations. By selecting the proper analysis type, qbase+ will only show the relevant parameters and parameter settings.

Since we are doing a gene expression analysis in this exercise, this is the option we should select.
Click the Next button at the bottom of the page to go to the Technical quality control page.

Checking the quality of technical replicates and controls

On the Technical quality control page you set the requirements that the data have to meet to be considered high quality. For instance, the maximum difference between technical replicates is defined on this page. If there are technical replicates in the data set, qbase+ will detect them automatically (they have the same sample and target name) and calculate the average Cq value. In theory, technical replicates should generate more or less identical signals.
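The replicate check described above can be sketched in Python. This is only an illustration, not the qbase+ implementation: the function name, the well data and the 0.5-cycle threshold are assumptions chosen for the example.

```python
from collections import defaultdict

def check_replicates(wells, max_difference=0.5):
    """wells: iterable of (sample, target, cq) tuples.

    Wells with the same sample and target name are treated as technical
    replicates. Returns {(sample, target): (mean_cq, flagged)} where
    flagged is True when the replicates differ by more than max_difference.
    """
    groups = defaultdict(list)
    for sample, target, cq in wells:
        groups[(sample, target)].append(cq)
    return {
        key: (sum(cqs) / len(cqs), max(cqs) - min(cqs) > max_difference)
        for key, cqs in groups.items()
    }

# Made-up Cq values: two replicates per sample for one target.
wells = [
    ("S1", "GOI", 24.1), ("S1", "GOI", 24.3),
    ("S2", "GOI", 26.0), ("S2", "GOI", 27.2),
]
result = check_replicates(wells)
for key, (mean_cq, flagged) in result.items():
    print(key, round(mean_cq, 2), "flagged" if flagged else "ok")
```

In this sketch the replicates are only flagged, never dropped, which mirrors the behaviour described on this page.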

Additionally, you can do quality checks based on the data of the positive and negative controls.

Warning:
Wells that do not meet one of these criteria are flagged but not automatically excluded.
Wells that do not have a signal (typically negative controls) are automatically excluded.

Excluded means that the data are ignored in the calculations.

If you are finished checking the data quality, click Next to go to the Amplification efficiencies page.

Taking into account amplification efficiencies

Qbase+ calculates an amplification efficiency (E) for each primer pair (= gene). Genes have different amplification efficiencies because:

  • some primer pairs anneal better than others
  • the presence of inhibitors in the reaction mix (salts, detergents…) decreases the amplification efficiency
  • inaccurate pipetting

Qbase+ has a parameter that allows you to specify how you want to handle amplification efficiencies on the Amplification efficiencies page.

Amplification efficiencies are calculated based on the Cq values of a serial dilution of representative template, preferably a mixture of cDNAs from all your samples. Since you know the quantity of the template in each dilution, you can plot Cq values against template quantities for each primer pair. Linear regression will fit a standard curve to the data of each gene, and the slope of this curve is used to calculate the amplification efficiency.
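The standard-curve calculation can be sketched as follows. It is a minimal illustration rather than the qbase+ code: the dilution series, the Cq values and the function name are made up, and it assumes the common convention that Cq is regressed on log10 of the template quantity and that E = 10^(-1/slope), so that E = 2 corresponds to a perfect doubling per cycle.

```python
import math

def amplification_efficiency(quantities, cqs):
    """Fit Cq = intercept + slope * log10(quantity) by least squares.

    Returns (slope, E) with E = 10 ** (-1 / slope).
    """
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, 10 ** (-1 / slope)

# Hypothetical 10-fold dilution series (relative quantities) and Cq values.
quantities = [1000, 100, 10, 1]
cqs = [20.0, 23.4, 26.8, 30.2]

slope, efficiency = amplification_efficiency(quantities, cqs)
print(f"slope = {slope:.2f}, E = {efficiency:.3f}")
```

A slope close to -3.32 gives an efficiency close to 2; a steeper slope (more cycles per 10-fold dilution) gives a lower efficiency.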

In this way, one amplification efficiency (E) for each gene is calculated and used to calculate Relative Quantities (RQ):

RQ = E^∆Cq

∆Cq is calculated for each well by subtracting the Cq of that well from the average Cq across all samples for the gene that is measured in the well. So ∆Cq is the difference between the Cq value of a gene in a given sample and the average Cq value of that gene across all samples.
Cq is subtracted from the average because in this way high expression will result in a positive ∆Cq and low expression in a negative ∆Cq.
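As a worked sketch of this step, the snippet below uses the relation RQ = E^∆Cq with ∆Cq defined as described above. The relation is the standard form for this kind of model and is assumed here; the Cq values and the function name are made up, and each Cq is taken to be the average of the technical replicates.

```python
def relative_quantities(cq_by_sample, efficiency):
    """Return {sample: RQ} with RQ = E ** (mean Cq of the gene - Cq of the sample)."""
    mean_cq = sum(cq_by_sample.values()) / len(cq_by_sample)
    return {s: efficiency ** (mean_cq - cq) for s, cq in cq_by_sample.items()}

# Hypothetical Cq values of one gene in three samples.
cqs = {"S1": 24.0, "S2": 25.0, "S3": 26.0}
rq = relative_quantities(cqs, efficiency=2.0)
for sample, value in rq.items():
    print(sample, round(value, 3))
```

The sample with the lowest Cq (S1) gets the highest RQ: one cycle below the average corresponds to a two-fold higher quantity when E = 2.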

So at this point the data set contains one RQ value for each gene in each sample.

Click Next to go to the Normalization page.


Differences in amplification efficiency are not the only source of variability in a qPCR experiment.
Several other factors are responsible for noise in qPCR experiments, e.g. differences in:

  • amount of template cDNA between wells
  • RNA integrity of samples
  • efficiency of enzymes used in the PCR or in the reverse transcription

Noise: variability between samples that has no biological relevance

Normalization will eliminate this noise as much as possible. In this way it is possible to make a distinction between genes that are really upregulated and genes with high expression levels in one group of samples simply because higher cDNA concentrations were used in these samples.

In qPCR analysis, normalization is done based on housekeeping genes.

Housekeeping genes: genes with constant expression levels in all cell types, tissues and conditions that are studied in the experiment

Housekeeping genes are measured in all samples along with the genes of interest. In theory, a housekeeping gene should have identical RQ values in all samples. In reality, noise generates variation in the expression levels of the housekeeping genes. This variation is a direct measure of the noise and is used to calculate a normalization factor for each sample.

Normalization Factor (NF): factor applied to the RQ values so that the measured expression levels of the housekeeping genes are equalized across all samples. There is one NF for each sample.

These normalization factors are used to adjust the RQ values of the genes of interest accordingly so that the variability is eliminated.

These adjusted RQ values are called Normalized Relative Quantities (NRQs).
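The normalization step can be illustrated with a small sketch. It assumes the geNorm-style approach in which the NF of a sample is the geometric mean of the RQs of the housekeeping genes in that sample and NRQ = RQ / NF; this page does not spell out the formula, so treat that as an assumption. Gene names, sample names and values are made up.

```python
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize(rq, reference_genes):
    """rq: {gene: {sample: RQ}}.

    Returns (nf, nrq): nf is {sample: NF}, nrq is {gene: {sample: NRQ}}.
    Assumption: NF = geometric mean of the reference-gene RQs, NRQ = RQ / NF.
    """
    samples = list(next(iter(rq.values())))
    nf = {s: geometric_mean([rq[g][s] for g in reference_genes]) for s in samples}
    nrq = {g: {s: rq[g][s] / nf[s] for s in samples} for g in rq}
    return nf, nrq

# Hypothetical RQs: sample S1 contains twice as much cDNA as S2.
rq = {
    "RefA": {"S1": 2.0, "S2": 1.0},
    "RefB": {"S1": 2.0, "S2": 1.0},
    "GOI": {"S1": 4.0, "S2": 1.0},
}
nf, nrq = normalize(rq, ["RefA", "RefB"])
print(nf)
print(nrq["GOI"])
```

Before normalization the gene of interest appears four-fold higher in S1; after correcting for the two-fold difference in input, the remaining difference is two-fold.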

In qbase+ housekeeping genes are called reference genes. In our data set there are three reference genes: Stable, Non-regulated and Flexible. On the Normalization page we can define the normalization strategy we are going to use, appoint the reference genes and check their stability of expression.

Appointing genes as reference genes does not guarantee that they are good reference genes. They should have stable expression values over all samples in your study. Fortunately, qbase+ checks the quality of the reference genes.

For each appointed reference gene, qbase+ calculates two indicators of expression stability:

  • M (geNorm expression stability value): calculated based on the pairwise variations of the reference genes.
  • CV (coefficient of variation): the ratio of the standard deviation of the NRQs of a reference gene over all samples to the mean NRQ of that reference gene.
The higher these indicators, the less stable the reference gene.
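Both indicators can be sketched in a few lines. The M calculation follows the published geNorm definition (for each gene, the mean over all other reference genes of the standard deviation of the log2 expression ratios); whether qbase+ computes it in exactly this way is an assumption. The gene names are those of the exercise, but the values are made up.

```python
import math
import statistics

def genorm_m(rq, genes):
    """rq: {gene: [RQ per sample]} -> {gene: M}."""
    m = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(rq[j], rq[k])]
            pairwise_sd.append(statistics.stdev(log_ratios))
        m[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m

def coefficient_of_variation(nrq):
    """Standard deviation of the NRQs divided by their mean."""
    return statistics.stdev(nrq) / statistics.mean(nrq)

# Hypothetical RQs of three reference genes in four samples.
rq = {
    "Stable": [1.0, 2.0, 1.0, 2.0],
    "Non-regulated": [1.0, 2.0, 1.0, 2.0],
    "Flexible": [1.0, 8.0, 0.5, 4.0],
}
m_values = genorm_m(rq, list(rq))
cv = coefficient_of_variation([0.9, 1.1, 1.0, 1.0])
print(m_values)
print(round(cv, 3))
```

In this made-up data set the gene whose expression does not follow the other two ends up with the highest M value.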

Note that for some experiments (heterogeneous samples, samples from fly or plant) the limits for CV and M-values may be increased to 0.5 and 1 respectively.

If the quality of the reference genes is not good enough, it is advised to remove the reference gene with the worst M and CV values and re-evaluate the remaining reference genes.

You can remove a reference gene simply by unticking the box in front of its name.

This exercise shows the importance of using a minimum of three reference genes. If one of the reference genes does not produce stable expression values, as is the case for Flexible, you still have two remaining reference genes to do the normalization.

See how to select reference genes for your qPCR experiment.

So after normalization you have one NRQ value for each gene in each sample.

Click Next to go to the Scaling page.


Rescaling means that you calculate NRQ values relative to a specified reference level.

Warning: Scaling only changes the scale, so the expression levels will be different but not the fold changes between the samples.

Qbase+ allows you to rescale the NRQ values using one of the following as a reference:

  • the sample with the minimal expression
  • the average expression level of a gene across all samples
  • the sample with the maximal expression
  • a specific sample (e.g. untreated control)
  • the average of a certain group (e.g. all control samples): this is often how people want to visualize their results
  • positive control: only to be used for copy number analysis

After scaling, the expression level of the reference you choose here is set to 1. For example, when you choose the average, the average expression level across all samples is set to 1 and the expression levels of the individual samples are scaled accordingly.

Rescaling to the average of a group is typically used to compare results between 2 groups, e.g. treated samples against untreated controls. After rescaling, the average of the NRQs across all untreated samples is 1 and the NRQs of the treated samples are scaled accordingly.
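Rescaling to a group average can be sketched as below. The function name, sample names and NRQ values are hypothetical, and the use of the geometric mean as the group average is an assumption (the page does not say which average qbase+ uses).

```python
import math

def rescale_to_group(nrq, group_samples):
    """Divide all NRQs of one gene by the geometric mean NRQ of the chosen group."""
    reference = math.exp(
        sum(math.log(nrq[s]) for s in group_samples) / len(group_samples)
    )
    return {s: v / reference for s, v in nrq.items()}

# Hypothetical NRQs of one gene: U = untreated, T = treated.
nrq = {"U1": 2.0, "U2": 8.0, "T1": 16.0, "T2": 32.0}
scaled = rescale_to_group(nrq, ["U1", "U2"])
print(scaled)

# Fold changes between samples are unchanged by rescaling.
print(nrq["T1"] / nrq["U1"], scaled["T1"] / scaled["U1"])
```

Every value of the gene is divided by the same number, which is why the scale changes while the ratios between samples stay the same.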

Click Next to go to the Analysis page.

Visualization of the results

On the Analysis page you can, among other things, view the relative expression levels (= scaled NRQs) of each gene in a separate bar chart. This is the recommended way to visualize your results.

It is possible to view the relative expression levels of all genes of interest on the same bar chart. You can use this view to see if these genes show the same expression pattern but you cannot directly compare the heights of the different genes because each gene is independently rescaled!

You can group and colour the bars according to a property.

A useful property of 95% confidence intervals is the following:

  • if they do not overlap, you can conclude that the expression levels in the two groups are significantly different, in other words the gene is differentially expressed
  • if they do overlap, you cannot conclude that the expression levels are the same. You simply don’t know whether the gene is differentially expressed or not.

Warning: Setting the Y-axis in logarithmic scale does not mean that you log transform the NRQs!

Switching the Y-axis to a logarithmic scale can be helpful if you have large differences in NRQs between different samples.

Warning: Never directly compare the heights of the bars of different genes because each gene is independently rescaled!

Statistical analysis

Once you generate target bar charts you leave the Analysis wizard and you go to the regular qbase+ interface. Suppose that you want to perform a statistical test to check whether the difference in expression that you see in the target chart is significant.

At some point, qbase+ will ask you whether your data come from a normal distribution. If you don't know, you can select I don't know and qbase+ will assume the data do not come from a normal distribution and perform a stringent non-parametric test.

However, when you have 7 or more replicates per group, you can check whether the data are normally distributed using a statistical test. If they are, qbase+ will perform a regular t-test. The advantage is that the t-test is less stringent than the non-parametric tests and will find more differentially expressed (DE) genes. However, you may only perform it on normally distributed data. If you perform the t-test on data that are not normally distributed, you risk generating false positives, i.e. qbase+ will report genes as DE while in fact they are not. Performing a non-parametric test on normally distributed data risks generating false negatives, i.e. you will miss DE genes.
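To give an idea of what a non-parametric comparison of two groups involves, here is a minimal sketch of a two-sided Mann-Whitney U test using only the Python standard library. The choice of this particular test is an assumption made for illustration; this page does not name the test that qbase+ runs. The p-value uses the normal approximation without tie or continuity correction, and the NRQ values are made up.

```python
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    # Count, over all pairs, how often a value of x exceeds a value of y;
    # ties count for one half.
    u1 = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u  # z <= 0 because u is the smaller statistic
    p_value = min(1.0, 2 * NormalDist().cdf(z))
    return u, p_value

# Hypothetical scaled NRQs of one gene, 8 biological replicates per group.
untreated = [0.8, 0.9, 1.0, 1.1, 1.2, 0.95, 1.05, 1.0]
treated = [2.1, 1.9, 2.5, 2.2, 1.8, 2.4, 2.0, 2.3]

u, p = mann_whitney_u(untreated, treated)
print(u, p)
```

The test only uses the ordering of the values, not their distance from each other, which is why it makes no assumption about normality and is more conservative than a t-test on normally distributed data.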

Checking if the data is normally distributed can be easily done in GraphPad Prism. To this end you have to export the data.

Exporting will generate an Excel file in the location that you specified. However, the file contains the results for all samples and we need to check the two groups (treated and untreated) separately. The sample properties show that the even samples belong to the treated group and the odd samples to the untreated group.


This means we have to generate two files, one for each group:
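If you prefer to script the split instead of editing the export by hand, it could look like the sketch below. Everything specific in it is hypothetical: it assumes the exported table has been saved as CSV, that there is a column named Sample, and that the sample names end in their number.

```python
import csv
import io
import re

def split_by_parity(rows, sample_column="Sample"):
    """Even-numbered samples -> treated, odd-numbered samples -> untreated."""
    treated, untreated = [], []
    for row in rows:
        # Take the trailing number of the sample name, e.g. "Sample04" -> 4.
        number = int(re.search(r"(\d+)$", row[sample_column]).group(1))
        (treated if number % 2 == 0 else untreated).append(row)
    return treated, untreated

# Stand-in for the exported table; names and values are made up.
exported = io.StringIO(
    "Sample,GOI\n"
    "Sample01,0.8\n"
    "Sample02,2.1\n"
    "Sample03,1.1\n"
    "Sample04,1.9\n"
)
treated, untreated = split_by_parity(csv.DictReader(exported))
print([row["Sample"] for row in treated])
print([row["Sample"] for row in untreated])
```

Each of the two lists can then be written to its own file and opened in Prism.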

Now we can open these files in Prism to check if the data is normally distributed.

Since we found that there's one group of data that does not follow a normal distribution, it's no longer necessary to check if the treated data are normally distributed but you can do it if you want to.

We will now proceed with the statistical analysis in qbase+.

Statistical analyses can be performed via the Statistics wizard.

This opens the Statistics wizard that allows you to perform various kinds of statistical analyses.

On the Settings page you have to describe the characteristics of your data set, allowing qbase+ to choose the appropriate test for your data.

The first thing you need to tell qbase+ is whether the data was drawn from a normal or a non-normal distribution. Since we have 8 biological replicates per group we can do a test in Prism to check if the data are normally distributed.
