Analyzing differences in copy number of DNA regions using qbase+
In copy number analysis you test for DNA copy number variation in patient samples, e.g. tumor samples. Until recently, it was thought that genes were almost always present in two copies in a genome. However, recent studies have revealed that large segments of DNA, ranging in size from thousands to millions of DNA bases, can vary in copy number. These studies revealed that copy number variations comprise at least three times the total nucleotide content of SNPs. Such copy number variations can lead to dosage imbalances of gene products, which in turn can lead to diseases or differences in drug response.
So in contrast to gene expression analysis where you start from RNA samples, for copy number analysis you start from genomic DNA samples.
The experiment consists of a single run: Run11
The following samples were used:
- 2 samples of interest
- 1 positive control: a calibrator sample called SNJB-6, containing 1 allele of the targets
- 1 positive control: a calibrator sample called gDNA, containing 2 alleles of the targets
- 1 no template control: NTC to detect the presence of contaminating DNA
The copy number of the following DNA regions was measured:
- 2 reference genes: ZNF80 and GPR15
- 3 regions of interest: 3 exons of VHL
Creating a new experiment
Create a new Experiment called CopyNumber in Project1 |
---|
You can find the details on how to create a new experiment in Creating a project and an experiment |
Loading the data
Import Run11. The file is in qBase format. |
---|
You can find the details on how to import the data file in the Loading the data into qbase+ section of Analyzing data from a geNorm pilot experiment in qbase+ |
Analyzing the data
Choose the type of analysis you want to perform. |
---|
|
Check controls and replicates. |
---|
First set the minimum requirements for controls and replicates
You see that 4 replicates do not meet these requirements (red). Select Show details and manually exclude the bad replicates.
|
Which amplification efficiency strategy are you going to use? |
---|
You don't have data from serial dilutions of a representative template to build standard curves, so the only choice you have is to use the default amplification efficiency (E = 2) for all the genes.
|
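To illustrate what an efficiency of E = 2 implies, here is a minimal sketch of how a Cq difference translates into a relative quantity. The function name and the Cq values are hypothetical, and this is the generic delta-Cq formula, which is assumed rather than confirmed to match what qbase+ computes internally.

```python
def relative_quantity(cq_sample, cq_reference_point, efficiency=2.0):
    """Relative quantity from a Cq difference: RQ = E ** (Cq_ref - Cq_sample).

    cq_reference_point is whatever Cq the quantities are expressed against
    (hypothetical here); efficiency=2.0 means the product doubles every cycle.
    """
    return efficiency ** (cq_reference_point - cq_sample)


# Hypothetical Cq values: with E = 2, a Cq that is one cycle later
# corresponds to half the amount of template.
half = relative_quantity(cq_sample=26.0, cq_reference_point=25.0)
double = relative_quantity(cq_sample=24.0, cq_reference_point=25.0)
print(half, double)
```

With a measured efficiency from a standard curve you would pass a different `efficiency` value per target instead of the default.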
For copy number analysis you can also use the multiple reference targets normalization strategy. Just like in gene expression analysis, there is technical variability between the different samples, e.g.
- differences in the amount of gDNA used as a template
- pipetting variation
- ...
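The idea behind normalizing against multiple reference targets can be sketched as follows: the relative quantity of a target is divided by the geometric mean of the relative quantities of the reference targets, so that a difference in input amount cancels out. The helper names and all quantities below are hypothetical, and the sketch is an assumption about the general approach, not a description of the exact qbase+ implementation.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean of a non-empty list of positive numbers."""
    return prod(values) ** (1.0 / len(values))


def normalized_relative_quantity(target_rq, reference_rqs):
    """Divide the target quantity by the geometric mean of the references."""
    return target_rq / geometric_mean(reference_rqs)


# Hypothetical relative quantities for one sample.
sample_a = {"ZNF80": 1.0, "GPR15": 1.0, "VHL": 0.5}
# The same sample with twice as much gDNA loaded: every quantity doubles.
sample_b = {name: 2 * rq for name, rq in sample_a.items()}

nrq_a = normalized_relative_quantity(
    sample_a["VHL"], [sample_a["ZNF80"], sample_a["GPR15"]]
)
nrq_b = normalized_relative_quantity(
    sample_b["VHL"], [sample_b["ZNF80"], sample_b["GPR15"]]
)
print(nrq_a, nrq_b)  # both samples give the same normalized value
```

Using more than one reference target makes the normalization factor less sensitive to noise in any single reference measurement.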
Appoint the reference genes as reference targets. |
---|
The stability measures for the two reference genes are shown in green meaning that they are indeed stable. |
Now close the analysis wizard by clicking the Close wizard button (red) in the top menu.
Which scaling strategy are you going to use? |
---|
Since you are doing a copy number analysis you have to choose Scale to positive control. This scaling option is specific to copy number analysis. Before you can scale to a positive control, you have to appoint the positive controls. Appoint SNJB-6 as a positive control and set its quantity to 1.
Do the same for gDNA; for this positive control sample the quantity is 2. Now you can choose to scale to positive control:
|
For copy number analysis you need at least two positive controls with different copy numbers, as is the case in our data. These positive control samples are used as reference points to determine the true copy number in the samples of interest. The NRQs of the positive controls are set to 1 and 2 respectively and the NRQs of the samples of interest are scaled accordingly.
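One way to picture scaling to positive controls is sketched below. Each control yields a factor (known copy number divided by its NRQ), the factors are combined with a geometric mean, and every sample NRQ is multiplied by the result. How qbase+ actually combines several positive controls is not stated here, so the combination rule, the function name and all NRQ values are assumptions for illustration only.

```python
from math import prod


def scaling_factor(control_nrqs, control_copies):
    """Combine per-control factors (known copies / NRQ) by geometric mean.

    Assumption: a single shared factor is used for all samples of one target.
    """
    factors = [control_copies[name] / control_nrqs[name] for name in control_nrqs]
    return prod(factors) ** (1.0 / len(factors))


# Copy numbers of the positive controls as given in this exercise.
control_copies = {"SNJB-6": 1, "gDNA": 2}
# Hypothetical NRQs of one target in the controls and in the samples.
control_nrqs = {"SNJB-6": 0.5, "gDNA": 1.0}
sample_nrqs = {"sample1": 0.52, "sample2": 0.97}

factor = scaling_factor(control_nrqs, control_copies)
scaled = {name: nrq * factor for name, nrq in sample_nrqs.items()}
print(factor, scaled)
```

In this made-up example the controls agree perfectly, so they end up at exactly 1 and 2 after scaling; with real data they would only do so approximately under this rule.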
Thresholds are defined to determine how much the scaled NRQs of the samples of interest can deviate from those of the positive controls.
Look at the default settings of these thresholds. |
---|
You can view these settings in the Quality control settings:
You can find the thresholds in the Copy number analysis section (red). So every rescaled NRQ that falls in this range is considered normal, i.e. the region is present in two copies. |
These thresholds are used to determine if regions are duplicated, deleted or occur in the normal number of copies (2) in the samples of interest. Qbase+ shows these calls in a bar chart in which the thresholds are used for coloring the bars (red for duplications, blue for deletions and grey for normal copy number). The default settings for these thresholds are chosen as follows:
- 1.414: it's the geometric mean of 1 and 2 copies
- 2.449: it's the geometric mean of 2 and 3 copies
These default settings are recommended for diploid organisms (human, mouse, rat...).
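The default thresholds and the resulting calls can be reproduced with a short sketch. The function name and the handling of values that fall exactly on a threshold are assumptions; only the threshold values themselves come from the text above.

```python
from math import sqrt

# Geometric means of neighbouring copy numbers.
LOWER_THRESHOLD = sqrt(1 * 2)  # about 1.414, between 1 and 2 copies
UPPER_THRESHOLD = sqrt(2 * 3)  # about 2.449, between 2 and 3 copies


def call_copy_number(scaled_nrq, lower=LOWER_THRESHOLD, upper=UPPER_THRESHOLD):
    """Classify a rescaled NRQ as deletion, normal or duplication.

    Assumption: values exactly on a threshold are called normal.
    """
    if scaled_nrq < lower:
        return "deletion"
    if scaled_nrq > upper:
        return "duplication"
    return "normal"


# Hypothetical rescaled NRQs.
for value in (1.04, 1.94, 3.1):
    print(value, call_copy_number(value))
```

The geometric mean is the midpoint between two copy numbers on a logarithmic scale, which matches the multiplicative nature of qPCR quantities.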
Analyze the copy number of the three exons of VHL. |
---|
You find the colour codes in the legend of the figure (green). As you can see, sample1 contains only one copy of exon2. Sample2 contains only one copy of exon1 and of exon2. |