NGS RNASeq DE Exercise.8
Wrap-Up comparison of RNASeq results obtained in the different exercises
Compare significant DE-calls from three commonly used packages
One obvious comparison is to take the significant hits from one caller (corrected p-value < 0.1) and plot their logFC values against those obtained with a second caller. If the two callers agree, the dots fall on the diagonal. We show here pairwise comparisons between calls from DESeq2 (the most common and stringent caller in use today) and results obtained in NGS_RNASeq_DE_Exercise.7 using the user-friendly RobiNA software (DESeq1 and EdgeR).
- Black dots are DE-results filtered at padj|FDR<0.1 with the most stringent caller
- Red dots are the top 100 genes after sorting by increasing padj|FDR
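The selection of the plotted points can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used to produce the figures: the function name `paired_logfc`, the dictionary layout (gene -> (logFC, adjusted p-value)) and the toy values are all assumptions made for the example. It also assumes that the red subset is taken from the filtered hits, which matches a global top 100 whenever at least 100 genes pass the filter.

```python
def paired_logfc(res_a, res_b, alpha=0.1, top_n=100):
    """Pair the logFC values of two callers for the hits of caller A.

    res_a, res_b: dict gene -> (logFC, adjusted p-value), one per caller
    (hypothetical layout). Caller A acts as the significance filter.
    Returns (black, red), two lists of (gene, logFC_a, logFC_b).
    """
    # Only genes reported by both callers can be placed on the plot.
    shared = [g for g in res_a if g in res_b]
    # Keep the hits of caller A below alpha, sorted by increasing padj.
    hits = sorted((g for g in shared if res_a[g][1] < alpha),
                  key=lambda g: res_a[g][1])
    black = [(g, res_a[g][0], res_b[g][0]) for g in hits]
    red = black[:top_n]  # best-ranked hits, drawn on top of the black dots
    return black, red


# Toy values, invented for illustration only.
toy_deseq2 = {"g1": (2.0, 0.001), "g2": (-1.5, 0.05),
              "g3": (0.3, 0.6), "g4": (1.0, 0.02)}
toy_edger = {"g1": (2.2, 0.0005), "g2": (-1.2, 0.08), "g3": (0.4, 0.5)}

black, red = paired_logfc(toy_deseq2, toy_edger, alpha=0.1, top_n=1)
```

Each tuple gives the x and y coordinates of one dot; points far from the diagonal are genes for which the two callers disagree on the fold change. Note that `g4` is dropped because the second caller does not report it.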
Compare results obtained with DESeq2 and EdgeR workflows
Compare results obtained with DESeq2 from 'all' and 'gtf' counts
We wish here to check whether some genes appear differentially expressed when comparing HTSeq counts obtained from the untreated samples with the two Tophat workflows. The hypothesis is that mapping against the full genome or against the exomic part only (GTF file) may introduce a significant bias in the gene counts.
- Use the 'all' and 'gtf' tophat2 + htseq-count results for all 'untreated' samples (2x4)
- Melt the count tables into one dataframe
- Compute group log2-means, run a T-test between the groups, and correct for multiple testing
- Identify gene counts with a T-test result below 0.1 and below 0.05, and plot their log2-mean in each group
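The steps above can be sketched with the Python standard library alone. Several choices here are assumptions, since the text does not specify them: the T-test is taken to be Welch's unequal-variance form applied to log2(count + 1), the multiple-testing correction is Benjamini-Hochberg, and plain dictionaries stand in for the melted dataframe. The function names and the toy counts are hypothetical.

```python
import math


def welch_p(a, b, steps=2000):
    """Two-sided p-value of Welch's t-test for two lists of numbers."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Squared standard error of each group mean.
    va = sum((x - ma) ** 2 for x in a) / (na - 1) / na
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1) / nb
    if va + vb == 0:  # no variance in either group
        return 1.0 if ma == mb else 0.0
    t = abs(ma - mb) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    # Tail area of Student's t: with x = sqrt(df) * tan(theta) it becomes
    # k * integral of cos(theta) ** (df - 1) from atan(t / sqrt(df)) to pi / 2.
    k = math.gamma((df + 1) / 2) / (math.sqrt(math.pi) * math.gamma(df / 2))
    lo = math.atan(t / math.sqrt(df))
    h = (math.pi / 2 - lo) / steps
    f = lambda th: max(math.cos(th), 0.0) ** (df - 1)
    area = f(lo) + f(math.pi / 2)
    for i in range(1, steps):  # composite Simpson rule
        area += (4 if i % 2 else 2) * f(lo + i * h)
    return min(1.0, 2 * k * area * h / 3)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted, running = [0.0] * n, 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        adjusted[i] = running
    return adjusted


def compare_workflows(counts_all, counts_gtf, pseudo=1.0):
    """counts_*: dict gene -> raw counts of the untreated samples.

    Returns dict gene -> (log2-mean 'all', log2-mean 'gtf', p, adjusted p).
    """
    genes = sorted(set(counts_all) & set(counts_gtf))
    logs = {g: ([math.log2(c + pseudo) for c in counts_all[g]],
                [math.log2(c + pseudo) for c in counts_gtf[g]])
            for g in genes}
    pvals = [welch_p(*logs[g]) for g in genes]
    padj = bh_adjust(pvals)
    return {g: (sum(logs[g][0]) / len(logs[g][0]),
                sum(logs[g][1]) / len(logs[g][1]), p, q)
            for g, p, q in zip(genes, pvals, padj)}


# Toy counts for 4 + 4 untreated samples, invented for illustration only.
toy_all = {"geneA": [100, 110, 90, 105], "geneB": [200, 210, 190, 205]}
toy_gtf = {"geneA": [100, 110, 90, 105], "geneB": [50, 55, 45, 52]}
result = compare_workflows(toy_all, toy_gtf)
```

In this toy input `geneA` has identical counts in both workflows and `geneB` does not, so only `geneB` ends up with a small adjusted p-value. With only four samples per group the test has little power, which is one reason to read the flagged genes as candidates rather than confirmed differences.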
Investigating the read coverage across these genes would probably reveal mapping biases influenced by the non-exomic part of the human reference in the case of the 'all' mapping. The count of 'not.unique' mappings is higher in the 'all' HTSeq data while more 'ambiguous' mappings are reported in the 'gtf' HTSeq data.
The scatter-plot of gene count means reveals biases between the two workflows and a random distribution of pseudo-significant counts in the whole expression range.
- green dots represent mean values of gene counts not significantly different between the tophat 'all' and 'gtf' sample groups
- red dots have T-test results below 0.1
- black dots have T-test results below 0.05
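The colour legend amounts to a two-threshold classification, sketched below. The function name `dot_colour` is hypothetical, and the sketch assumes that black takes precedence over red, so that red marks values between 0.05 and 0.1.

```python
def dot_colour(p_value, strict=0.05, loose=0.1):
    """Map a T-test result to the dot colour used in the scatter-plot."""
    if p_value < strict:
        return "black"  # below 0.05
    if p_value < loose:
        return "red"    # between 0.05 and 0.1
    return "green"      # not significantly different


colours = [dot_colour(p) for p in (0.01, 0.07, 0.5)]
```

The resulting list can be passed as the per-point colour argument of most plotting libraries.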
We see in this final graph that a number of genes appear differentially expressed when comparing HTSeq counts obtained from the two tophat2 workflows applied to the same original reads.
- Most significant calls from one caller remain significant with a second caller, which is quite reassuring and fortunate.
- DESeq2 leads to different results as compared to DESeq1 or EdgeR, especially for extreme calls
- When EdgeR is used as the significance filter, a number of black dots appear that are not confirmed by DESeq (EdgeR is known to be less stringent and to call false-positive DE genes).
- The choice of the Tophat reference (full genome or exome) affects the counts for a number of genes and may bias the differential expression analysis.