NGS RNASeq DE Exercise.7
Use the RobiNA GUI to find DE genes using EdgeR and DESeq1 packages
[ Main_Page | Hands-on_introduction_to_NGS_RNASeq_DE_analysis | NGS_RNASeq_DE_Exercise.6 | NGS_RNASeq_DE_Exercise.8 ]
Contents
Robina run through example
RobiNA is a standalone GUI with embedded R code that allows differential analysis of microarray or NGS data. It was developed by a plant group but also applies to non-plant data although without the part adding gene symbols to the DE results. The final annotation can easily be taken care of using the attached R-code.
We use for this example a slightly modified version of the count table obtained from the bioconductor airway package. The table only includes the untreated and dex-treated cell line samples (N=3) and does not look at the effect of the second drug or at the drug combination. This table was generated by state of the art tophat mapping (tuxedo workflow) and should be very similar, if not identical, to the corresponding table produced with this session code. In order for RobiNA to identify the two groups 'correctly', the original header line was modified to comply with the software. The first column was named ID, the untreated cell line samples were renamed untr_1 to _3, and the Dex samples where renamed dex_1 to _3.
The paired status of the samples is not taken into account by RobiNA which treats them as technical replicates. This is not correct but is all what you get using this simplified interface
ENSG00000000003 679 448 873 408 1138 1047 770 572
ENSG00000000005 0 0 0 0 0 0 0 0
ENSG00000000419 467 515 621 365 587 799 417 508
ENSG00000000457 260 211 263 164 245 331 233 229
ENSG00000000460 60 55 40 35 78 63 76 60
ENSG00000000938 0 0 2 0 1 0 0 0
ENSG00000000971 3251 3679 6177 4252 6721 11027 5176 7995
ENSG00000001036 1433 1062 1733 881 1424 1439 1359 1109
ENSG00000001084 519 380 595 493 820 714 696 704
RobiNA common setup steps
RobiNA DESeq1 steps
RobiNA EdgeR steps
Add gene symbol information to the tables and compare results
The following code was used for other similar tables and can be recycled with minor changes to add a column of gene symbols to the RobiNA tables; It makes use of R packages biomaRt that query the Biomart database and can add many more annotation types than what is shown here. Please read the biomaRt package manual for more information.
RobiNA-add_symbol.R script
#!/usr/bin/RScript # RobiNA-add_symbol.R # load specific library library("org.Hs.eg.db") # preview available types columns(org.Hs.eg.db) # [1] "ENTREZID" "PFAM" "IPI" "PROSITE" "ACCNUM" "ALIAS" "CHR" "CHRLOC" "CHRLOCEND" "ENZYME" # [11] "MAP" "PATH" "PMID" "REFSEQ" "SYMBOL" "UNIGENE" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "GENENAME" # [21] "UNIPROT" "GO" "EVIDENCE" "ONTOLOGY" "GOALL" "EVIDENCEALL" "ONTOLOGYALL" "OMIM" "UCSCKG" # define function to add names to the existing table convertIDs <- function( ids, fromKey, toKey, db, ifMultiple=c( "putNA", "useFirst" ) ) { stopifnot( inherits( db, "AnnotationDb" ) ) ifMultiple <- match.arg( ifMultiple ) suppressWarnings( selRes <- AnnotationDbi::select( db, keys=ids, keytype=fromKey, cols=c(fromKey,toKey) ) ) if( ifMultiple == "putNA" ) { duplicatedIds <- selRes[ duplicated( selRes[,1] ), 1 ] selRes <- selRes[ ! selRes[,1] %in% duplicatedIds, ] } return( selRes[ match( ids, selRes[,1] ), 2 ] ) } # read RobiNA data in. edit to match your own path basedir <- "/work/TUTORIALS/NGS_RNASeqDE-training2015/RobiNA/" setwd(basedir) #### EdgeR results # EdgeR: Identifier logFC logCPM PValue FDR datafile1 <- "RobiNA_EdgeR/detailed_results/full_table_dex-untr.txt" # let res1 be a result table returned by RobiNA and containing a first column named ID with ENSG-IDs res1 <- read.table(datafile1, header=T, row.names=1) # fetch matching gene symbols res1$symbol <- convertIDs( row.names(res1), "ENSEMBL", "SYMBOL", org.Hs.eg.db ) # reorder columns and bring symbols in column#1 # reorder rows by increasing p.value res1 <- res1[,c(5,1:4)] res1 <- res1[with(res1, order(PValue)), ] # preview final table head(res1) # save table to disk filename1 <- "EdgeR_results-all.csv" write.csv( as.data.frame(res1), file=filename1 ) # count genes for FDR<0.05 and abs(logFC>=1) table(res1$FDR<0.05 , abs(res1$logFC)>=1) # FALSE TRUE # FALSE 53679 9065 # TRUE 210 1148 ### DESeq1 results # Deseq1: id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj datafile2 <- "RobiNA_Deseq1/detailed_results/full_table_dex-untr.txt" # let res1 be a result table returned by RobiNA and containing a first column named ID with ENSG-IDs res2 <- read.table(datafile2, header=T, row.names=1) # fetch matching gene symbols res2$symbol <- convertIDs( row.names(res2), "ENSEMBL", "SYMBOL", org.Hs.eg.db ) # reorder columns and bring symbols in column#1 # reorder rows by increasing p.value res2 <- res2[,c(8,1:7)] res2 <- res2[with(res2, order(pval)), ] # preview final table head(res2) # save table to disk filename2 <- "DESeq1_results-all.csv" write.csv( as.data.frame(res2), file=filename2 ) table(res2$padj<0.05 , abs(res2$log2FoldChange)>=1) # FALSE TRUE # FALSE 22442 10220 # TRUE 252 555
symbol logFC logCPM PValue FDR
ENSG00000109906 ZBTB16 7.129259 4.156839 7.525572e-77 4.824042e-72
ENSG00000171819 ANGPTL7 5.570803 3.551126 7.983188e-59 2.558692e-54
ENSG00000127954 STEAP4 5.021176 3.716265 3.068593e-54 6.556765e-50
ENSG00000179593 ALOX15B 11.057712 1.621229 2.118021e-52 3.394234e-48
**ENSG00000152583 SPARCL1 4.540837 5.533920 3.239141e-50 4.152709e-46
ENSG00000163884 KLF15 4.453668 4.700292 2.538302e-48 2.711838e-44
ENSG00000168309 FAM107A 4.356499 2.858106 3.240754e-43 2.967697e-39
ENSG00000250978 <NA> 6.160280 1.353697 1.323731e-41 1.060672e-37
**ENSG00000096060 FKBP5 3.822147 6.897970 6.342675e-41 4.517535e-37
ENSG00000101347 SAMHD1 3.748595 9.207328 8.763185e-40 5.617377e-36
ENSG00000211445 GPX3 3.569312 9.159303 6.578585e-38 3.833641e-34
ENSG00000100033 PRODH 4.742678 1.313615 1.919896e-34 1.025577e-30
**ENSG00000162692 VCAM1 -3.564117 4.580852 2.478399e-34 1.222080e-30
> head(res2, 13) # DESeq1
symbol baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
**ENSG00000096060 FKBP5 2564.36223 4815.07582 313.648638 0.06513888 -3.940337 1.218100e-45 4.076859e-41
ENSG00000189221 MAOA 2341.76725 4253.70269 429.831818 0.10104886 -3.306875 2.078128e-42 3.477643e-38
ENSG00000179094 <NA> 776.59667 1399.33687 153.856466 0.10994956 -3.185086 5.894244e-38 6.575815e-34
ENSG00000170214 ADRA1B 201.57641 374.10413 29.048691 0.07764868 -3.686895 4.056765e-36 3.394397e-32
**ENSG00000152583 SPARCL1 997.43977 1916.07633 78.803214 0.04112739 -4.603757 3.806190e-34 2.547788e-30
**ENSG00000162692 VCAM1 508.17023 76.57865 939.761815 12.27185141 3.617281 1.100969e-28 6.141387e-25
ENSG00000178695 KCTD12 2649.85015 794.97080 4504.729494 5.66653451 2.502467 3.476343e-25 1.662139e-21
ENSG00000250978 <NA> 56.31819 111.15372 1.482666 0.01333888 -6.228219 4.646981e-25 1.944123e-21
ENSG00000120129 DUSP1 3409.02938 6036.62368 781.435072 0.12944903 -2.949544 3.915487e-24 1.456083e-20
ENSG00000125148 MT2A 3656.25278 6007.70878 1304.796789 0.21718709 -2.202990 2.542506e-22 8.509513e-19
ENSG00000124766 SOX4 1281.46866 417.41921 2145.518101 5.13996008 2.361757 1.265499e-21 3.850452e-18
ENSG00000166741 <NA> 7432.93635 12151.58166 2714.291047 0.22336936 -2.162497 4.500001e-21 1.255088e-17
ENSG00000198624 CCDC69 2057.19173 3626.73778 487.645681 0.13445849 -2.894767 1.094766e-20 2.818516e-17
plot recall Venn diagrams using VennDiagram or limma
### plotting and VENN diagram # volcano plots png("EdgeR-volcano.png") plot(1-res1$FDR ~ res1$logFC, xlim=c(-10,10), xlab="log FC", ylab="1-FDR", main="EdgeR data", pch=20, cex=0.5 ) abline(h=0.95, lwd=2, col="blue") dev.off() png("DESeq1-volcano.png") plot(1-res2$padj ~ res2$log2FoldChange, xlim=c(-10,10), xlab="log FC", ylab="1-FDR", main="DESeq1 data", pch=20, cex=0.5 ) abline(h=0.95, lwd=2, col="blue") dev.off() # scatterplot of logFC in two methods big <- merge(res1, res2, by=0) png("log-FC_concordance.png") plot(big$log2FoldChange ~ big$logFC, xlim=c(-6,6), ylim=c(-6,6), xlab="EdgeR fold change (log)", ylab="DESeq1 fold change (log)", main="fold-change concordance", pch=21, cex=0.5) abline(a=0,b=-1,col="green",lwd=2) dev.off() png("FDR_concordance.png") plot(big$padj ~ big$FDR, xlim=c(0,0.05), ylim=c(0,0.05), xlab="EdgeR FDR", ylab="DESeq1 padj", main="fold-change concordance", pch=21, cex=0.5) abline(a=0,b=1,col="green",lwd=2) dev.off() # subset ENS IDs for DE genes de1.ENS <- row.names( subset(res1, (abs(res1$logFC)>=1) & (res1$FDR<=0.05) ) ) de2.ENS <- row.names( subset(res2, (abs(res2$log2FoldChange)>=1) & (res2$padj<=0.05) ) ) #### plot Venn diagram to show concordance # require("VennDiagram") venn.plot <- venn.diagram( list(edgeR.DE = de1.ENS, DESeq1.DE = de2.ENS), filename = NULL ) png("Venn_DE-recall.png") grid.draw(venn.plot) dev.off() # alt using limma library("limma") Lists <- list(de1.ENS, de2.ENS) #put the word vectors into a list to supply lapply items <- sort(unique(unlist(Lists))) #put in alphabetical order vennData <- matrix(rep(0, length(items)*length(Lists)), ncol=2) #make a matrix of 0's colnames(vennData) <- paste0("List", 1:2) rownames(vennData) <- items lapply(seq_along(Lists), function(i) { #fill the matrix vennData[items %in% Lists[[i]], i] <<- table(Lists[[i]]) }) head(vennData) #look at the results a <- vennCounts(vennData) colnames(a) <- c('edgeR.DE', 'DESeq1.DE', 'Counts') vennDiagram(a, main='Comparison of DE genes from two callers') title(main='RobiNA results |logFC|>1 & FDR<0.05', cex.main=0.9) png("Venn_DE-recall2.png") dev.off()
The script ends by comparing genes for which at least 2-fold differential expression with a FDR of less than 0.05 was observed from the three cell lines. As seen below, EdgeR reports many extra DE genes but also most of the DE genes found by DESeq1. This is a known fact for these two packages
(concordant genes in red) | (partial range [0; 0.1]) |
Warning: DESeq1 takes the contrast in the opposite direction as compared to EdgeR 'Control vs treated' io 'treated vs Control'
Also note that the returned statistics are quire different between methods although both methods overlap well for the genes found DE (concordant genes in red)
running DESeq1 in reverse order
symbol baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
ENSG00000096060 FKBP5 2564.36223 313.648638 4815.07582 15.3518148 3.940337 1.218100e-45 4.076859e-41
ENSG00000189221 MAOA 2341.76725 429.831818 4253.70269 9.8962024 3.306875 2.078128e-42 3.477643e-38
ENSG00000179094 <NA> 776.59667 153.856466 1399.33687 9.0950800 3.185086 5.894244e-38 6.575815e-34
ENSG00000170214 ADRA1B 201.57641 29.048691 374.10413 12.8785193 3.686895 4.056765e-36 3.394397e-32
ENSG00000152583 SPARCL1 997.43977 78.803214 1916.07633 24.3146979 4.603757 3.806190e-34 2.547788e-30
ENSG00000162692 VCAM1 508.17023 939.761815 76.57865 0.0814873 -3.617281 1.100969e-28 6.141387e-25
ENSG00000178695 KCTD12 2649.85015 4504.729494 794.97080 0.1764747 -2.502467 3.476343e-25 1.662139e-21
ENSG00000250978 <NA> 56.31819 1.482666 111.15372 74.9688177 6.228219 4.646981e-25 1.944123e-21
ENSG00000120129 DUSP1 3409.02938 781.435072 6036.62368 7.7250483 2.949544 3.915487e-24 1.456083e-20
ENSG00000125148 MT2A 3656.25278 1304.796789 6007.70878 4.6043252 2.202990 2.542506e-22 8.509513e-19
ENSG00000124766 SOX4 1281.46866 2145.518101 417.41921 0.1945540 -2.361757 1.265499e-21 3.850452e-18
ENSG00000166741 <NA> 7432.93635 2714.291047 12151.58166 4.4768897 2.162497 4.500001e-21 1.255088e-17
ENSG00000198624 CCDC69 2057.19173 487.645681 3626.73778 7.4372396 2.894767 1.094766e-20 2.818516e-17
Discussion
Although not always easy to install, RobiNA presents the unique advantage to be minimalist and to bring the user very quickly to results. RobiNA is able to analyze microarray data as well as NGS data and accepts precomputed gene counts for those who do not have enough computing power to perform the mapping steps on their machine.
Obviously, more advance statistical handling will be more accurate in many cases and RobiNA only allows pairwise DE analysis but still, it is a remarkable piece of software for the beginner.
download exercise files
Download exercise files here.
References:
[ Main_Page | Hands-on_introduction_to_NGS_RNASeq_DE_analysis | NGS_RNASeq_DE_Exercise.6 | NGS_RNASeq_DE_Exercise.8 ]