SERE
SERE[1] is a statistic, calculated from the count table, that can discern between technical replicate (SERE value = 1), biological replicates (SERE value > 1), different conditions (SERE value >> 1) and identical samples (SERE value = 0).
From the publication:
'Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries.'
Additional file 6. Is the R code to calculate SERE. Format: R Size: 1KB Download file
It is wise to run this code in R on your samples to do a quality check of your count table. Note: this is only useful if your samples are run in separate lanes, and not when you used multiplexing.
SERE_fun <- function(observed, laneTotals, TH=1) { #calculate lambda and expected values total <- sum(laneTotals) fullObserved <- as.matrix(observed[rowSums(observed)>TH,]); fullLambda <- rowSums(fullObserved)/total; fullLhat <- fullLambda > 0; fullExpected<- outer(fullLambda, laneTotals); #keep values fullKeep <- which(rowSums(fullExpected) > 0); #calculate degrees of freedom (nrow*(ncol -1) >> number of parameters - number calculated parameter (lamda is calculated >> thus minus 1) #calculate pearson and deviance for all values oeFull <- rowSums((fullObserved[fullKeep,] - fullExpected[fullKeep,])^2/ fullExpected[fullKeep,]) # test for over dispersion dfFull <- ncol(observed)*length(fullKeep) - sum(fullLhat!=0); devFull <- rowSums( fullObserved[fullKeep,] * log(ifelse(fullObserved[fullKeep,]==0,1,fullObserved[fullKeep,])/fullExpected[fullKeep,])) return(sqrt(sum(oeFull)/dfFull)); } # # Confidence intervals for the sere; multiply the sere value by these # sereci <- function(sere, n, m, p=.95) { q <- (1-p)/2 df <- n*(m-1) x <- sere^2 *df excess <- x- df if (excess > 0) { tfun <- function(non, x, q) pchisq(x, df, ncp=non) -q lfit <- uniroot(tfun, c(0, excess), q=1-q, x=x) ufit <- uniroot(tfun, c(1,2)*excess, q=q, x=x) sqrt(1 + c(lfit$root, ufit$root) / df) } else c(0,0) } #-----------------------------------------------------------------------------
References:
- ↑
Stefan K Schulze, Rahul Kanwar, Meike Gölzenleuchter, Terry M Therneau, Andreas S Beutler
SERE: single-parameter quality control and sample comparison for RNA-Seq.
BMC Genomics: 2012, 13;524
[PubMed:23033915] ##WORLDCAT## [DOI] (I e)