Count reads at gene level using HTSeq

[ Main_Page | Hands-on_introduction_to_NGS_RNASeq_DE_analysis | NGS_RNASeq_DE_Exercise.3 | NGS_RNASeq_DE_Exercise.5 ]

Introduction

In order to compute differential expression between sample groups, we first need to convert mapping results to gene level counts for each sample and merge all resulting count tables into a global table to be fed to R-bioconductor packages for statistical analysis. The counting can also be done in R using various packages but will be slower due to the lesser support for multi-threading and parallelization as compared to command-line.

Compute gene counts from BAM data

The following script perform gene count from each BAM file and saves the results to a simple two-column text file for each sample. We use the popular HTSeq-count executable ^[1] to compute gene counts. Note that other packages in R (Rsamtools, countOverlaps from IRanges, summarizeOverlaps from GenomicRanges) as well as standalone software like Qualimap propose methods for this aim.

HTSeq counts transcriptome hits from BAM data in different 'modes' proposed to the user and detailed below.

We will use in our script the 'intersection-strict' method

More information about this universally used package can be found on the developer web site

bam_htseq-count.sh script

#!/bin/bash
# bam_htseq-count.sh
 
# run HTSeq on single BAM sample
# count only uniquely mapped mappings & alignQ > 10
# requires HTSeq (0.6.0+)
## requires sam data sorted by 'queryname' as in our markdup set
 
# help: batch apply this script to all bams with
## for b in tophat2.0.13_results/markdup_bams/*_all-tophat-q.bam; do \
##   echo "# analyzing ${b}" \
##   scripts/bam_htseq-count.sh $b\
##   done
 
# or even better run 5 jobs in 'parallel' if you have power
# find tophat2.0.13_results/markdup_bams/ -name "*_all_mdup.bam" | \
#	parallel --no-notice --workdir . --tmpdir ./tmp -j 5 \
#	scripts/bam_htseq-count.sh {}
 
# takes a single bam file from $1
if [ $# -lt 1 ]
then
echo "Usage: ${0##*/} <bam file>
	eg: tophat2.0.13-SRR1039509_results/markdup_bams/SRR1039509_all_mdup.bam"
exit
fi
# argument#1 gives the bam file
bam=$1
name=$(basename ${bam} ".bam")
 
# test if bam is queryname-sorted
order="$(samtools view -H ${bam} | head -1 | awk '{split($3, ord, ":"); print ord[2]}')"
echo "# order is $order"
echo
 
if [ "$order" = "coordinate" ]; then
	echo "# sorting in queryname order"
	# redefine qbam
	qbam=${bam//.bam/-q.bam}
	cmd="java -jar $PICARD/picard.jar SortSam \
		I=${bam} \
		O=${bam//.bam/-q.bam} \
		SO=queryname"
	eval ${cmd} && bam=${bam//.bam/-q.bam}
fi
 
# create output folder
outfolder="htseq_counts"
mkdir -p ${outfolder}
 
# select transcript annotation model
gtffile=$BOWTIE2_INDEXES/hg19_ensGene.gtf
# other options would be
#gtffile=$BOWTIE2_INDEXES/chr22-hg19_ensGene.gtf
#gtffile=$BOWTIE2_INDEXES/hg19_refGene.gtf
 
# decompress bam with 'samtools' and use default parameters
# possible IDs are:
## transcript_name (gene symbol)
## transcript_id (ENST-ID)
## exon_id (ENSE-ID)
## gene_id (ENSG-ID:default) ++
 
cmd="samtools view ${bam} | \
  	htseq-count \
  	-m intersection-strict \
  	-s no \
  	-a 10 \
  	-t exon \
  	-i gene_id \
  	- \
  	${gtffile} \
  	> ${outfolder}/${name}_counts.txt \
 	2> >(tee -a ${outfolder}/${name}_errorlog.txt >&2)"
 
echo "# ${cmd}"
eval ${cmd}
echo
 
# compute simple stats on ENSG transcripts with coverage
# use non-null data from col#2 when col#1 contains a valid "ENSG ID"
echo "# simple coverage stats for transcripts with read counts"
echo "# ${bam}"
awk 'BEGIN{FS="\t"; OFS="\t"}{if($1~/ENSG/ && $2>0) print $2}' \
	${outfolder}/${name}_counts.txt | qstats -s

Install R and bioconductor

This section does not need to be done for BITS laptops but is required if you run the code on your own machine; We do not provide all details on steps required to install the require base software and dependencies as they differ depending on your operating system. Please read the documentation behind each link/

install R on your computer, if you rinstalled R version is outdated (eg installed using apt-get on ubuntu), consider removing it and reinstalling the latest stable version from CRAN.
install the latest RStudio version from the RStudio site
launch RStudio and run the following code to configure the basic bioconductor installation and update outdated packages in your installed version.

Install and Update RStudio and R packages

## Change default Bioconductor and CRAN mirrors
## this part will be run interactively (not run here)
#chooseBioCmirror()
#chooseCRANmirror()
 
## identify where R looks for packages
 
# locate library folder
.libPaths()
 
## make sure that you have writing rights in this place 
##          as this will allow you installing packages for all users of your machine
 
# see what is currently installed
library()
 
###### Install bioconductor (takes some time and need storage and internet speed) #####
 
# If you don't have the BiocInstaller package installed, you can quickly install and load it as follows:
source("http://bioconductor.org/biocLite.R")
 
# then update install the bioconductor base packages 
#     as well as update all existing packages with the very easy command
biocLite()
 
## upgrade bioconductor might be relevant if you have a very old version running from ages
biocLite("BiocUpgrade")
 
# next time you want to update packages, just come back and run
source("http://bioconductor.org/biocLite.R")
biocLite()
 
# when you need a new package and know its name, most often this will do
biocLite("<the name of the new package>")
 
## related to this training, you can access packages and their documentation within RStudio
source("http://bioconductor.org/biocLite.R")
 
# a full workflow related to the extended DESeq2 demo
source("http://bioconductor.org/workflows.R")
workflowInstall("rnaseqGene")
library("rnaseqGene")
vignette("rnaseqGene")
 
# THE package
library("DESeq2")
vignette("DESeq2")
browseVignettes("DESeq2")
 
# data used by DESeq2
library("airway")
browseVignettes("airway")
 
# EdgeR
library("edgeR")
vignette("edgeR")
###### happy R'ing

RStudio can generate PDF, HTML, and Word documents from your code that also include figures and more

This is a really cool feature to write reports and share them. You will need for his a couple more dependencies at system level and will find their name when trying and failing to compile your first document (Dr G will help you fixing these annoying first steps)

Compare raw HTSeq counts from tophat 'all' and gtf' mappings

The Raw HTSeq counts obtained for the sample (SRR1039509) were used to evaluate the effect of mapping reads without or with guidance from a transcriptome file. HTSeq conversion of both BAM files returned counts distributions summarized with qstats as follows:

A typical HTSeq count file looks like this

ENSG00000000003 434
ENSG00000000005 0
ENSG00000000419 488
ENSG00000000457 226
ENSG00000000460 52
ENSG00000000938 0
ENSG00000000971 3159
ENSG00000001036 1524
ENSG00000001084 386
ENSG00000001167 257
...

Summary lines reported for each HTSeq run

SRR1039509	all-counts	gtf-counts
no_feature	1871251	1212702
ambiguous	567580	691622
too_low_aQual	0	0
not_aligned	0	0
alignment_not_unique	5535220	2553851

Descriptive statistics for non-null counts obtained in the two tophat runs

# consider lines starting with ENSG
# take column #2 
# ignore count=0
# compute descriptive stats using qstats
awk 'BEGIN{FS="\t"; OFS="\t"}{if($1~/ENSG/ && $2>0) print $2}' \
	${outfolder}/chr22-${name}_counts.txt | qstats -s

SRR1039509	counts for 'all'	counts for 'gtf'
Min.	1	1
1st Qu.	5	4
Median	78	79
Mean	755.246	832.739
3rd Qu.	527	557
Max.	210019	214935
Range	210018	214934
Std Dev.	4181.34	4554.83
Length	21908	21491

The data from both methods was used to plot 'all' vs 'gtf' counts. If both methods were identical, all points should be on the red diagonal. The next figure shows that the results are highly similar but not identical and that the 'all' counts presents slightly more variability. Dotted lines at 10 or 20-counts (3.32 or 4.32 in log2) delimit the lower region of low precision in which results differ the most.

compare-htseq_counts.R script

#!/usr/bin/RScript
# compare-htseq_counts.R
# copy this code to RStudio and adapt file locations to match yours
 
# basedir <- "/work/TUTORIALS/NGS_RNASeqDE-training2015/SRR1039509_data/tophat2.0.13-SRR1039509_results/htseq_counts"
basedir <- <edit here for your machine>
setwd(basedir)
 
# Take htseq-count for 'all' and 'gtf' and make scatter-plot
# use sample SRR1039509 for plot
SRR1039509_all <- read.delim("SRR1039509_all_counts.txt", header=F)
SRR1039509_gtf <- read.delim("SRR1039509_gtf_counts.txt", header=F)
 
data <- merge(SRR1039509_all, SRR1039509_gtf, by='V1')
colnames(data) <- c("ID", "SRR1039509_all", "SRR1039509_gtf")
head(data, 10)
 
# ID SRR1039509_all SRR1039509_gtf
# 1  __alignment_not_unique        5535220        2553851
# 2             __ambiguous         567580         691622
# 3            __no_feature        1871251        1212702
# 4           __not_aligned              0              0
# 5         __too_low_aQual              0              0
# 6         ENSG00000000003            434            440
# 7         ENSG00000000005              0              0
# 8         ENSG00000000419            488            506
# 9         ENSG00000000457            226            234
# 10        ENSG00000000460             52             56
 
# extract summary lines
data.summary <- data[grep("^ENS", data$ID, perl=TRUE, invert=TRUE), ]
rownames(data.summary) <- data.summary$ID
data.summary <- data.all.summary[,-1]
 
data.summary
 
# ID SRR1039509_all SRR1039509_gtf
# __alignment_not_unique __alignment_not_unique        5535220        2553851
# __ambiguous                       __ambiguous         567580         691622
# __no_feature                     __no_feature        1871251        1212702
# __not_aligned                   __not_aligned              0              0
# __too_low_aQual               __too_low_aQual              0              0
 
# extract gene data lines
data <- data[grep("^ENS", data$ID, perl=TRUE, invert=FALSE), ]
head(data)
 
# ID SRR1039509_all SRR1039509_gtf
# 6  ENSG00000000003            434            440
# 7  ENSG00000000005              0              0
# 8  ENSG00000000419            488            506
# 9  ENSG00000000457            226            234
# 10 ENSG00000000460             52             56
# 11 ENSG00000000938              0              0
 
# compute correlation coefficient between both datasets
cor(data$SRR1039509_all, data$SRR1039509_gtf, 
    use="complete.obs", method="kendall")
 
# [1] 0.9230123
 
# correlate with more control
cor.test(data$SRR1039509_all, data$SRR1039509_gtf,
         alternative = "two.sided",
         method = "kendall",
         exact = NULL, 
         conf.level = 0.95)
 
# Kendall's rank correlation tau
# 
# data:  data$SRR1039509_all and data$SRR1039509_gtf
# z = 266.8679, p-value < 2.2e-16
# alternative hypothesis: true tau is not equal to 0
# sample estimates:
#   tau 
# 0.9230123 
 
# plot scatterplot
png(file="SRR1039509-all_vs_gtf-counts.png")
 
plot(log2(data$SRR1039509_all) ~ log2(data$SRR1039509_gtf),
     xlab="log2-GTF counts - SRR1039509",
     ylab="log2-ALL counts - SRR1039509",
     pch=20,
     cex=0.5
  )
 
# add lines
abline(0, 1, col="red", lty=1, lwd=2)
abline(h=log2(10), col="grey1", lty=2)
abline(v=log2(10), col="grey1", lty=2)
abline(h=log2(20), col="grey2", lty=2)
abline(v=log2(20), col="grey2", lty=2)
dev.off()

This supports excluding counts lower than a certain value (eg 10) from your analyses, that will introduce more noise that reliable information in your results

Combine individual HTSeq files from the 'all' mapping series

The HTSeq program has extracted counts from each BAM file and created one corresponding text file with column#1 providing ENSG-IDs and column 1 the raw counts. We need to merge all these tables (within each experiment) into a single table with a leading identifier column followed by one labelled column for each replicate/condition. This is easily done using R as detailed below. The final results are saved to two text file (one for 'gtf' and one for 'all' results) that can be fed to differential expression analysis tools.

Merging all HTSeq count files into a R object can be done in different ways. Packages like DESeq have specific functions (see DESeqDataSetFromHTSeqCount) to do the merge directly from the text count files but we prefer to present here a pure R code to aggregate the files that can be recycled to aggregate other text files you may obtain while performing high throughput screens.

All HTSeq results are grouped in a common folder and can be batch loaded into RStudio with the following commands. Please adapt the location to your own directory structure if you reuse this code.

htseq-combine_all.R script

#!/usr/bin/RScript
# htseq-combine_all.R
# copy this code to RStudio and adapt file locations to match yours
 
# Take 'all' htseq-count results and melt them in to one big dataframe
 
# where are we?
basedir <- "/work/TUTORIALS/NGS_RNASeqDE-training2015"
setwd(basedir)
 
cntdir <- paste(basedir, "htseq_counts", sep="/")
pat <- "_all_counts.txt"
tophat.all <- list.files(path = cntdir,
	pattern = pat,
	all.files = TRUE,
	recursive = FALSE,
	ignore.case = FALSE,
	include.dirs = FALSE)
 
# we choose the 'all' series
myfiles <- tophat.all
DT <- list()
 
# read each file as array element of DT and rename the last 2 cols
# we created a list of single sample tables
for (i in 1:length(myfiles) ) {
	infile = paste(cntdir, myfiles[i], sep = "/")
	DT[[myfiles[i]]] <- read.table(infile, header = F, stringsAsFactors = FALSE)
	cnts <- gsub("(.*)_all_counts.txt", "\\1", myfiles[i])
	colnames(DT[[myfiles[i]]]) <- c("ID", cnts)
}
 
# merge all elements based on first ID columns
data <- DT[[myfiles[1]]]
 
# inspect
head(data)
 
# ID SRR1039508
# 1 ENSG00000000003        667
# 2 ENSG00000000005          0
# 3 ENSG00000000419        430
# 4 ENSG00000000457        256
# 5 ENSG00000000460         56
# 6 ENSG00000000938          0
 
# we now add each other table with the ID column as key
for (i in 2:length(myfiles)) {
	y <- DT[[myfiles[i]]]
	z <- merge(data, y, by = c("ID"))
	data <- z
}
 
# ID column becomes rownames
rownames(data) <- data$ID
data <- data[,-1]
 
## add total counts per sample
data <- rbind(data, tot.counts=colSums(data))
 
# inspect and look at the top row names!
head(data)
 
# SRR1039508 SRR1039509 SRR1039510 SRR1039511 SRR1039512 SRR1039513
# __alignment_not_unique    6029848    5535220    6917679    5736110    7174795    3829823
# __ambiguous                609790     567580     563265     582869     711348     449909
# __no_feature              2048439    1871251    2186546    2036064    2564408    1387796
# __not_aligned                   0          0          0          0          0          0
# __too_low_aQual                 0          0          0          0          0          0
# ENSG00000000003               667        434        721        444        862        401
# SRR1039514 SRR1039515 SRR1039516 SRR1039517 SRR1039518 SRR1039519
# __alignment_not_unique    8865980    5533574    7349997    7829016    7357178    5435207
# __ambiguous               1021469     607559     688966     890454     809693     575376
# __no_feature              3464421    2230297    2530061    3020635    2678110    1990138
# __not_aligned                   0          0          0          0          0          0
# __too_low_aQual                 0          0          0          0          0          0
# ENSG00000000003               958        853       1133       1050       1087       1104
# SRR1039520 SRR1039521 SRR1039522 SRR1039523
# __alignment_not_unique    5019316    5159442    6941580    7078894
# __ambiguous                554414     613151     709211     818199
# __no_feature              1939908    1931496    2641504    2825142
# __not_aligned                   0          0          0          0
# __too_low_aQual                 0          0          0          0
# ENSG00000000003               750        562       1075        826
 
tail(data)
 
# SRR1039508 SRR1039509 SRR1039510 SRR1039511 SRR1039512 SRR1039513 SRR1039514
# ENSG00000273487          3          4          4          6          0          9          5
# ENSG00000273488          5          3          4          3          5          3         10
# ENSG00000273489          0          0          1          0          0          2          0
# ENSG00000273492          0          0          0          0          1          0          2
# ENSG00000273493          0          0          0          0          0          0          0
# tot.counts        26792115   24519985   27416297   25783685   33081391   19381676   45865162
# SRR1039515 SRR1039516 SRR1039517 SRR1039518 SRR1039519 SRR1039520 SRR1039521
# ENSG00000273487          4          6          4          5          3          3          8
# ENSG00000273488          4          3         10          6          6          6         11
# ENSG00000273489          2          1          0          2          1          0          0
# ENSG00000273492          1          0          0          1          0          0          0
# ENSG00000273493          0          0          0          0          0          0          0
# tot.counts        27862053   32316085   39563928   35371127   25763223   24653719   26874854
# SRR1039522 SRR1039523
# ENSG00000273487          6          9
# ENSG00000273488          7          6
# ENSG00000273489          3          1
# ENSG00000273492          1          0
# ENSG00000273493          0          1
# tot.counts        32888531   35869342
 
####################################
# take summary rows to a new table
# ( not starting with ENS with invert=TRUE )
 
# transpose table for readability
data.all.summary <- data[grep("^ENS", rownames(data), perl=TRUE, invert=TRUE), ]
 
# review
data.all.summary
 
# SRR1039508 SRR1039509 SRR1039510 SRR1039511 SRR1039512 SRR1039513
# __alignment_not_unique    6029848    5535220    6917679    5736110    7174795    3829823
# __ambiguous                609790     567580     563265     582869     711348     449909
# __no_feature              2048439    1871251    2186546    2036064    2564408    1387796
# __not_aligned                   0          0          0          0          0          0
# __too_low_aQual                 0          0          0          0          0          0
# tot.counts               26792115   24519985   27416297   25783685   33081391   19381676
# SRR1039514 SRR1039515 SRR1039516 SRR1039517 SRR1039518 SRR1039519
# __alignment_not_unique    8865980    5533574    7349997    7829016    7357178    5435207
# __ambiguous               1021469     607559     688966     890454     809693     575376
# __no_feature              3464421    2230297    2530061    3020635    2678110    1990138
# __not_aligned                   0          0          0          0          0          0
# __too_low_aQual                 0          0          0          0          0          0
# tot.counts               45865162   27862053   32316085   39563928   35371127   25763223
# SRR1039520 SRR1039521 SRR1039522 SRR1039523
# __alignment_not_unique    5019316    5159442    6941580    7078894
# __ambiguous                554414     613151     709211     818199
# __no_feature              1939908    1931496    2641504    2825142
# __not_aligned                   0          0          0          0
# __too_low_aQual                 0          0          0          0
# tot.counts               24653719   26874854   32888531   35869342
 
# transpose table
t(data.all.summary)
 
# write summary to file
write.csv(data.all.summary, file = "chr22-htseq_counts_all-summary.csv")
 
####################################
# take all data rows to a new table
 
data.all <- data[grep("^ENS", rownames(data), perl=TRUE, invert=FALSE), ]
 
# final merged table
head(data.all, 3)
 
# write data to file
write.csv(data.all, file = "chr22-htseq_counts_all.csv")
 
# cleanup intermediate objects
rm(y, z, i, DT)

The resulting 16-column by 57773 rows data table starts like shown below

# SRR1039508 SRR1039509 SRR1039510 SRR1039511 SRR1039512 SRR1039513 SRR1039514
# ENSG00000000003 667 434 721 444 862 401 958
# ENSG00000000005 0 0 0 0 0 0 1
# ENSG00000000419 430 488 417 477 556 334 796
# SRR1039515 SRR1039516 SRR1039517 SRR1039518 SRR1039519 SRR1039520 SRR1039521
# ENSG00000000003 853 1133 1050 1087 1104 750 562
# ENSG00000000005 0 0 0 1 0 0 0
# ENSG00000000419 409 529 719 669 465 378 468
# SRR1039522 SRR1039523
# ENSG00000000003 1075 826
# ENSG00000000005 0 0
# ENSG00000000419 487 622

A comparable script can be used if you have computed counts from gtf tophat mappings

htseq-combine_gtf.R script

#!/usr/bin/RScript
# htseq-combine_gtf.R
# copy this code to RStudio and adapt file locations to match yours
 
# Take 'gtf' htseq-count results and melt them in to one big dataframe
 
# where are we?
basedir <- "/work/TUTORIALS/NGS_RNASeqDE-training2015"
setwd(basedir)
 
cntdir <- paste(basedir, "htseq_counts", sep="/")
pat <- "_gtf_counts.txt"
tophat.gtf <- list.files(path = cntdir,
	pattern = pat,
	gtf.files = TRUE,
	recursive = FALSE,
	ignore.case = FALSE,
	include.dirs = FALSE)
 
# we choose the 'gtf' series
myfiles <- tophat.gtf
DT <- list()
 
# read each file as array element of DT and rename the last 2 cols
# we created a list of single sample tables
for (i in 1:length(myfiles) ) {
	infile = paste(cntdir, myfiles[i], sep = "/")
	DT[[myfiles[i]]] <- read.table(infile, header = F, stringsAsFactors = FALSE)
	cnts <- gsub("(.*)_gtf_counts.txt", "\\1", myfiles[i])
	colnames(DT[[myfiles[i]]]) <- c("ID", cnts)
}
 
# merge gtf elements based on first ID columns
data <- DT[[myfiles[1]]]
 
# inspect
head(data)
 
# we now add each other table with the ID column as key
for (i in 2:length(myfiles)) {
	y <- DT[[myfiles[i]]]
	z <- merge(data, y, by = c("ID"))
	data <- z
}
 
# ID column becomes rownames
rownames(data) <- data$ID
data <- data[,-1]
 
## add total counts per sample
data <- rbind(data, tot.counts=colSums(data))
 
# inspect and look at the top row names!
head(data)
tail(data)
 
####################################
# take summary rows to a new table
# ( not starting with ENS with invert=TRUE )
 
# transpose table for readability
data.gtf.summary <- data[grep("^ENS", rownames(data), perl=TRUE, invert=TRUE), ]
 
# review
data.gtf.summary
 
# transpose table
t(data.gtf.summary)
 
# write summary to file
write.csv(data.gtf.summary, file = "chr22-htseq_counts_gtf-summary.csv")
 
####################################
# take gtf data rows to a new table
 
data.gtf <- data[grep("^ENS", rownames(data), perl=TRUE, invert=FALSE), ]
 
# final merged table
head(data.gtf, 3)
 
# write data to file
write.csv(data.gtf, file = "chr22-htseq_counts_gtf.csv")
 
# cleanup intermediate objects
rm(y, z, i, DT)

Create a table describing the samples

A metadata table that will be precious for the next exercises was also created with a direct download of online resources as detailed below.

create_metadata-table.R script

#!/usr/bin/RScript
# create_metadata-table.R
## create a metadata table that will be used later
 
# this table will contain information required to split the samples
# into groups and define contrasts for the differential expression analysis.
# You may build such table using your favorite spreadsheet editor or
# let R make it for you as shown below
# this code is saved under 'build-metadata.R'
 
library("stringr")
 
PID <- "SRP033351"
ena.url <- paste("http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=",
                 PID,
                 "&result=read_run",
                 "&fields=run_accession,read_count",
                 sep="")
 
# assemble into a data.frame
metadata <- as.data.frame(read.table(url(ena.url), header=TRUE, sep="\t"))
 
# add missing information about cells & treatment
samples=c("N61311_untreated",
          "N61311_Dex",
          "N61311_Alb",
          "N61311_Alb_Dex",
          "N052611_untreated",
          "N052611_Dex",
          "N052611_Alb",
          "N052611_Alb_Dex",
          "N080611_untreated",
          "N080611_Dex",
          "N080611_Alb",
          "N080611_Alb_Dex",
          "N061011_untreated",
          "N061011_Dex",
          "N061011_Alb",
          "N061011_Alb_Dex")
metadata$samples <- samples
 
# sample the first line
t(metadata[1,])
 
# add new columns
metadata$cells <- str_extract(metadata$sample, "\\b[A-Za-z0-9]+")
metadata$treatment <- mapply(sub,paste(metadata$cells,"_",sep=""), "", metadata$sample)
 
# save to file for reuse
write.table(metadata, file="GSE52778_metadata.txt", row.names=F, sep="\t", quote=FALSE)
 
# view table
metadata

# run_accession read_count samples cells treatment
1 SRR1039508 22935521 N61311_untreated N61311 untreated
2 SRR1039509 21155707 N61311_Dex N61311 Dex
3 SRR1039510 22852619 N61311_Alb N61311 Alb
4 SRR1039511 21938637 N61311_Alb_Dex N61311 Alb_Dex
5 SRR1039512 28136282 N052611_untreated N052611 untreated
6 SRR1039513 43356464 N052611_Dex N052611 Dex
7 SRR1039514 40021727 N052611_Alb N052611 Alb
8 SRR1039515 29454748 N052611_Alb_Dex N052611 Alb_Dex
9 SRR1039516 30043024 N080611_untreated N080611 untreated
10 SRR1039517 34298260 N080611_Dex N080611 Dex
11 SRR1039518 30441200 N080611_Alb N080611 Alb
12 SRR1039519 31533740 N080611_Alb_Dex N080611 Alb_Dex
13 SRR1039520 34575286 N061011_untreated N061011 untreated
14 SRR1039521 41152075 N061011_Dex N061011 Dex
15 SRR1039522 28406812 N061011_Alb N061011 Alb
16 SRR1039523 31099538 N061011_Alb_Dex N061011 Alb_Dex

download exercise files

Download exercise files here.

Use the right application to open the files present in ex4-files

References:

↑ http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

[ Main_Page | Hands-on_introduction_to_NGS_RNASeq_DE_analysis | NGS_RNASeq_DE_Exercise.3 | NGS_RNASeq_DE_Exercise.5 ]

[1] ttp://www-huber.embl.de/users/anders/HTSeq/doc/count.html

[1]

NGS RNASeq DE Exercise.4

Contents

Introduction

Compute gene counts from BAM data

Install R and bioconductor

Compare raw HTSeq counts from tophat 'all' and gtf' mappings

Combine individual HTSeq files from the 'all' mapping series

Create a table describing the samples

download exercise files

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Resources

Toolbox