NGS RNASeq DE Exercise.4

From BITS wiki
Jump to: navigation, search

Count reads at gene level using HTSeq

[ Main_Page | Hands-on_introduction_to_NGS_RNASeq_DE_analysis | NGS_RNASeq_DE_Exercise.3 | NGS_RNASeq_DE_Exercise.5 ]


Introduction

In order to compute differential expression between sample groups, we first need to convert mapping results to gene level counts for each sample and merge all resulting count tables into a global table to be fed to R-bioconductor packages for statistical analysis. The counting can also be done in R using various packages but will be slower due to the lesser support for multi-threading and parallelization as compared to command-line.

 

Compute gene counts from BAM data

The following script perform gene count from each BAM file and saves the results to a simple two-column text file for each sample. We use the popular HTSeq-count executable [1] to compute gene counts. Note that other packages in R (Rsamtools, countOverlaps from IRanges, summarizeOverlaps from GenomicRanges) as well as standalone software like Qualimap propose methods for this aim.

HTSeq counts transcriptome hits from BAM data in different 'modes' proposed to the user and detailed below.

count_modes.png

Handicon.png We will use in our script the 'intersection-strict' method

More information about this universally used package can be found on the developer web site

bam_htseq-count.sh script

#!/bin/bash
# bam_htseq-count.sh
 
# run HTSeq on single BAM sample
# count only uniquely mapped mappings & alignQ > 10
# requires HTSeq (0.6.0+)
## requires sam data sorted by 'queryname' as in our markdup set
 
# help: batch apply this script to all bams with
## for b in tophat2.0.13_results/markdup_bams/*_all-tophat-q.bam; do \
##   echo "# analyzing ${b}" \
##   scripts/bam_htseq-count.sh $b\
##   done
 
# or even better run 5 jobs in 'parallel' if you have power
# find tophat2.0.13_results/markdup_bams/ -name "*_all_mdup.bam" | \
#	parallel --no-notice --workdir . --tmpdir ./tmp -j 5 \
#	scripts/bam_htseq-count.sh {}
 
# takes a single bam file from $1
if [ $# -lt 1 ]
then
echo "Usage: ${0##*/} <bam file>
	eg: tophat2.0.13-SRR1039509_results/markdup_bams/SRR1039509_all_mdup.bam"
exit
fi
# argument#1 gives the bam file
bam=$1
name=$(basename ${bam} ".bam")
 
# test if bam is queryname-sorted
order="$(samtools view -H ${bam} | head -1 | awk '{split($3, ord, ":"); print ord[2]}')"
echo "# order is $order"
echo
 
if [ "$order" = "coordinate" ]; then
	echo "# sorting in queryname order"
	# redefine qbam
	qbam=${bam//.bam/-q.bam}
	cmd="java -jar $PICARD/picard.jar SortSam \
		I=${bam} \
		O=${bam//.bam/-q.bam} \
		SO=queryname"
	eval ${cmd} && bam=${bam//.bam/-q.bam}
fi
 
# create output folder
outfolder="htseq_counts"
mkdir -p ${outfolder}
 
# select transcript annotation model
gtffile=$BOWTIE2_INDEXES/hg19_ensGene.gtf
# other options would be
#gtffile=$BOWTIE2_INDEXES/chr22-hg19_ensGene.gtf
#gtffile=$BOWTIE2_INDEXES/hg19_refGene.gtf
 
# decompress bam with 'samtools' and use default parameters
# possible IDs are:
## transcript_name (gene symbol)
## transcript_id (ENST-ID)
## exon_id (ENSE-ID)
## gene_id (ENSG-ID:default) ++
 
cmd="samtools view ${bam} | \
  	htseq-count \
  	-m intersection-strict \
  	-s no \
  	-a 10 \
  	-t exon \
  	-i gene_id \
  	- \
  	${gtffile} \
  	> ${outfolder}/${name}_counts.txt \
 	2> >(tee -a ${outfolder}/${name}_errorlog.txt >&2)"
 
echo "# ${cmd}"
eval ${cmd}
echo
 
# compute simple stats on ENSG transcripts with coverage
# use non-null data from col#2 when col#1 contains a valid "ENSG ID"
echo "# simple coverage stats for transcripts with read counts"
echo "# ${bam}"
awk 'BEGIN{FS="\t"; OFS="\t"}{if($1~/ENSG/ && $2>0) print $2}' \
	${outfolder}/${name}_counts.txt | qstats -s

 

Install R and bioconductor

This section does not need to be done for BITS laptops but is required if you run the code on your own machine; We do not provide all details on steps required to install the require base software and dependencies as they differ depending on your operating system. Please read the documentation behind each link/

  • install R on your computer, if you rinstalled R version is outdated (eg installed using apt-get on ubuntu), consider removing it and reinstalling the latest stable version from CRAN.
  • install the latest RStudio version from the RStudio site
  • launch RStudio and run the following code to configure the basic bioconductor installation and update outdated packages in your installed version.

Install and Update RStudio and R packages

## Change default Bioconductor and CRAN mirrors
## this part will be run interactively (not run here)
#chooseBioCmirror()
#chooseCRANmirror()
 
## identify where R looks for packages
 
# locate library folder
.libPaths()
 
## make sure that you have writing rights in this place 
##          as this will allow you installing packages for all users of your machine
 
# see what is currently installed
library()
 
###### Install bioconductor (takes some time and need storage and internet speed) #####
 
# If you don't have the BiocInstaller package installed, you can quickly install and load it as follows:
source("http://bioconductor.org/biocLite.R")
 
# then update install the bioconductor base packages 
#     as well as update all existing packages with the very easy command
biocLite()
 
## upgrade bioconductor might be relevant if you have a very old version running from ages
biocLite("BiocUpgrade")
 
# next time you want to update packages, just come back and run
source("http://bioconductor.org/biocLite.R")
biocLite()
 
# when you need a new package and know its name, most often this will do
biocLite("<the name of the new package>")
 
## related to this training, you can access packages and their documentation within RStudio
source("http://bioconductor.org/biocLite.R")
 
# a full workflow related to the extended DESeq2 demo
source("http://bioconductor.org/workflows.R")
workflowInstall("rnaseqGene")
library("rnaseqGene")
vignette("rnaseqGene")
 
# THE package
library("DESeq2")
vignette("DESeq2")
browseVignettes("DESeq2")
 
# data used by DESeq2
library("airway")
browseVignettes("airway")
 
# EdgeR
library("edgeR")
vignette("edgeR")
###### happy R'ing

Handicon.png RStudio can generate PDF, HTML, and Word documents from your code that also include figures and more

This is a really cool feature to write reports and share them. You will need for his a couple more dependencies at system level and will find their name when trying and failing to compile your first document (Dr G will help you fixing these annoying first steps)

 

Compare raw HTSeq counts from tophat 'all' and gtf' mappings

The Raw HTSeq counts obtained for the sample (SRR1039509) were used to evaluate the effect of mapping reads without or with guidance from a transcriptome file. HTSeq conversion of both BAM files returned counts distributions summarized with qstats as follows:

A typical HTSeq count file looks like this

ENSG00000000003 434
ENSG00000000005 0
ENSG00000000419 488
ENSG00000000457 226
ENSG00000000460 52
ENSG00000000938 0
ENSG00000000971 3159
ENSG00000001036 1524
ENSG00000001084 386
ENSG00000001167 257
...

Summary lines reported for each HTSeq run

SRR1039509 all-counts gtf-counts
no_feature 1871251 1212702
ambiguous 567580 691622
too_low_aQual 0 0
not_aligned 0 0
alignment_not_unique 5535220 2553851

Descriptive statistics for non-null counts obtained in the two tophat runs

# consider lines starting with ENSG
# take column #2 
# ignore count=0
# compute descriptive stats using qstats
awk 'BEGIN{FS="\t"; OFS="\t"}{if($1~/ENSG/ && $2>0) print $2}' \
	${outfolder}/chr22-${name}_counts.txt | qstats -s
SRR1039509 counts for 'all' counts for 'gtf'
Min. 1 1
1st Qu. 5 4
Median 78 79
Mean 755.246 832.739
3rd Qu. 527 557
Max. 210019 214935
Range 210018 214934
Std Dev. 4181.34 4554.83
Length 21908 21491

The data from both methods was used to plot 'all' vs 'gtf' counts. If both methods were identical, all points should be on the red diagonal. The next figure shows that the results are highly similar but not identical and that the 'all' counts presents slightly more variability. Dotted lines at 10 or 20-counts (3.32 or 4.32 in log2) delimit the lower region of low precision in which results differ the most.

compare-htseq_counts.R script

#!/usr/bin/RScript
# compare-htseq_counts.R
# copy this code to RStudio and adapt file locations to match yours
 
# basedir <- "/work/TUTORIALS/NGS_RNASeqDE-training2015/SRR1039509_data/tophat2.0.13-SRR1039509_results/htseq_counts"
basedir <- <edit here for your machine>
setwd(basedir)
 
# Take htseq-count for 'all' and 'gtf' and make scatter-plot
# use sample SRR1039509 for plot
SRR1039509_all <- read.delim("SRR1039509_all_counts.txt", header=F)
SRR1039509_gtf <- read.delim("SRR1039509_gtf_counts.txt", header=F)
 
data <- merge(SRR1039509_all, SRR1039509_gtf, by='V1')
colnames(data) <- c("ID", "SRR1039509_all", "SRR1039509_gtf")
head(data, 10)
 
# ID SRR1039509_all SRR1039509_gtf
# 1  __alignment_not_unique        5535220        2553851
# 2             __ambiguous         567580         691622
# 3            __no_feature        1871251        1212702
# 4           __not_aligned              0              0
# 5         __too_low_aQual              0              0
# 6         ENSG00000000003            434            440
# 7         ENSG00000000005              0              0
# 8         ENSG00000000419            488            506
# 9         ENSG00000000457            226            234
# 10        ENSG00000000460             52             56
 
# extract summary lines
data.summary <- data[grep("^ENS", data$ID, perl=TRUE, invert=TRUE), ]
rownames(data.summary) <- data.summary$ID
data.summary <- data.all.summary[,-1]
 
data.summary
 
# ID SRR1039509_all SRR1039509_gtf
# __alignment_not_unique __alignment_not_unique        5535220        2553851
# __ambiguous                       __ambiguous         567580         691622
# __no_feature                     __no_feature        1871251        1212702
# __not_aligned                   __not_aligned              0              0
# __too_low_aQual               __too_low_aQual              0              0
 
# extract gene data lines
data <- data[grep("^ENS", data$ID, perl=TRUE, invert=FALSE), ]
head(data)
 
# ID SRR1039509_all SRR1039509_gtf
# 6  ENSG00000000003            434            440
# 7  ENSG00000000005              0              0
# 8  ENSG00000000419            488            506
# 9  ENSG00000000457            226            234
# 10 ENSG00000000460             52             56
# 11 ENSG00000000938              0              0
 
# compute correlation coefficient between both datasets
cor(data$SRR1039509_all, data$SRR1039509_gtf, 
    use="complete.obs", method="kendall")
 
# [1] 0.9230123
 
# correlate with more control
cor.test(data$SRR1039509_all, data$SRR1039509_gtf,
         alternative = "two.sided",
         method = "kendall",
         exact = NULL, 
         conf.level = 0.95)
 
# Kendall's rank correlation tau
# 
# data:  data$SRR1039509_all and data$SRR1039509_gtf
# z = 266.8679, p-value < 2.2e-16
# alternative hypothesis: true tau is not equal to 0
# sample estimates:
#   tau 
# 0.9230123 
 
# plot scatterplot
png(file="SRR1039509-all_vs_gtf-counts.png")
 
plot(log2(data$SRR1039509_all) ~ log2(data$SRR1039509_gtf),
     xlab="log2-GTF counts - SRR1039509",
     ylab="log2-ALL counts - SRR1039509",
     pch=20,
     cex=0.5
  )
 
# add lines
abline(0, 1, col="red", lty=1, lwd=2)
abline(h=log2(10), col="grey1", lty=2)
abline(v=log2(10), col="grey1", lty=2)
abline(h=log2(20), col="grey2", lty=2)
abline(v=log2(20), col="grey2", lty=2)
dev.off()
SRR1039509-all_vs_gtf-counts.png

Handicon.png This supports excluding counts lower than a certain value (eg 10) from your analyses, that will introduce more noise that reliable information in your results

 

Combine individual HTSeq files from the 'all' mapping series

The HTSeq program has extracted counts from each BAM file and created one corresponding text file with column#1 providing ENSG-IDs and column 1 the raw counts. We need to merge all these tables (within each experiment) into a single table with a leading identifier column followed by one labelled column for each replicate/condition. This is easily done using R as detailed below. The final results are saved to two text file (one for 'gtf' and one for 'all' results) that can be fed to differential expression analysis tools.

Merging all HTSeq count files into a R object can be done in different ways. Packages like DESeq have specific functions (see DESeqDataSetFromHTSeqCount) to do the merge directly from the text count files but we prefer to present here a pure R code to aggregate the files that can be recycled to aggregate other text files you may obtain while performing high throughput screens.

All HTSeq results are grouped in a common folder and can be batch loaded into RStudio with the following commands. Please adapt the location to your own directory structure if you reuse this code.

htseq-combine_all.R script

#!/usr/bin/RScript
# htseq-combine_all.R
# copy this code to RStudio and adapt file locations to match yours
 
# Take 'all' htseq-count results and melt them in to one big dataframe
 
# where are we?
basedir <- "/work/TUTORIALS/NGS_RNASeqDE-training2015"
setwd(basedir)
 
cntdir <- paste(basedir, "htseq_counts", sep="/")
pat <- "_all_counts.txt"
tophat.all <- list.files(path = cntdir,
	pattern = pat,
	all.files = TRUE,
	recursive = FALSE,
	ignore.case = FALSE,
	include.dirs = FALSE)
 
# we choose the 'all' series
myfiles <- tophat.all
DT <- list()
 
# read each file as array element of DT and rename the last 2 cols
# we created a list of single sample tables
for (i in 1:length(myfiles) ) {
	infile = paste(cntdir, myfiles[i], sep = "/")
	DT[[myfiles[i]]] <- read.table(infile, header = F, stringsAsFactors = FALSE)
	cnts <- gsub("(.*)_all_counts.txt", "\\1", myfiles[i])
	colnames(DT[[myfiles[i]]]) <- c("ID", cnts)
}
 
# merge all elements based on first ID columns
data <- DT[[myfiles[1]]]
 
# inspect
head(data)
 
# ID SRR1039508
# 1 ENSG00000000003        667
# 2 ENSG00000000005          0
# 3 ENSG00000000419        430
# 4 ENSG00000000457        256
# 5 ENSG00000000460         56
# 6 ENSG00000000938          0
 
# we now add each other table with the ID column as key
for (i in 2:length(myfiles)) {
	y <- DT[[myfiles[i]]]
	z <- merge(data, y, by = c("ID"))
	data <- z
}
 
# ID column becomes rownames
rownames(data) <- data$ID
data <- data[,-1]
 
## add total counts per sample
data <- rbind(data, tot.counts=colSums(data))
 
# inspect and look at the top row names!
head(data)
 
# SRR1039508 SRR1039509 SRR1039510 SRR1039511 SRR1039512 SRR1039513
# __alignment_not_unique    6029848    5535220    6917679    5736110    7174795    3829823
# __ambiguous                609790     567580     563265     582869     711348     449909
# __no_feature              2048439    1871251    2186546    2036064    2564408    1387796
# __not_aligned                   0          0          0          0          0          0
# __too_low_aQual                 0          0          0          0          0          0
# ENSG00000000003               667        434        721        444        862        401
# SRR1039514 SRR1039515 SRR1039516 SRR1039517 SRR1039518 SRR1039519
# __alignment_not_unique    8865980    5533574    7349997    7829016    7357178    5435207
# __ambiguous               1021469     607559     688966     890454     809693     575376
# __no_feature              3464421    2230297    2530061    3020635    2678110    1990138
# __not_aligned                   0          0          0          0          0          0
# __too_low_aQual                 0          0          0          0          0          0
# ENSG00000000003               958        853       1133       1050       1087       1104
# SRR1039520 SRR1039521 SRR1039522 SRR1039523
# __alignment_not_unique    5019316    5159442    6941580    7078894
# __ambiguous                554414     613151     709211     818199
# __no_feature              1939908    1931496    2641504    2825142
# __not_aligned                   0          0          0          0
# __too_low_aQual                 0          0          0          0
# ENSG00000000003               750        562       1075        826
 
tail(data)
 
# SRR1039508 SRR1039509 SRR1039510 SRR1039511 SRR1039512 SRR1039513 SRR1039514
# ENSG00000273487          3          4          4          6          0          9          5
# ENSG00000273488          5          3          4          3          5          3         10
# ENSG00000273489          0          0          1          0          0          2          0
# ENSG00000273492          0          0          0          0          1          0          2
# ENSG00000273493          0          0          0          0          0          0          0
# tot.counts        26792115   24519985   27416297   25783685   33081391   19381676   45865162
# SRR1039515 SRR1039516 SRR1039517 SRR1039518 SRR1039519 SRR1039520 SRR1039521
# ENSG00000273487          4          6          4          5          3          3          8
# ENSG00000273488          4          3         10          6          6          6         11
# ENSG00000273489          2          1          0          2          1          0          0
# ENSG00000273492          1          0          0          1          0          0          0
# ENSG00000273493          0          0          0          0          0          0          0
# tot.counts        27862053   32316085   39563928   35371127   25763223   24653719   26874854
# SRR1039522 SRR1039523
# ENSG00000273487          6          9
# ENSG00000273488          7          6
# ENSG00000273489          3          1
# ENSG00000273492          1          0
# ENSG00000273493          0          1
# tot.counts        32888531   35869342
 
####################################
# take summary rows to a new table
# ( not starting with ENS with invert=TRUE )
 
# transpose table for readability
data.all.summary <- data[grep("^ENS", rownames(data), perl=TRUE, invert=TRUE), ]
 
# review
data.all.summary
 
# SRR1039508 SRR1039509 SRR1039510 SRR1039511 SRR1039512 SRR1039513
# __alignment_not_unique    6029848    5535220    6917679    5736110    7174795    3829823
# __ambiguous                609790     567580     563265     582869     711348     449909
# __no_feature              2048439    1871251    2186546    2036064    2564408    1387796
# __not_aligned                   0          0          0          0          0          0
# __too_low_aQual                 0          0          0          0          0          0
# tot.counts               26792115   24519985   27416297   25783685   33081391   19381676
# SRR1039514 SRR1039515 SRR1039516 SRR1039517 SRR1039518 SRR1039519
# __alignment_not_unique    8865980    5533574    7349997    7829016    7357178    5435207
# __ambiguous               1021469     607559     688966     890454     809693     575376
# __no_feature              3464421    2230297    2530061    3020635    2678110    1990138
# __not_aligned                   0          0          0          0          0          0
# __too_low_aQual                 0          0          0          0          0          0
# tot.counts               45865162   27862053   32316085   39563928   35371127   25763223
# SRR1039520 SRR1039521 SRR1039522 SRR1039523
# __alignment_not_unique    5019316    5159442    6941580    7078894
# __ambiguous                554414     613151     709211     818199
# __no_feature              1939908    1931496    2641504    2825142
# __not_aligned                   0          0          0          0
# __too_low_aQual                 0          0          0          0
# tot.counts               24653719   26874854   32888531   35869342
 
# transpose table
t(data.all.summary)
 
# write summary to file
write.csv(data.all.summary, file = "chr22-htseq_counts_all-summary.csv")
 
####################################
# take all data rows to a new table
 
data.all <- data[grep("^ENS", rownames(data), perl=TRUE, invert=FALSE), ]
 
# final merged table
head(data.all, 3)
 
# write data to file
write.csv(data.all, file = "chr22-htseq_counts_all.csv")
 
# cleanup intermediate objects
rm(y, z, i, DT)

The resulting 16-column by 57773 rows data table starts like shown below

#                  SRR1039508  SRR1039509  SRR1039510  SRR1039511  SRR1039512  SRR1039513  SRR1039514
# ENSG00000000003  667         434         721         444         862         401         958
# ENSG00000000005  0           0           0           0           0           0           1
# ENSG00000000419  430         488         417         477         556         334         796
#                  SRR1039515  SRR1039516  SRR1039517  SRR1039518  SRR1039519  SRR1039520  SRR1039521
# ENSG00000000003  853         1133        1050        1087        1104        750         562
# ENSG00000000005  0           0           0           1           0           0           0
# ENSG00000000419  409         529         719         669         465         378         468
#                  SRR1039522  SRR1039523
# ENSG00000000003  1075        826
# ENSG00000000005  0           0
# ENSG00000000419  487         622

A comparable script can be used if you have computed counts from gtf tophat mappings

htseq-combine_gtf.R script

#!/usr/bin/RScript
# htseq-combine_gtf.R
# copy this code to RStudio and adapt file locations to match yours
 
# Take 'gtf' htseq-count results and melt them in to one big dataframe
 
# where are we?
basedir <- "/work/TUTORIALS/NGS_RNASeqDE-training2015"
setwd(basedir)
 
cntdir <- paste(basedir, "htseq_counts", sep="/")
pat <- "_gtf_counts.txt"
tophat.gtf <- list.files(path = cntdir,
	pattern = pat,
	gtf.files = TRUE,
	recursive = FALSE,
	ignore.case = FALSE,
	include.dirs = FALSE)
 
# we choose the 'gtf' series
myfiles <- tophat.gtf
DT <- list()
 
# read each file as array element of DT and rename the last 2 cols
# we created a list of single sample tables
for (i in 1:length(myfiles) ) {
	infile = paste(cntdir, myfiles[i], sep = "/")
	DT[[myfiles[i]]] <- read.table(infile, header = F, stringsAsFactors = FALSE)
	cnts <- gsub("(.*)_gtf_counts.txt", "\\1", myfiles[i])
	colnames(DT[[myfiles[i]]]) <- c("ID", cnts)
}
 
# merge gtf elements based on first ID columns
data <- DT[[myfiles[1]]]
 
# inspect
head(data)
 
# we now add each other table with the ID column as key
for (i in 2:length(myfiles)) {
	y <- DT[[myfiles[i]]]
	z <- merge(data, y, by = c("ID"))
	data <- z
}
 
# ID column becomes rownames
rownames(data) <- data$ID
data <- data[,-1]
 
## add total counts per sample
data <- rbind(data, tot.counts=colSums(data))
 
# inspect and look at the top row names!
head(data)
tail(data)
 
####################################
# take summary rows to a new table
# ( not starting with ENS with invert=TRUE )
 
# transpose table for readability
data.gtf.summary <- data[grep("^ENS", rownames(data), perl=TRUE, invert=TRUE), ]
 
# review
data.gtf.summary
 
# transpose table
t(data.gtf.summary)
 
# write summary to file
write.csv(data.gtf.summary, file = "chr22-htseq_counts_gtf-summary.csv")
 
####################################
# take gtf data rows to a new table
 
data.gtf <- data[grep("^ENS", rownames(data), perl=TRUE, invert=FALSE), ]
 
# final merged table
head(data.gtf, 3)
 
# write data to file
write.csv(data.gtf, file = "chr22-htseq_counts_gtf.csv")
 
# cleanup intermediate objects
rm(y, z, i, DT)

Create a table describing the samples

A metadata table that will be precious for the next exercises was also created with a direct download of online resources as detailed below.

create_metadata-table.R script

#!/usr/bin/RScript
# create_metadata-table.R
## create a metadata table that will be used later
 
# this table will contain information required to split the samples
# into groups and define contrasts for the differential expression analysis.
# You may build such table using your favorite spreadsheet editor or
# let R make it for you as shown below
# this code is saved under 'build-metadata.R'
 
library("stringr")
 
PID <- "SRP033351"
ena.url <- paste("http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=",
                 PID,
                 "&result=read_run",
                 "&fields=run_accession,read_count",
                 sep="")
 
# assemble into a data.frame
metadata <- as.data.frame(read.table(url(ena.url), header=TRUE, sep="\t"))
 
# add missing information about cells & treatment
samples=c("N61311_untreated",
          "N61311_Dex",
          "N61311_Alb",
          "N61311_Alb_Dex",
          "N052611_untreated",
          "N052611_Dex",
          "N052611_Alb",
          "N052611_Alb_Dex",
          "N080611_untreated",
          "N080611_Dex",
          "N080611_Alb",
          "N080611_Alb_Dex",
          "N061011_untreated",
          "N061011_Dex",
          "N061011_Alb",
          "N061011_Alb_Dex")
metadata$samples <- samples
 
# sample the first line
t(metadata[1,])
 
# add new columns
metadata$cells <- str_extract(metadata$sample, "\\b[A-Za-z0-9]+")
metadata$treatment <- mapply(sub,paste(metadata$cells,"_",sep=""), "", metadata$sample)
 
# save to file for reuse
write.table(metadata, file="GSE52778_metadata.txt", row.names=F, sep="\t", quote=FALSE)
 
# view table
metadata
#   run_accession  read_count  samples            cells    treatment
1   SRR1039508     22935521    N61311_untreated   N61311   untreated
2   SRR1039509     21155707    N61311_Dex         N61311   Dex
3   SRR1039510     22852619    N61311_Alb         N61311   Alb
4   SRR1039511     21938637    N61311_Alb_Dex     N61311   Alb_Dex
5   SRR1039512     28136282    N052611_untreated  N052611  untreated
6   SRR1039513     43356464    N052611_Dex        N052611  Dex
7   SRR1039514     40021727    N052611_Alb        N052611  Alb
8   SRR1039515     29454748    N052611_Alb_Dex    N052611  Alb_Dex
9   SRR1039516     30043024    N080611_untreated  N080611  untreated
10  SRR1039517     34298260    N080611_Dex        N080611  Dex
11  SRR1039518     30441200    N080611_Alb        N080611  Alb
12  SRR1039519     31533740    N080611_Alb_Dex    N080611  Alb_Dex
13  SRR1039520     34575286    N061011_untreated  N061011  untreated
14  SRR1039521     41152075    N061011_Dex        N061011  Dex
15  SRR1039522     28406812    N061011_Alb        N061011  Alb
16  SRR1039523     31099538    N061011_Alb_Dex    N061011  Alb_Dex

 

download exercise files

Download exercise files here.

Use the right application to open the files present in ex4-files

References:
  1. http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

[ Main_Page | Hands-on_introduction_to_NGS_RNASeq_DE_analysis | NGS_RNASeq_DE_Exercise.3 | NGS_RNASeq_DE_Exercise.5 ]