Create a count table in R

From BITS wiki
Jump to: navigation, search

Starting from mapped RNA-seq data (.bam files) and a gene annotation file (.gtf), the BioConductor package GenomicRanges | SummarizeOverlaps will generate a count table required for differential expression analysis.

In order to use this code, please copy paste it into RStudio in a blank 'R Markdown' document.

Feature count extraction
========================================================
```{r loading environment}
library(GenomicRanges)
library(Rsamtools)
library(rtracklayer)
library(GenomicAlignments)
```

In the working directory, a GTF file, used with TopHat2 if available, and the resulting BAM files (ending with .bam)

```{r reading in}
gtf_file=import.gff2("Dmel_gene_annotation_BDGP5_gtf")
gtf_file_gr=as(gtf_file,"GRanges")
bam_file_paths=c("Mapped_reads_ACAGTG_n18_1.bam", "Mapped_reads_CGATGT_n25_1.bam",
   "Mapped_reads_GCCAAT_n18_2.bam", "Mapped_reads_TGACCA_n25_2.bam")
bams=BamFileList(bam_file_paths)
indexBam(bam_file_paths)      
```

```{r counting overlaps}
count_table=summarizeOverlaps(gtf_file_gr,bams, mode="IntersectionStrict", ignore.strand=T)
```