Parameters of RSEQC

It is vital to check the quality of the mapping results before proceeding with the RNA-Seq workflow. Inspecting the mapping lets you identify regions of the genome with extreme or absent coverage, as well as duplicate reads and mRNA degradation. We recommend using a combination of Qualimap (used in the introduction training) and RSeQC (used here in GenePattern) to detect these issues.

RSeQC is a set of tools that report on issues such as duplicate reads and mRNA degradation. See the RSeQC documentation for the full details of each tool.

inner distance tool

This tool estimates the distribution of inner distances between paired-end reads. The inner distance is the fragment length minus the combined length of the two reads, so it is negative when the two reads overlap. The range of inner distances should therefore roughly track the range of fragment lengths that was selected on the gel during library prep, shifted down by the combined read length.
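The definition above can be written as a one-line calculation. This is a minimal sketch, not RSeQC's implementation; the function name and the example values are hypothetical.

```python
def inner_distance(fragment_length: int, read1_length: int, read2_length: int) -> int:
    """Inner distance = fragment length minus the combined length of both reads.

    A negative result means the two reads of the pair overlap.
    """
    return fragment_length - (read1_length + read2_length)


# Hypothetical example values, chosen only for illustration.
long_fragment = inner_distance(300, 100, 100)   # reads separated by a gap
short_fragment = inner_distance(150, 100, 100)  # reads overlap
print(long_fragment, short_fragment)
```

With a size selection of 250 to 350 nt and 2 x 100 nt reads, you would under these assumptions expect inner distances of roughly 50 to 150 nt.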

genebody coverage

This tool scales all transcripts to 100 nt and calculates the number of reads covering each nucleotide position. It generates a plot of the coverage profile along the length of the transcripts. You expect reads to be equally represented over the complete length of the transcripts (unless you used oligo(dT) primers for reverse transcription), so the profile should show no sharp drops or peaks. Low representation at the ends of the transcripts typically points to mRNA degradation.
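To illustrate the scaling step, the sketch below rescales per-base coverage vectors to 100 positions and averages them over transcripts. It is a simplification under stated assumptions (each bin holds the mean coverage of the bases that fall in it, and transcripts are at least 100 nt long), not a copy of RSeQC's own method; the function names are hypothetical.

```python
def scaled_coverage(per_base_coverage: list[int], n_bins: int = 100) -> list[float]:
    """Rescale a per-nucleotide coverage vector to n_bins positions.

    Each bin holds the mean coverage of the transcript bases that fall in it.
    Assumes the transcript has at least n_bins bases.
    """
    length = len(per_base_coverage)
    if length < n_bins:
        raise ValueError("transcript shorter than the number of bins")
    profile = []
    for b in range(n_bins):
        start = b * length // n_bins
        end = (b + 1) * length // n_bins
        window = per_base_coverage[start:end]
        profile.append(sum(window) / len(window))
    return profile


def mean_profile(transcripts: list[list[int]], n_bins: int = 100) -> list[float]:
    """Average the scaled profiles of several transcripts, bin by bin."""
    profiles = [scaled_coverage(t, n_bins) for t in transcripts]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Hypothetical data: one evenly covered transcript, one with coverage
# that falls off towards the start of the transcript.
even = [10] * 500
skewed = list(range(200))
print(scaled_coverage(even)[:3])
print(scaled_coverage(skewed)[0], scaled_coverage(skewed)[-1])
```

A flat profile corresponds to the even representation you hope to see; a profile that rises or falls steadily along the 100 positions is the kind of bias the plot is meant to reveal.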

read distribution

The tool calculates the fraction of reads mapped to coding exons, 5'-UTR exons, 3'-UTR exons, introns and intergenic regions, based on the gene models provided. This module roughly reflects the uniformity of coverage; for example, reads are generally over-represented in the 3'-UTR for the polyA+ RNA-seq protocol.
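The calculation can be sketched as assigning each read to one feature class and reporting the fraction per class. The sketch below is an illustration only: it classifies a read by its midpoint, uses half-open intervals on a single chromosome, and resolves overlapping features with a fixed priority order. All of these choices, and all names, are assumptions made for the example.

```python
from collections import Counter

# Priority used when annotated features overlap (an assumption for this sketch).
PRIORITY = ["CDS_exon", "5UTR_exon", "3UTR_exon", "intron"]


def classify(position: int, features: dict[str, list[tuple[int, int]]]) -> str:
    """Return the feature class whose half-open interval contains position."""
    for name in PRIORITY:
        for start, end in features.get(name, []):
            if start <= position < end:
                return name
    return "intergenic"


def read_fractions(read_midpoints: list[int],
                   features: dict[str, list[tuple[int, int]]]) -> dict[str, float]:
    """Fraction of reads assigned to each feature class."""
    counts = Counter(classify(p, features) for p in read_midpoints)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Hypothetical gene model and read midpoints.
features = {
    "5UTR_exon": [(0, 50)],
    "CDS_exon": [(50, 150), (300, 400)],
    "intron": [(150, 300)],
    "3UTR_exon": [(400, 500)],
}
midpoints = [10, 60, 120, 200, 350, 450, 480, 900]
fractions = read_fractions(midpoints, features)
print(fractions)
```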

read duplication

Two strategies are used to determine the read duplication rate:

Sequence based: reads with identical sequence are regarded as duplicates.
Mapping based: reads mapped to the same genomic location are regarded as duplicates. For spliced reads, reads mapped to the same starting position and spliced the same way are regarded as duplicates.
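The two strategies differ only in what counts as "the same read". A minimal sketch, assuming each read is represented as a tuple of sequence, chromosome, start position and splice blocks (a hypothetical layout chosen for this example); it condenses the result to a single rate, which is a simplification of what a QC tool would report.

```python
def duplication_rate(keys) -> float:
    """Fraction of reads that are extra copies of an already-seen key."""
    keys = list(keys)
    if not keys:
        return 0.0
    return (len(keys) - len(set(keys))) / len(keys)


# Hypothetical reads: (sequence, chromosome, start, splice blocks)
reads = [
    ("ACGT", "chr1", 100, ((100, 104),)),
    ("ACGT", "chr1", 100, ((100, 104),)),
    ("ACGT", "chr2", 500, ((500, 504),)),
    ("TTGA", "chr1", 100, ((100, 102), (200, 202))),
    ("TTGA", "chr1", 100, ((100, 102), (300, 302))),
]

# Sequence based: the key is the read sequence alone.
sequence_based = duplication_rate(r[0] for r in reads)

# Mapping based: the key is the location plus the way the read is spliced.
mapping_based = duplication_rate((r[1], r[2], r[3]) for r in reads)

print(sequence_based, mapping_based)
```

In this toy data the last two reads share a sequence and a start position but are spliced differently, so they count as duplicates under the sequence-based strategy and not under the mapping-based one.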

junction saturation

This tool determines if the current sequencing depth is sufficient to perform alternative splicing analyses.

All (annotated) splice junctions should be rediscovered from saturated RNA-Seq data: when coverage is high enough, you should have reads that represent every splice junction. This module checks for saturation by subsampling 5%, 10%, 15%, ..., 95% of the total alignments in the BAM file and counting the number of splice junctions in each subsample. You can change the number and sizes of these subsamples by changing the values of percentile lower bound, percentile upper bound and percentile step size.

The more reads you use the more splice junctions you should find, both known and novel. Your data is considered saturated when you see no clear increase in splice junctions even if you increase the number of reads. In other words at the right end of the plot you want to see a horizontal plateau for the splice junctions.
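The subsampling procedure described above can be sketched as follows. This is an illustration under assumptions, not RSeQC's code: the function and parameter names are hypothetical, the upper bound is treated as exclusive so that the defaults yield 5% to 95%, and the subsamples are nested prefixes of one shuffled list so the curve never decreases.

```python
import random


def junction_saturation(read_junctions, lower=5, upper=100, step=5, seed=0):
    """Count distinct junctions seen in growing random subsamples of the reads.

    read_junctions: one entry per alignment, each a list of junction ids.
    lower/upper/step mirror the percentile bounds and step size
    (upper is exclusive in this sketch).
    Returns a list of (percent, n_distinct_junctions) pairs.
    """
    rng = random.Random(seed)
    order = list(read_junctions)
    rng.shuffle(order)
    curve = []
    for percent in range(lower, upper, step):
        n = len(order) * percent // 100
        seen = set()
        for junctions in order[:n]:
            seen.update(junctions)
        curve.append((percent, len(seen)))
    return curve


# Hypothetical data: 1000 spliced reads spread over 20 junctions.
reads = [[("chr1", i % 20, i % 20 + 100)] for i in range(1000)]
curve = junction_saturation(reads)
for percent, n_junctions in curve:
    print(percent, n_junctions)
```

If the junction count stops growing well before the last subsample, as it would for this small toy set, the data can be considered saturated for the junctions present.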

junction annotation

This tool separates all detected splice junctions into ‘known’, ‘complete novel’ and ‘partial novel’ by comparing them with the reference gene model.
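One way to picture the three categories is to compare the two splice sites of each detected junction with the splice sites in the reference gene model. The sketch below assumes that 'known' means both splice sites are annotated, 'partial novel' means exactly one is, and 'complete novel' means neither is; this reading, and the function name, are assumptions for illustration.

```python
def annotate_junction(junction, known_junctions):
    """Classify a (chrom, start, end) junction against reference junctions."""
    chrom, start, end = junction
    known_starts = {(c, s) for c, s, _ in known_junctions}
    known_ends = {(c, e) for c, _, e in known_junctions}
    start_known = (chrom, start) in known_starts
    end_known = (chrom, end) in known_ends
    if start_known and end_known:
        return "known"
    if start_known or end_known:
        return "partial novel"
    return "complete novel"


# Hypothetical reference junctions and detected junctions.
reference = {("chr1", 100, 200), ("chr1", 300, 400)}
detected = [("chr1", 100, 200), ("chr1", 100, 250), ("chr1", 120, 250)]
for junction in detected:
    print(junction, annotate_junction(junction, reference))
```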
