Parameters of RSeQC

It is vital to check the quality of the mapping results before proceeding with the RNASeq workflow. The mapping allows you to identify regions of the genome with extreme or absent coverage, as well as duplicate reads and mRNA degradation. We recommend using a combination of Qualimap (used in the introduction training) and RSeQC (used now in GenePattern) to detect these issues.

RSeQC is a set of tools that report on issues such as duplicate reads and mRNA degradation. See the RSeQC documentation for details on each tool.

bam stat tool

This tool can be used to check the mapping statistics of reads: the number of uniquely mapped reads, the number of reads mapped over a splice junction, the number of reads mapped in proper pairs, and so on.

The only parameter you need to set is the minimum MAPQ (mapping quality), a Phred-scaled score reflecting the probability that the read has been mapped to the wrong location. The default is 30 (corresponding to 99.9% confidence that the read was mapped to the correct location). RSeQC uses this threshold to decide which reads count as uniquely mapped.
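The relation between a Phred-scaled MAPQ and the mapping confidence can be sketched as follows. This is only the standard Phred arithmetic (error probability = 10^(-MAPQ/10)); the function names are made up for illustration and are not part of RSeQC.

```python
def mapq_to_error_probability(mapq: float) -> float:
    """Phred-scaled MAPQ -> probability that the read is mapped to the wrong place."""
    return 10 ** (-mapq / 10)


def mapq_to_confidence(mapq: float) -> float:
    """Probability that the read is mapped to the correct place."""
    return 1.0 - mapq_to_error_probability(mapq)


# MAPQ 10, 20 and 30 correspond to 90%, 99% and 99.9% confidence.
for q in (10, 20, 30):
    print(q, f"{mapq_to_confidence(q):.1%}")
```

Raising the threshold by 10 reduces the tolerated error probability tenfold, so a threshold of 30 keeps only reads with at most a 1 in 1000 chance of being misplaced.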

Interpreting the output of bam stat:

  • Optical/PCR duplicates: see our wiki page on duplicates.
  • Non-primary hits: when a read maps to multiple locations, the best alignment is the primary hit and all others are non-primary hits.


inner distance tool

This tool can be used to estimate the distribution of inner distances between paired-end reads. The inner distance is the fragment length minus the combined length of the two reads, so it is negative when the two reads overlap. The range of inner distances should therefore roughly correspond to the range of fragment lengths that was selected on the gel during library prep, minus the read lengths.
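As a small worked example of this definition, the sketch below computes the inner distance from a fragment length and two read lengths. It only illustrates the arithmetic; RSeQC itself derives the distance from the aligned coordinates of each read pair, and the function name and the numbers here are hypothetical.

```python
def inner_distance(fragment_length: int, read1_length: int, read2_length: int) -> int:
    """Fragment length minus the combined length of both reads of a pair."""
    return fragment_length - (read1_length + read2_length)


# A 300 nt fragment sequenced with 2 x 100 nt reads leaves a 100 nt gap.
print(inner_distance(300, 100, 100))

# A 150 nt fragment sequenced with 2 x 100 nt reads gives a negative value:
# the two reads overlap by 50 nt.
print(inner_distance(150, 100, 100))
```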

genebody coverage

This tool scales all transcripts to 100 nt and calculates the number of reads covering each nucleotide position. It generates a plot illustrating the coverage profile along the length of the transcripts. You expect an equal representation of reads over the complete length of the transcripts (unless you used oligo(dT) primers for reverse transcription), so you want no sharp drops or peaks. Low representation at the ends of the transcripts typically points to mRNA degradation.
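The idea of scaling transcripts of different lengths to a common 100-position axis can be sketched as below. This is a simplified illustration, not RSeQC's implementation: it assumes per-base coverage is already available as a list of numbers and averages it into 100 equal bins; both function names are hypothetical.

```python
def scale_coverage(per_base_coverage, n_bins=100):
    """Average per-base coverage of one transcript into n_bins equal bins."""
    length = len(per_base_coverage)
    if length < n_bins:
        raise ValueError("transcript is shorter than the number of bins")
    scaled = []
    for i in range(n_bins):
        start = i * length // n_bins
        end = (i + 1) * length // n_bins
        window = per_base_coverage[start:end]
        scaled.append(sum(window) / len(window))
    return scaled


def aggregate_profiles(transcripts, n_bins=100):
    """Sum the scaled profiles of many transcripts into one gene body profile."""
    total = [0.0] * n_bins
    for coverage in transcripts:
        for i, value in enumerate(scale_coverage(coverage, n_bins)):
            total[i] += value
    return total


# Toy data: one uniformly covered transcript and one whose coverage
# rises steadily from the 5' end to the 3' end.
uniform = [10] * 500
biased = list(range(1000))
profile = aggregate_profiles([uniform, biased])
print(profile[0], profile[50], profile[99])
```

In the toy profile the first bin is much lower than the last one, which is the kind of slope you would look for in the plot when checking for bias toward one end of the transcripts.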

read distribution

The tool calculates the fraction of reads mapped to coding exons, 5'-UTR exons, 3'-UTR exons, introns and intergenic regions based on the gene models provided. This module roughly reflects the uniformity of coverage; for example, reads are generally over-represented in the 3'-UTR for the polyA+ RNA-seq protocol.

read duplication

The tool uses two strategies to determine the read duplication rate:

  • Sequence based: reads with identical sequence are regarded as duplicates.
  • Mapping based: reads mapped to the same genomic location are regarded as duplicates. For spliced reads, reads mapped to the same starting position and spliced the same way are regarded as duplicates.
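The difference between the two strategies can be sketched with a few toy reads. The record layout (sequence, chromosome, start position, splice junctions) and the function name are assumptions made for this illustration; RSeQC reads this information from the BAM file.

```python
from collections import Counter


def duplication_rate(keys):
    """Fraction of reads that repeat a key already seen in another read."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - len(counts)) / total


# Hypothetical reads: (sequence, chromosome, start, splice junctions)
reads = [
    ("ACGT", "chr1", 100, ()),
    ("ACGT", "chr1", 100, ()),              # duplicate under both strategies
    ("ACGA", "chr1", 100, ()),              # same location, different sequence
    ("TTGA", "chr1", 200, ((250, 400),)),
    ("TTGC", "chr1", 200, ((250, 500),)),   # same start, spliced differently
]

# Sequence based: the key is the read sequence.
sequence_rate = duplication_rate(seq for seq, _, _, _ in reads)

# Mapping based: the key is the location plus the way the read is spliced.
mapping_rate = duplication_rate(
    (chrom, start, junctions) for _, chrom, start, junctions in reads
)

print(sequence_rate, mapping_rate)
```

The third read counts as a duplicate only in the mapping-based strategy, while the last two reads share a start position but are not duplicates because they are spliced differently.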


junction saturation

This tool determines if the current sequencing depth is sufficient to perform alternative splicing analyses.

All (annotated) splice junctions should be rediscovered from saturated RNA-Seq data (when coverage is high enough, you should have reads that represent all splice junctions). This module checks for saturation by subsampling 5%, 10%, 15%, ..., 95% of the total alignments in the bam file and counting the number of splice junctions in each subsample. You can change the number and sizes of these subsamples by changing the values of percentile lower bound, percentile upper bound and percentile step size.

The more reads you use, the more splice junctions you should find, both known and novel. Your data is considered saturated when you see no clear increase in splice junctions even if you increase the number of reads. In other words, at the right end of the plot you want to see a horizontal plateau for the splice junctions.
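The subsampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: each alignment is represented as a tuple of the junctions it spans, the subsamples are nested prefixes of one random shuffle, and the parameter names lower, upper and step merely mirror the percentile lower bound, upper bound and step size. None of this is RSeQC's actual code.

```python
import random


def junction_saturation(alignment_junctions, lower=5, upper=95, step=5, seed=0):
    """Count distinct splice junctions in growing subsamples of the alignments."""
    rng = random.Random(seed)
    shuffled = list(alignment_junctions)
    rng.shuffle(shuffled)
    curve = []
    for pct in range(lower, upper + 1, step):
        n = len(shuffled) * pct // 100
        seen = set()
        for junctions in shuffled[:n]:
            seen.update(junctions)
        curve.append((pct, len(seen)))
    return curve


# Toy data: 2000 alignments, each spanning one of 50 hypothetical junctions.
rng = random.Random(1)
junction_pool = [("chr1", 1000 * i, 1000 * i + 500) for i in range(50)]
alignments = [(rng.choice(junction_pool),) for _ in range(2000)]

curve = junction_saturation(alignments)
for pct, n_junctions in curve:
    print(pct, n_junctions)
```

With this toy data the junction count levels off well before the largest subsample, which is the plateau you want to see. A curve that is still rising at the last subsample suggests that deeper sequencing would reveal more junctions.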


junction annotation

This tool separates all detected splice junctions into ‘known’, ‘complete novel’ and ‘partial novel’ by comparing them with the reference gene model.
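As RSeQC's documentation describes the three classes, a junction is known when both of its splice sites are annotated in the reference gene model, partial novel when only one of them is annotated, and complete novel when neither is. The sketch below illustrates that rule; the function name and the representation of splice sites as (chromosome, position) tuples are assumptions for this example.

```python
def classify_junction(donor, acceptor, known_donors, known_acceptors):
    """Classify one junction by whether its two splice sites are annotated."""
    donor_known = donor in known_donors
    acceptor_known = acceptor in known_acceptors
    if donor_known and acceptor_known:
        return "known"
    if donor_known or acceptor_known:
        return "partial novel"
    return "complete novel"


# Hypothetical annotated splice sites from a reference gene model.
known_donors = {("chr1", 1500), ("chr1", 3500)}
known_acceptors = {("chr1", 2000), ("chr1", 4000)}

print(classify_junction(("chr1", 1500), ("chr1", 2000), known_donors, known_acceptors))
print(classify_junction(("chr1", 1500), ("chr1", 2100), known_donors, known_acceptors))
print(classify_junction(("chr1", 1600), ("chr1", 2100), known_donors, known_acceptors))
```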