From BITS wiki

This page lists 'mappers', also called 'aligners', for next-generation sequencing (NGS) data. These programs map (align) NGS reads to a reference genome, which usually has to be prepared, or indexed, with that mapper beforehand. Preferably the reference genome of the sequenced species itself is used. Reference genomes of other species can also be used, but the results are of much lower quality.

Many mappers exist and new ones are still being developed, because the computational problem is demanding: hundreds of thousands to millions of relatively short reads (30 to 500 bp) have to be mapped to a reference genome that is seven to eight orders of magnitude larger. Moreover, the mapping has to take into account the ways in which reads can differ from the reference: insertions, deletions, mismatches (mutations), read orientation (in the case of paired-end reads), splice junctions (in the case of RNA-seq data), and so on.
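Mappers differ in how they index the reference (hash tables of k-mers, suffix arrays, FM-indexes, and others). The Python code below is a minimal sketch of the general seed-and-verify idea only, not the method of any mapper in the list. It indexes every k-mer of a toy reference, looks up non-overlapping seeds from both the read and its reverse complement, and verifies each candidate position by counting mismatches. By the pigeonhole principle, a read with at most m mismatches and at least m + 1 non-overlapping seeds has at least one seed that matches exactly. Indels and splice junctions are deliberately not handled, and the function names and parameters (`k`, `max_mismatches`) are illustrative assumptions.

```python
import random
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def map_read(read, reference, index, k, max_mismatches):
    """Return (start, strand, mismatches) of the best hit, or None if unmapped.

    Seeds are non-overlapping k-mers of the read. Each exact seed hit proposes
    a candidate start on the forward reference, which is then verified by
    counting substitutions only (no gaps).
    """
    best = None
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        candidates = set()
        for offset in range(0, len(seq) - k + 1, k):
            for hit in index.get(seq[offset:offset + k], ()):
                start = hit - offset
                if 0 <= start <= len(reference) - len(seq):
                    candidates.add(start)
        for start in sorted(candidates):
            window = reference[start:start + len(seq)]
            mismatches = sum(a != b for a, b in zip(seq, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[2]):
                best = (start, strand, mismatches)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    reference = "".join(rng.choice("ACGT") for _ in range(5000))
    index = build_kmer_index(reference, k=12)
    read = reference[100:136]
    # Introduce one substitution inside the first seed of the read.
    read_with_error = read[:5] + ("A" if read[5] != "A" else "C") + read[6:]
    print(map_read(read_with_error, reference, index, 12, 2))
```

With 36 bp reads and k = 12, each read yields three seeds, so up to two mismatches are guaranteed to leave at least one exact seed hit.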

A selection of our favorite mappers is presented in the DNAseq_toolbox.

List of mappers

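| Tool | NGS pipeline part | Specifications | RPM? | Link to source |
| agile | Aligner | | No | |
| CloudBurst | Aligner | Cluster-enabled | No | |
| ContextMap | Aligner | RNA-seq | No | |
| CUSHAW2 | Aligner | | No | |
| BarraCUDa | Aligner | | No | |
| bfast | Aligner | | Yes | [1] |
| Bowtie | Aligner | | U.D. | [2] |
| Bowtie2 | Aligner | Gapped | No | |
| BWA | Aligner | | Yes | [3] |
| BWA-MEM | Aligner | Long reads | No | |
| drFast | Aligner | SOLiD | No | |
| FR-HIT | Aligner | | No | |
| GASSST | Aligner | | No | |
| Genomemapper | Aligner | Gapped | No | |
| gmap/gsnap | Aligner | Gapped | No | |
| gnumap | Aligner | | No | |
| hobbes | Aligner | Install failed (Joa) | No | |
| MapSplice | Aligner | Gapped | No | |
| Maq | Aligner | | Yes | [4] |
| MIRA | Aligner | If 'mapping' mode is chosen | No | |
| MOM | Aligner | | No | |
| Mosaik-aligner | Aligner | | No | |
| NovoAlign | Aligner | | No | |
| mrFast/mrsFast | Aligner | | No | |
| OSA | Aligner | | No | |
| PASS | Aligner | | No | |
| PalMapper | Aligner | | No | |
| PerM | Aligner | | No | |
| PRISM | Aligner | | No | |
| RazerS | Aligner | | No | |
| RMAP | Aligner | Bisulphite | No | |
| RNASEQR | Aligner | RNA-seq | No | |
| RSEM | Aligner | RNA-seq | No | |
| RUM | Aligner | RNA-seq, gapped | | |
| SSAHA2 | Aligner | | No | |
| Shrimp | Aligner | | No | |
| SeqAlto | Aligner | | No | |
| SMALT | Aligner | | No | |
| SNAP | Aligner | | No | |
| SOAP | Aligner | | No | |
| SOCS | Aligner | SOLiD | No | |
| SpliceMap | Aligner | RNA-seq | No | |
| SRmapper | Aligner | | No | |
| Stampy | Aligner | | No | |
| STAR | Aligner | Gapped | No | |
| TopHat | Aligner | Gapped | No | |
| YAHA | Aligner | Long reads | No | |
| Zoom | Aligner | | No | |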

See also

Benchmarking mappers

Benchmark tests can assess the computational requirements of mappers as well as the biological validity of their mapping results. Below we focus on mapping accuracy, which is extremely important for some secondary analyses, such as variant calling.


Simulating reads

Simulation-based benchmarks start from a reference genome of interest and simulate a sequencing run on it, producing simulated reads. Because the true origin of every simulated read is known, the mapper's placement of these reads against the reference can be checked, and accuracy statistics can be collected. The model that generates the reads has to account for the many known, and as yet unknown, biases and errors that influence read counts and quality, which depend on the platform being mimicked and on the genome the reads are generated from.
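As a toy illustration of this approach, the Python sketch below simulates reads from a random genome, maps them with a brute-force stand-in mapper and counts how many reads land at their true origin. It is deliberately simplistic and ignores the platform-specific biases mentioned above: reads are sampled uniformly from the forward strand, errors are substitutions only, and no quality scores are produced. A read counts as correct only if its reported start equals the true start exactly (real benchmarks often allow a small tolerance). All function names are hypothetical.

```python
import random


def simulate_reads(genome, n_reads, read_len, error_rate, rng):
    """Sample reads uniformly from the forward strand and add substitution errors.

    Returns (sequence, true_start) pairs; knowing the true start is what makes
    a mapper's accuracy measurable.
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append(("".join(bases), start))
    return reads


def brute_force_map(genome, read, max_mismatches):
    """Stand-in mapper: leftmost start with the fewest mismatches, or None."""
    best_start, best_mm = None, max_mismatches + 1
    for start in range(len(genome) - len(read) + 1):
        mm = 0
        for a, b in zip(read, genome[start:start + len(read)]):
            if a != b:
                mm += 1
                if mm >= best_mm:
                    break  # this start can no longer beat the current best
        if mm < best_mm:
            best_start, best_mm = start, mm
    return best_start


def evaluate(genome, reads, max_mismatches):
    """Count reads mapped to their true start, mapped elsewhere, or unmapped."""
    counts = {"correct": 0, "wrong": 0, "unmapped": 0}
    for seq, true_start in reads:
        hit = brute_force_map(genome, seq, max_mismatches)
        if hit is None:
            counts["unmapped"] += 1
        elif hit == true_start:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(7)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    reads = simulate_reads(genome, 50, 36, 0.05, rng)
    print(evaluate(genome, reads, max_mismatches=2))
```

Raising `error_rate` while keeping `max_mismatches` fixed shifts reads from "correct" to "unmapped", which is the kind of trade-off such benchmarks are meant to expose.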

See the Read simulation page for read simulators.

Simulating a reference genome

ARDEN ([5]) does not simulate reads. Instead, it alters the reference genome according to a set of rules, creating an artificial genome to which none of the reads aligns perfectly any more. ARDEN then compares the mapping against the original reference genome with the mapping against this artificial genome, and derives sensitivity and specificity from that comparison.
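ARDEN's actual alteration rules are not described on this page, so the Python sketch below uses a hypothetical, much simplified rule purely to illustrate the idea: substituting one base every L positions guarantees that every read-length window of the artificial genome differs from the original window at the same locus. A read of length L that matched the original exactly can therefore not match the artificial genome at that locus without at least one mismatch. The sketch only checks same-locus windows; it does not reproduce ARDEN or its sensitivity and specificity computation, and the function names are assumptions.

```python
import random


def make_artificial_reference(reference, read_len, rng):
    """Hypothetical rule: substitute one base at every multiple of read_len.

    Any read_len consecutive positions contain exactly one multiple of
    read_len, so each read-length window differs from the original window at
    the same locus by exactly one base.
    """
    bases = list(reference)
    mutated = list(range(0, len(bases), read_len))
    for pos in mutated:
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases), mutated


def every_window_differs(reference, artificial, read_len):
    """True if no read-length window is identical at the same locus."""
    return all(
        reference[s:s + read_len] != artificial[s:s + read_len]
        for s in range(len(reference) - read_len + 1)
    )


if __name__ == "__main__":
    rng = random.Random(3)
    reference = "".join(rng.choice("ACGT") for _ in range(1000))
    artificial, mutated = make_artificial_reference(reference, 50, rng)
    print(len(mutated), every_window_differs(reference, artificial, 50))
```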

Analysis of real reads that align imperfectly

CLC Bio has used a different approach to benchmark mappers ([6]). The rationale is as follows. First, the Smith-Waterman (SW) algorithm finds the optimal local alignment for a given scoring scheme, so it is the best mapping one can perform. Second, reads that map perfectly provide no information on accuracy. Hence, CLC Bio analysed a subset of real reads that do not align perfectly, aligned them both with SW and with the mapper of interest, and compared the results. Typically, many reads were mapped optimally, a large subset was mapped suboptimally, and some reads were left unmapped. The results are displayed in a scatter plot that compares, for each read, the optimal alignment score (obtained by SW) with the score obtained by the heuristic mapper.
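A minimal sketch of this comparison in Python, assuming a linear gap penalty and arbitrary scores (match +1, mismatch -1, gap -2); CLC Bio's actual scoring scheme is not given here. The function computes the SW optimal local alignment score of a read against a reference sequence, and the read is then labelled optimal, suboptimal or unmapped depending on the score reported by the heuristic mapper. The function names are hypothetical.

```python
def smith_waterman_score(read, ref, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score with a linear gap penalty (score only).

    Keeps only two rows of the dynamic-programming matrix, so memory is
    proportional to len(ref); running time is proportional to len(read) * len(ref).
    """
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diagonal = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0 (start a new alignment).
            cur[j] = max(0, diagonal, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def classify(read, ref, heuristic_score):
    """Compare a heuristic mapper's score (None means unmapped) with the SW optimum."""
    if heuristic_score is None:
        return "unmapped"
    optimal = smith_waterman_score(read, ref)
    return "optimal" if heuristic_score >= optimal else "suboptimal"


if __name__ == "__main__":
    read, ref = "ACGTACGT", "ACGTTACGT"
    print(smith_waterman_score(read, ref), classify(read, ref, 5))
```

Each read then contributes one point (optimal score, heuristic score) to the scatter plot; points on the diagonal are optimal mappings. The quadratic cost of SW against a whole genome is exactly why mappers rely on heuristics in the first place.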


Benchmark initiatives