Mappers


This page lists 'mappers', also called 'aligners', for next-generation sequencing (NGS) data. These programs map (align) NGS reads to a reference genome, which usually has to be prepared or indexed by the mapper beforehand. Preferably the reference genome of the species itself is used. A reference genome of another species can also serve as the mapping target, but the results are of much lower quality.

Many mappers exist and more are being developed because of the specific computational problem: hundreds of thousands to millions of relatively short reads (30 to 500 bp) need to be mapped to the reference genome, which is 7 to 8 orders of magnitude larger. The mapping has to take into account the ways in which reads can differ from the reference: insertions, deletions, mutations, orientation (for paired-end reads), splice junctions (for RNA-seq data), and so on.
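To make the role of the index concrete, here is a minimal sketch of seed-based mapping, assuming a simple k-mer hash index and ungapped (mismatch-only) verification. The function names, the toy reference and the parameters `k` and `max_mismatches` are illustrative assumptions. Real mappers use far more compact indexes and also handle gaps, paired ends and splice junctions.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) of the best ungapped placement, or None."""
    best = None
    seen = set()
    for offset in range(len(read) - k + 1):
        # Look up each k-mer of the read (the "seed") in the index.
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference) or start in seen:
                continue
            seen.add(start)
            # Verify the candidate placement over the full read length.
            mm = hamming(read, reference[start:start + len(read)])
            if mm <= max_mismatches and (best is None or mm < best[1]):
                best = (start, mm)
    return best

reference = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCA"  # toy reference
index = build_kmer_index(reference, k=5)
print(map_read("GCCGATAGCTAG", reference, index, k=5))  # (9, 0)
print(map_read("GCCGATTGCTAG", reference, index, k=5))  # (9, 1)
```

The index is built once per reference and reused for every read, which is why most mappers ask for a separate indexing step before mapping.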

A selection of our favorite mappers is presented in the DNAseq_toolbox.

List of mappers

Tool | NGS pipeline part | Specifications | RPM? | Link to source
agile | Aligner |  | No | http://users.eecs.northwestern.edu/~smi539/agile.html
CloudBurst | Aligner | Cluster-enabled | No | http://sourceforge.net/apps/mediawiki/cloudburst-bio/index.php?title=CloudBurst
ContextMap | Aligner | RNA-seq | No | http://www.bio.ifi.lmu.de/softwareservices/contextmap
CUSHAW2 | Aligner |  | No | http://cushaw2.sourceforge.net
BarraCUDa | Aligner |  | No | http://seqbarracuda.sf.net
bfast | Aligner |  | Yes [1] | http://sourceforge.net/apps/mediawiki/bfast/index.php?title=Main_Page
Bowtie | Aligner |  | U.D. [2] | http://bowtie-bio.sourceforge.net/index.shtml
Bowtie2 | Aligner | Gapped | No | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
BWA | Aligner |  | Yes [3] | http://bio-bwa.sourceforge.net/
BWA-MEM | Aligner | Long reads | No | https://github.com/lh3/bwa
drFast | Aligner | SOLiD | No | http://drfast.sourceforge.net/
FR-HIT | Aligner |  | No | http://weizhong-lab.ucsd.edu/frhit/
GASSST | Aligner |  | No | http://www.irisa.fr/symbiose/projects/gassst/
Genomemapper | Aligner | Gapped | No | http://www.1001genomes.org/software/genomemapper.html
gmap/gsnap | Aligner | Gapped | No | http://research-pub.gene.com/gmap/
gnumap | Aligner |  | No | http://dna.cs.byu.edu/gnumap/
hobbes | Aligner | install failed - Joa | No | http://hobbes.ics.uci.edu/
MapSplice | Aligner | Gapped | No | http://www.netlab.uky.edu/p/bioinfo/MapSplice
Maq | Aligner |  | Yes [4] | http://maq.sourceforge.net
MIRA | Aligner | If 'mapping' mode is chosen | No | http://sourceforge.net/projects/mira-assembler/
MOM | Aligner |  | No | http://mom.csbc.vcu.edu/
Mosaik-aligner | Aligner |  | No | http://code.google.com/p/mosaik-aligner/
NovoAlign | Aligner |  | No | http://www.novocraft.com/main/page.php?id=968
mrFast/mrsFast | Aligner |  | No | http://mrfast.sourceforge.net/
OSA | Aligner |  | No | http://www.omicsoft.com/osa
PASS | Aligner |  | No | http://pass.cribi.unipd.it/cgi-bin/pass.pl?action=Download
PalMapper | Aligner |  | No | http://www.raetschlab.org/suppl/palmapper
PerM | Aligner |  | No | http://code.google.com/p/perm/
PRISM | Aligner |  | No | http://compbio.cs.toronto.edu/prism/
RazerS | Aligner |  | No | http://www.seqan.de/projects/razers/
RMAP | Aligner | Bisulphite | No | http://rulai.cshl.edu/rmap/
RNASEQR | Aligner | RNA-seq | No | http://hood.systemsbiology.net/rnaseqr.php
RSEM | Aligner | RNA-seq | No | http://deweylab.biostat.wisc.edu/rsem/
RUM | Aligner | RNA-seq, gapped |  | https://github.com/PGFI/rum/downloads
SSAHA2 | Aligner |  | No | http://www.sanger.ac.uk/resources/software/ssaha2/
Shrimp | Aligner |  | No | http://compbio.cs.toronto.edu/shrimp/
SeqAlto | Aligner |  | No | http://www.stanford.edu/group/wonglab/seqalto/
SMALT | Aligner |  | No | http://www.sanger.ac.uk/resources/software/smalt/
SNAP | Aligner |  | No | http://snap.cs.berkeley.edu/
SOAP | Aligner |  | No | http://soap.genomics.org.cn/soapaligner.html
SOCS | Aligner | SOLiD | No | http://solidsoftwaretools.com/gf/project/socs/
SpliceMap | Aligner | RNA-seq | No | http://www.stanford.edu/group/wonglab/SpliceMap/
SRmapper | Aligner |  | No | http://www.umsl.edu/~wongch/software.html
Stampy | Aligner |  | No | http://www.well.ox.ac.uk/project-stampy
STAR | Aligner | Gapped | No | https://github.com/alexdobin/STAR/
TopHat | Aligner | Gapped | No | http://tophat.cbcb.umd.edu/
YAHA | Aligner | Long reads | No | http://faculty.virginia.edu/irahall/YAHA
Zoom | Aligner |  | No | http://www.bioinformaticssolutions.com/all-products/zoom/index.php

See also

Benchmarking mappers

Benchmark tests can assess the computational requirements of a mapper as well as the biological validity of its mapping results. Below we focus on the accuracy of the mappers, which is extremely important in some secondary analyses such as variant calling.

Methods

Simulating reads

To benchmark mappers, these methods start from a reference genome of interest and simulate a sequencing run on it, which yields simulated reads. Accuracy statistics are then collected from how well the mapper places these reads back on the reference. The model that generates the reads has to account for many known and as yet unknown biases and errors that influence read count and quality. These depend on the platform being mimicked and on the genome from which the reads are generated.

See Read simulation page for read simulators.
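As an illustration of the idea, the sketch below draws reads from uniformly random positions of a reference, introduces substitution errors at a fixed rate, and keeps the true origin of each read so that a mapper's output can be scored against it. This is a deliberately naive model: the function names and parameters are assumptions made for this example, and real simulators also model indels, quality scores and platform-specific biases.

```python
import random

def simulate_reads(reference, n_reads, read_len, error_rate, seed=0):
    """Return a list of (name, true_start, sequence) with substitution errors only."""
    rng = random.Random(seed)  # fixed seed keeps the benchmark reproducible
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for j, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace the base by one of the three other nucleotides.
                bases[j] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((f"read{i}", start, "".join(bases)))
    return reads

def accuracy(true_positions, mapped_positions):
    """Fraction of reads reported at exactly their simulated origin."""
    if not true_positions:
        return 0.0
    correct = sum(
        1 for name, pos in true_positions.items()
        if mapped_positions.get(name) == pos
    )
    return correct / len(true_positions)

reference = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCA"  # toy reference
reads = simulate_reads(reference, n_reads=5, read_len=12, error_rate=0.05)
truth = {name: start for name, start, _ in reads}
# Hypothetical mapper output: all reads placed correctly, except read0 left unmapped.
mapped = {name: start for name, start in truth.items() if name != "read0"}
print(accuracy(truth, mapped))  # 0.8
```

In practice a tolerance of a few bases around the true position is often allowed, because clipping or indels near the read ends can shift the reported start.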

Simulating a reference genome

ARDEN ([5]) does not simulate reads. Instead it alters the reference genome into an artificial genome, following a set of rules, such that none of the reads aligns perfectly to it any more. ARDEN then compares the mapping to the reference genome with the mapping to this artificial genome, and calculates sensitivity and specificity from that comparison.
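For reference, sensitivity and specificity follow the standard definitions sketched below. How ARDEN assigns reads to the four classes (true/false positives and negatives) is not described on this page, so the counts in the example are purely hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """TP / (TP + FN): share of reads that should map and were mapped correctly."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0

def specificity(true_negatives, false_positives):
    """TN / (TN + FP): share of reads that should not map and were indeed rejected."""
    total = true_negatives + false_positives
    return true_negatives / total if total else 0.0

# Hypothetical counts, for illustration only.
print(sensitivity(true_positives=90, false_negatives=10))  # 0.9
print(specificity(true_negatives=45, false_positives=5))   # 0.9
```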

Analysis of real reads that align imperfectly

CLC Bio has used a different approach to benchmark mappers ([6]). The rationale is twofold. First, the best alignment we can obtain is the one computed by the Smith-Waterman (SW) algorithm. Second, reads that map perfectly do not provide information on accuracy. Hence, they analysed a subset of real reads that do not align perfectly, aligned them both with SW and with the mapper of interest, and compared the results. Typically, many reads were mapped optimally, a large subset was mapped suboptimally, and some reads were left unmapped. The results are displayed in a scatter plot that compares the optimal alignment score (from SW) with the score obtained by the heuristic mapper.
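The comparison can be sketched as follows: compute the optimal local alignment score with a plain Smith-Waterman recurrence, then classify each read by comparing that score with the one reported by the mapper. The scoring scheme (match +1, mismatch -1, linear gap -2) and the function names are assumptions made for this example. The scheme used in the CLC Bio benchmark is not given here, and both scores must of course be computed under the same scheme.

```python
def smith_waterman_score(read, ref, match=1, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty, O(len(read) * len(ref))."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def classify(optimal_score, mapper_score):
    """Label a read by comparing the mapper's score with the SW optimum."""
    if mapper_score is None:
        return "unmapped"
    return "optimal" if mapper_score >= optimal_score else "suboptimal"

read = "ACGTTACG"    # toy read with one inserted base
region = "ACGTACG"   # toy reference region
optimal = smith_waterman_score(read, region)
print(optimal)                  # 5
print(classify(optimal, 5))     # optimal
print(classify(optimal, 4))     # suboptimal
print(classify(optimal, None))  # unmapped
```

Plotting the pairs (optimal score, mapper score) for all reads gives the kind of scatter plot described above: points on the diagonal are optimally mapped reads, points below it are suboptimal.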

[Figure: scatter plot of SW score versus mapper score (CLCbenchmarkingEffort.png)]

Benchmark initiatives