Bfast


A faster and customizable hash-based aligner for short sequences

Similar to: Blat




BFAST: an alignment tool for large scale genome resequencing (2009) [1] (download from SourceForge [2])

BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:

  • Speed: enables billions of short reads to be mapped quickly.
  • Accuracy: a priori probabilities for mapping reads with a defined set of variants.
  • Tunability: an easy way to measurably tune accuracy at the expense of speed.

Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance. BFAST supports both Illumina and ABI SOLiD data, as well as other next-generation sequencing technologies (454, Helicos), with particular emphasis on sensitivity to errors, SNPs and especially indels. Other algorithms take shortcuts by ignoring errors or certain types of variants (indels), or even require further alignment, all to be the "fastest" (while still not being complete). BFAST can be tuned to find variants regardless of the error rate, polymorphism rate, or other factors.

How the Bfast index works

The main limitation of BFAST, linked to its intrinsic power, is its large RAM requirement. At present each human genome index requires ~17 GB of memory (1.5 GB for the genome, 14.5 GB for the primary index, and 1 GB for the hash index). BFAST can trade off speed for accuracy through four built-in modes: fast, moderate.speed, moderate.accuracy, and accurate.
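The memory figures above can be added up and compared with the RAM of the machine at hand. The following is only a minimal sketch: the three component sizes are the ones quoted in the text, the variable names are made up for illustration, and the check against /proc/meminfo assumes a Linux system.

```shell
#!/bin/sh
# Component sizes in GB, as quoted in the text (human genome index).
genome_gb=1.5
primary_index_gb=14.5
hash_index_gb=1

# Sum with awk because POSIX shell arithmetic is integer-only.
required_gb=$(awk -v a="$genome_gb" -v b="$primary_index_gb" -v c="$hash_index_gb" \
    'BEGIN { printf "%.1f", a + b + c }')
echo "Approximate RAM needed per index: ${required_gb} GB"

# Compare against physical memory where /proc/meminfo is available (Linux only).
if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB; convert to GB.
    total_gb=$(awk '/^MemTotal:/ { printf "%.1f", $2 / 1024 / 1024 }' /proc/meminfo)
    echo "Physical RAM on this machine: ${total_gb} GB"
    if awk -v t="$total_gb" -v r="$required_gb" 'BEGIN { exit !(t < r) }'; then
        echo "Warning: less RAM than one human genome index needs" >&2
    fi
fi
```

The numbers apply to a human genome; a smaller reference would need correspondingly smaller values.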


Bfast_fig1.png


Bfast_tab2.png

Reformat reference

A reference genome in FASTA format has to be converted into the binary *.brg format using the 'bfast fasta2brg' command. The -A option selects the space: -A 0 for nucleotide space, -A 1 for color space. The -f option points to the reference genome.

bfast fasta2brg -f build37.fa -A 0
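If both Illumina (nucleotide space) and SOLiD (color space) data are to be mapped, the conversion has to be run once per space. A small dry-run helper can make the mapping between space and -A value explicit. This is a sketch only: the function name fasta2brg_cmd and the labels nt/cs are invented here for illustration, and the helper just prints the command rather than running bfast.

```shell
#!/bin/sh
# fasta2brg_cmd REF SPACE
# Prints (does not run) the fasta2brg command for the given space.
# SPACE is a hypothetical label: nt -> -A 0 (nucleotide), cs -> -A 1 (color).
fasta2brg_cmd() {
    ref=$1
    space=$2
    case "$space" in
        nt) flag=0 ;;
        cs) flag=1 ;;
        *)  echo "unknown space: $space (use nt or cs)" >&2
            return 1 ;;
    esac
    echo "bfast fasta2brg -f $ref -A $flag"
}

fasta2brg_cmd build37.fa nt
fasta2brg_cmd build37.fa cs
```

To actually run the conversion, the printed line can be executed in the directory that holds the reference.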

Create index

Next, both the original FASTA file and the newly created binary reference genome (or links to them) need to be in the current directory in order to create the index. The -A and -f options (see above) as well as the -m option are required. The -m option gives the seed mask, and the hash width is set with -w (14 is the recommended default).
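Setting up the working directory with links could look like the sketch below. It assumes that the .brg file sits next to the FASTA file and that its name starts with the FASTA file name and ends in "brg" (for example build37.fa.nt.brg); check the actual output of fasta2brg before relying on this. The function name link_reference is made up for illustration.

```shell
#!/bin/sh
# link_reference SRC_DIR REF_NAME
# Symlinks the FASTA file and any .brg files derived from it into the
# current directory. SRC_DIR should be an absolute path so that the
# links resolve from wherever they are created.
link_reference() {
    src_dir=$1
    ref=$2
    for f in "$src_dir/$ref" "$src_dir/$ref".*brg; do
        # Skip missing files, including the unexpanded glob pattern
        # when no .brg file exists yet.
        [ -e "$f" ] || continue
        ln -sf "$f" "./$(basename "$f")"
    done
}

# Example call (hypothetical path):
#   link_reference /data/genomes build37.fa
```

Links avoid copying the multi-gigabyte reference into every index directory.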

The seeds (determined experimentally, see the paper) come in two sets, based on the supplemental material of the BFAST paper: one set optimized for aligning reads shorter than 40 nt (first script below), and a second set for reads longer than 40 nt (second script below). Execute one of the following scripts in the directory containing the links to the .fasta and the .brg files.


References:
  1. Nils Homer, Barry Merriman, Stanley F Nelson
    BFAST: an alignment tool for large scale genome resequencing.
    PLoS One: 2009, 4(11);e7767
    [PubMed:19907642]

  2. http://sourceforge.net/apps/mediawiki/bfast/index.php?title=Main_Page


