Bwa


BWA, developed by Heng Li and Richard Durbin ([1], [2]), has become one of the most widely used NGS read mappers.

Similar to: Bowtie, Bowtie2




The main BWA page is at http://bio-bwa.sourceforge.net [3], and the manual page is at http://bio-bwa.sourceforge.net/bwa.shtml [4].

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first is designed for Illumina sequence reads up to 100 bp, while the other two are for longer sequences ranging from 70 bp to 1 Mbp.

Technical note: BWA-MEM and BWA-SW share features such as long-read support and split alignment, but BWA-MEM, the more recent of the two, is generally recommended for high-quality queries because it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack on 70-100 bp Illumina reads.
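
The read-length guidance above can be condensed into a small shell helper. This is only a rule-of-thumb sketch: the function name `choose_bwa_command` is hypothetical, and the 70 bp cut-off is simply the lower bound quoted above for BWA-MEM and BWA-SW, not a threshold defined by BWA itself.

```shell
# Hypothetical helper: suggest a BWA sub-command from the read length in bp.
# Reads of 70 bp or more -> mem (recommended above over bwasw and aln);
# shorter reads          -> aln (BWA-backtrack, followed by samse/sampe).
choose_bwa_command() {
    read_len=$1
    # Reject empty or non-numeric input.
    case $read_len in
        ''|*[!0-9]*)
            echo "usage: choose_bwa_command <read length in bp>" >&2
            return 2
            ;;
    esac
    if [ "$read_len" -ge 70 ]; then
        echo "mem"
    else
        echo "aln"
    fi
}

choose_bwa_command 100   # prints: mem
choose_bwa_command 36    # prints: aln
```

Read quality, error profile and downstream tools may also influence the choice, so the output is a starting point and not a substitute for testing on real data.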

List of BWA commands

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.5a-r405
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'. There are
      three alignment algorithms in BWA: `mem', `bwasw' and `aln/samse/sampe'. If
      you are not sure which to use, try `bwa mem' first. Please `man ./bwa.1' for
      for the manual.
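
The note above implies a two-step workflow: index the reference once, then align reads against it. A minimal sketch of that workflow follows. The function name `map_with_mem` and all file names are placeholders, and the sketch assumes that `bwa` is on the PATH and that the presence of the `.bwt` file is a good enough sign that the index already exists.

```shell
# Hypothetical wrapper: index the reference if needed, then align with BWA-MEM.
# Usage: map_with_mem <reference.fa> <reads.fq> <output.sam>
map_with_mem() {
    if [ "$#" -ne 3 ]; then
        echo "usage: map_with_mem <reference.fa> <reads.fq> <output.sam>" >&2
        return 2
    fi
    ref=$1
    reads=$2
    out=$3
    # Build the index only when the .bwt file is missing (assumption: its
    # presence means 'bwa index' has already been run on this reference).
    if [ ! -f "$ref.bwt" ]; then
        bwa index "$ref" || return 1
    fi
    # bwa mem writes SAM records to standard output, so redirect to a file.
    bwa mem "$ref" "$reads" > "$out"
}

# Example call with placeholder file names:
# map_with_mem reference.fa reads.fq aligned.sam
```

For the BWA-backtrack route, the corresponding steps are `bwa aln` followed by `bwa samse` (single-end) or `bwa sampe` (paired-end), as listed in the command overview.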

Building the hg19 BWA index

The hg19 genome reference index was built with BWA 0.7.13-r1126 on a training Linux Mint 17.3 virtual machine with 6 GB of RAM. The process took about 87 minutes of wall-clock time (5212 s, see the log below) and produced the expected files.

bwa index HiSeq_UCSC_hg19.fa
[bwa_index] Pack FASTA... 32.70 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=6191387966, availableWord=447648912
[BWTIncConstructFromPacked] 10 iterations done. 99999998 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999998 characters processed.
[BWTIncConstructFromPacked] 30 iterations done. 299999998 characters processed.
[BWTIncConstructFromPacked] 40 iterations done. 399999998 characters processed.
[BWTIncConstructFromPacked] 50 iterations done. 499999998 characters processed.
[BWTIncConstructFromPacked] 60 iterations done. 599999998 characters processed.
[BWTIncConstructFromPacked] 70 iterations done. 699999998 characters processed.
[BWTIncConstructFromPacked] 80 iterations done. 799999998 characters processed.
[BWTIncConstructFromPacked] 90 iterations done. 899999998 characters processed.
[BWTIncConstructFromPacked] 100 iterations done. 999999998 characters processed.
[BWTIncConstructFromPacked] 110 iterations done. 1099999998 characters processed.
[BWTIncConstructFromPacked] 120 iterations done. 1199999998 characters processed.
[BWTIncConstructFromPacked] 130 iterations done. 1299999998 characters processed.
[BWTIncConstructFromPacked] 140 iterations done. 1399999998 characters processed.
[BWTIncConstructFromPacked] 150 iterations done. 1499999998 characters processed.
[BWTIncConstructFromPacked] 160 iterations done. 1599999998 characters processed.
[BWTIncConstructFromPacked] 170 iterations done. 1699999998 characters processed.
[BWTIncConstructFromPacked] 180 iterations done. 1799999998 characters processed.
[BWTIncConstructFromPacked] 190 iterations done. 1899999998 characters processed.
[BWTIncConstructFromPacked] 200 iterations done. 1999999998 characters processed.
[BWTIncConstructFromPacked] 210 iterations done. 2099999998 characters processed.
[BWTIncConstructFromPacked] 220 iterations done. 2199999998 characters processed.
[BWTIncConstructFromPacked] 230 iterations done. 2299999998 characters processed.
[BWTIncConstructFromPacked] 240 iterations done. 2399999998 characters processed.
[BWTIncConstructFromPacked] 250 iterations done. 2499999998 characters processed.
[BWTIncConstructFromPacked] 260 iterations done. 2599999998 characters processed.
[BWTIncConstructFromPacked] 270 iterations done. 2699999998 characters processed.
[BWTIncConstructFromPacked] 280 iterations done. 2799999998 characters processed.
[BWTIncConstructFromPacked] 290 iterations done. 2899999998 characters processed.
[BWTIncConstructFromPacked] 300 iterations done. 2999999998 characters processed.
[BWTIncConstructFromPacked] 310 iterations done. 3099999998 characters processed.
[BWTIncConstructFromPacked] 320 iterations done. 3199999998 characters processed.
[BWTIncConstructFromPacked] 330 iterations done. 3299999998 characters processed.
[BWTIncConstructFromPacked] 340 iterations done. 3399999998 characters processed.
[BWTIncConstructFromPacked] 350 iterations done. 3499999998 characters processed.
[BWTIncConstructFromPacked] 360 iterations done. 3599999998 characters processed.
[BWTIncConstructFromPacked] 370 iterations done. 3699999998 characters processed.
[BWTIncConstructFromPacked] 380 iterations done. 3799999998 characters processed.
[BWTIncConstructFromPacked] 390 iterations done. 3899999998 characters processed.
[BWTIncConstructFromPacked] 400 iterations done. 3999999998 characters processed.
[BWTIncConstructFromPacked] 410 iterations done. 4099999998 characters processed.
[BWTIncConstructFromPacked] 420 iterations done. 4199999998 characters processed.
[BWTIncConstructFromPacked] 430 iterations done. 4299999998 characters processed.
[BWTIncConstructFromPacked] 440 iterations done. 4399999998 characters processed.
[BWTIncConstructFromPacked] 450 iterations done. 4499999998 characters processed.
[BWTIncConstructFromPacked] 460 iterations done. 4599999998 characters processed.
[BWTIncConstructFromPacked] 470 iterations done. 4699999998 characters processed.
[BWTIncConstructFromPacked] 480 iterations done. 4799999998 characters processed.
[BWTIncConstructFromPacked] 490 iterations done. 4899999998 characters processed.
[BWTIncConstructFromPacked] 500 iterations done. 4999999998 characters processed.
[BWTIncConstructFromPacked] 510 iterations done. 5099999998 characters processed.
[BWTIncConstructFromPacked] 520 iterations done. 5199999998 characters processed.
[BWTIncConstructFromPacked] 530 iterations done. 5299999998 characters processed.
[BWTIncConstructFromPacked] 540 iterations done. 5399999998 characters processed.
[BWTIncConstructFromPacked] 550 iterations done. 5499999998 characters processed.
[BWTIncConstructFromPacked] 560 iterations done. 5596003406 characters processed.
[BWTIncConstructFromPacked] 570 iterations done. 5681458862 characters processed.
[BWTIncConstructFromPacked] 580 iterations done. 5757407918 characters processed.
[BWTIncConstructFromPacked] 590 iterations done. 5824907614 characters processed.
[BWTIncConstructFromPacked] 600 iterations done. 5884897502 characters processed.
[BWTIncConstructFromPacked] 610 iterations done. 5938212670 characters processed.
[BWTIncConstructFromPacked] 620 iterations done. 5985595326 characters processed.
[BWTIncConstructFromPacked] 630 iterations done. 6027705102 characters processed.
[BWTIncConstructFromPacked] 640 iterations done. 6065128334 characters processed.
[BWTIncConstructFromPacked] 650 iterations done. 6098386190 characters processed.
[BWTIncConstructFromPacked] 660 iterations done. 6127941854 characters processed.
[BWTIncConstructFromPacked] 670 iterations done. 6154206942 characters processed.
[BWTIncConstructFromPacked] 680 iterations done. 6177547390 characters processed.
[bwt_gen] Finished constructing BWT in 687 iterations.
[bwa_index] 2807.62 seconds elapse.
[bwa_index] Update BWT... 17.41 sec
[bwa_index] Pack forward-only FASTA... 15.72 sec
[bwa_index] Construct SA from BWT and Occ... 1580.61 sec
[main] Version: 0.7.13-r1126
[main] CMD: bwa index HiSeq_UCSC_hg19.fa
[main] Real time: 5212.117 sec; CPU: 4454.072 sec

############
# resulting files
############

-rwxr-x--- 1 bits bits 3,0G May 30 13:51 HiSeq_UCSC_hg19.fa
-rw-r--r-- 1 bits bits 6,6K Jun  1 17:10 HiSeq_UCSC_hg19.fa.amb
-rw-r--r-- 1 bits bits  939 Jun  1 17:10 HiSeq_UCSC_hg19.fa.ann
-rw-r--r-- 1 bits bits 2,9G Jun  1 17:09 HiSeq_UCSC_hg19.fa.bwt
-rw-r--r-- 1 bits bits 739M Jun  1 17:10 HiSeq_UCSC_hg19.fa.pac
-rw-r--r-- 1 bits bits 1,5G Jun  1 17:38 HiSeq_UCSC_hg19.fa.sa

References:
  1. Heng Li, Richard Durbin
    Fast and accurate long-read alignment with Burrows-Wheeler transform.
    Bioinformatics: 2010, 26(5);589-95
    [PubMed:20080505]

  2. Heng Li, Richard Durbin
    Fast and accurate short read alignment with Burrows-Wheeler transform.
    Bioinformatics: 2009, 25(14);1754-60
    [PubMed:19451168]

  3. http://bio-bwa.sourceforge.net
  4. http://bio-bwa.sourceforge.net/bwa.shtml


