Samtools
The canonical BAM/SAM/CRAM data framework
suggests : bcftools
Samtools[1][2] is the standard library to work on SAM, BAM (and in the future CRAM) files. A Samtools Java wrapper is available under the name Picard.
Contents
Samtools's versions
Samtools and its companion bcftools are in constant evolution as well as related apps like vcftools. This is due to the increasing need for speed and complex analysis triggered by the ever growing NGS community.
Consequently, two versions of Samtools are currently coexisting; the current version 1.x using the htslib engine which should be more performant and the older and classical version 0.x which is still used by many packages (tophat).
NEW official page for Samtools and relatives
- Samtools htslib (htslib version) http://www.htslib.org/ http://www.htslib.org/
- Explain SAM/BAM flags page: https://broadinstitute.github.io/picard/explain-flags.html
GitHub pages with valid information and dev-versions
- manual page for bcftools commands http://samtools.github.io/bcftools/bcftools.html [3]
- Bcftools GitHub website: http://github.com/samtools/bcftools
- Samtools GitHub website: http://github.com/samtools/samtools
- HTSlib GitHub website: http://github.com/samtools/htslib
- File format specifications: http://samtools.github.io/hts-specs
- Bcftools documentation: http://samtools.github.io/bcftools
- Bcftools wiki page: https://github.com/samtools/bcftools/wiki
The new htslib version of Samtools 1.3.x
The current version 1.3.x of samtools was re-written to use the htslib engine and is faster and more adapted to high volume data. It also supports CRAM format and more http://www.htslib.org/[4].
Please consider migrating to samtools 1.3.x. Most commands remain unchanged except for here and there an additional argument or command detail. The transition should be painless if you can spend some time finding internet examples.
• samtools 1.3.1, the new (htslib) version of samtools
Version: 1.3.1 (using htslib 1.3.1)
Usage: samtools <command> [options]
Commands:
-- Indexing
dict create a sequence dictionary file
faidx index/extract FASTA
index index alignment
-- Editing
calmd recalculate MD/NM tags and '=' bases
fixmate fix mate information
reheader replace BAM header
rmdup remove PCR duplicates
targetcut cut fosmid regions (for fosmid pool only)
addreplacerg adds or replaces RG tags
-- File operations
collate shuffle and group alignments by name
cat concatenate BAMs
merge merge sorted alignments
mpileup multi-way pileup
sort sort alignment file
split splits a file by read group
quickcheck quickly check if SAM/BAM/CRAM file appears intact
fastq converts a BAM to a FASTQ
fasta converts a BAM to a FASTA
-- Statistics
bedcov read depth per BED region
depth compute the depth
flagstat simple stats
idxstats BAM index stats
phase phase heterozygotes
stats generate stats (former bamcheck)
-- Viewing
flags explain BAM flags
tview text alignment viewer
view SAM<->BAM<->CRAM conversion
depad convert padded BAM to unpadded BAM
## the accompanying perl tools 'plot-bamstats' and 'plot-vcfstats' are present to create gnuplot pictures from the samtools output data.
# additional perl scripts are 'blast2sam.pl export2sam.pl psl2sam.pl seq_cache_populate.pl wgsim_eval.pl
# bowtie2sam.pl interpolate_sam.pl sam2vcf.pl soap2sam.pl zoom2sam.pl color-chrs.pl novo2sam.pl samtools.pl vcfutils.pl'
• bgzip 1.3.1, the new (htslib) version of bgzip
Usage: bgzip [OPTIONS] [FILE] ...
Options:
-b, --offset INT decompress at virtual file pointer (0-based uncompressed offset)
-c, --stdout write on standard output, keep original files unchanged
-d, --decompress decompress
-f, --force overwrite files without asking
-h, --help give this help
-i, --index compress and create BGZF index
-I, --index-name FILE name of BGZF index file [file.gz.gzi]
-r, --reindex (re)index compressed file
-s, --size INT decompress INT bytes (uncompressed size)
-@, --threads INT number of compression threads to use [1]
• tabix 1.3.1, the new (htslib) version of tabix
Usage: tabix [OPTIONS] [FILE] [REGION [...]]
Indexing Options:
-0, --zero-based coordinates are zero-based
-b, --begin INT column number for region start [4]
-c, --comment CHAR skip comment lines starting with CHAR [null]
-C, --csi generate CSI index for VCF (default is TBI)
-e, --end INT column number for region end (if no end, set INT to -b) [5]
-f, --force overwrite existing index without asking
-m, --min-shift INT set minimal interval size for CSI indices to 2^INT [14]
-p, --preset STR gff, bed, sam, vcf
-s, --sequence INT column number for sequence names (suppressed by -p) [1]
-S, --skip-lines INT skip first INT lines [0]
Querying and other options:
-h, --print-header print also the header lines
-H, --only-header print only the header lines
-l, --list-chroms list chromosome names
-r, --reheader FILE replace the header with the content of FILE
-R, --regions FILE restrict to regions listed in the file
-T, --targets FILE similar to -R but streams rather than index-jumps
The 'OLD version' of Samtools with embeded bcftools
samtools 0.1.19
commands in samtools version 0.1.19
Version: 0.1.19-44428cd
Usage: samtools <command> [options]
Command: view SAM<->BAM conversion
sort sort alignment file
mpileup multi-way pileup
depth compute the depth
faidx index/extract FASTA
tview text alignment viewer
index index alignment
idxstats BAM index stats (r595 or later)
fixmate fix mate information
flagstat simple stats
calmd recalculate MD/NM tags and '=' bases
merge merge sorted alignments
rmdup remove PCR duplicates
reheader replace BAM header
cat concatenate BAMs
bedcov read depth per BED region
targetcut cut fosmid regions (for fosmid pool only)
phase phase heterozygotes
bamshuf shuffle and group alignments by name
the samtools bcftools built-in command
Version: 0.1.19-44428cd
Usage: bcftools <command> <arguments>
Command: view print, extract, convert and call SNPs from BCF
index index BCF
cat concatenate BCFs
ld compute all-pair r^2
ldpair compute r^2 between requested pairs
example of cleaning, sorting, merging BAM files
I was personally helped a lot by Dave Tang's wiki page [5] reporting example commands for samtools:
eg. filter unmapped reads from 'bad.bam' to a new 'cleaned.bam', sort, and index
samtools sort tmp.bam cleaned
samtools index cleaned.bam && rm bad.bam tmp.bam
# merge multiple BAM files into one sorted file
# from http://www.biostars.org/p/9864/
find $BAM_DIR -name '*.bam' | {
read firstbam
samtools view -h "$firstbam"
while read bam; do
samtools view "$bam"
done
} | samtools view -ubS - | samtools sort - merged
samtools index merged.bam
ls -l merged.bam merged.bam.bai
example of calling variants with the 'stable' version
ref=your-reference-genome.fasta; # should be indexed with samtools fa2idx
bamfile = your-sorted-and-indexed.bam
# pileup reads, generate raw BCF data, and call variants (SNVs and indels)
## samtools mpileup parameters
## -u generate uncompress BCF output
## bcftools view parameters
## -b output BCF instead of VCF
## -v output potential variant sites only (force -c)
## -c SNP calling (force -e)
## -e likelihood based analyses
## -g call genotypes at variant sites (force -c)
### as seen above several options forces other options
samtools_0.1.19 mpileup -uf ${ref} \
${bamfile} | bcftools_0.1.19 view -bvceg - > rawcalls.bcf
# parse the raw BCF data and filter out very-high coverage calls
## such calls likely originate from CNV compression in the reference genome
## -D1000 filters out calls with more than 1000 piled-up reads
### a common practice is to adjust -D to 20-30x your average coverage depth
bcftools_0.1.19 view rawcalls.bcf | \
vcfutils_0.1.19.pl varFilter -D1000 > filtered-calls.vcf
# compress and index VCF calls
bgzip -c filtered-calls.vcf > filtered-calls.vcf.gz && \
tabix -p vcf filtered-calls.vcf.gz
References:
- ↑
Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin, 1000 Genome Project Data Processing Subgroup
The Sequence Alignment/Map format and SAMtools.
Bioinformatics: 2009, 25(16);2078-9
[PubMed:19505943] ##WORLDCAT## [DOI] (I p) - ↑ http://samtools.sourceforge.net
- ↑ http://samtools.github.io/bcftools/bcftools.html
- ↑ http://www.htslib.org/
- ↑ http://davetang.org/wiki/tiki-index.php?page=SAMTools