VCFtools
The evolving standard framework for variant analysis
: GATK
suggests : Bcftools
VCFtools[1] is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
The VCFtools code and documentation are hosted at http://vcftools.sourceforge.net/ [2]
This toolset can be used to perform the following operations on VCF files:
- Filter out specific variants
- Compare files
- Summarize variants
- Convert to different file types
- Validate and merge files
- Create intersections and subsets of variants
A mailing list is available where you can register and post questions and error reports; replies usually arrive quickly.
To subscribe or unsubscribe via the World Wide Web, visit https://lists.sourceforge.net/lists/listinfo/vcftools-help [3]
VCFtools consists of two parts: a Perl module and a binary executable. The Perl module is a general Perl API for manipulating VCF files, whereas the binary executable provides general analysis routines.
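As an illustration of the analysis routines offered by the binary executable, the following sketch summarizes and filters a small VCF file. The file names (demo.vcf, demo, demo_snps) and the two records are made up for the example; the options shown (--vcf, --freq, --remove-indels, --recode, --recode-INFO-all, --out) are taken from the VCFtools manual. The analysis steps only run if vcftools is found on the PATH.

```shell
# Write a tiny example VCF (made-up records): one SNP and one deletion.
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> demo.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' >> demo.vcf
printf '1\t200\t.\tCT\tC\t50\tPASS\t.\tGT\t0/0\t0/1\n' >> demo.vcf

if command -v vcftools >/dev/null 2>&1; then
    # Summarize: per-site allele frequencies, written to demo.frq
    vcftools --vcf demo.vcf --freq --out demo
    # Filter: drop indels and write the remaining sites to demo_snps.recode.vcf
    vcftools --vcf demo.vcf --remove-indels --recode --recode-INFO-all --out demo_snps
else
    echo "vcftools not found on PATH; skipping the analysis steps" >&2
fi
```

For bgzip-compressed input, --gzvcf takes the place of --vcf.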
vcftools
VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts that can be used to perform common tasks with VCF files, such as file validation, file merging, intersecting, complements, etc. The Perl tools support all versions of the VCF specification (3.2, 3.3, 4.0, 4.1 and 4.2); nevertheless, users are encouraged to use the latest versions, VCFv4.1 or VCFv4.2. VCFtools in general has been used mainly with diploid data, but the Perl tools aim to support polyploid data as well.
Run any of the Perl scripts with the --help switch to obtain more help.
Tool page: http://vcftools.sourceforge.net/index.html [4]
Many of the Perl scripts require that the VCF files are compressed by bgzip and indexed by tabix (both tools are part of the tabix package). The VCF files can be compressed and indexed using the following commands:
bgzip my_file.vcf
tabix -p vcf my_file.vcf.gz
# the tools
fill-aa
fill-an-ac
fill-fs
fill-ref-md5
fill-rsIDs
vcf-annotate
vcf-compare
vcf-concat
vcf-consensus
vcf-contrast
vcf-convert
vcf-filter
vcf-fix-newlines
vcf-fix-ploidy
vcf-indel-stats
vcf-isec
vcf-merge
vcf-phased-join
vcf-query
vcf-shuffle-cols
vcf-sort
vcf-stats
vcf-subset
vcf-to-tab
vcf-tstv
vcf-validator
# all are based on the Perl module Vcf.pm
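To show how a few of the Perl scripts fit together, here is a minimal sketch that validates, merges, intersects and queries two small files. The helper make_vcf, the file names (a.vcf, b.vcf, merged.vcf.gz, shared.vcf.gz) and the records are hypothetical and exist only for this example; the script invocations follow the usage shown in the VCFtools documentation. The tool steps are skipped unless bgzip, tabix and the Perl scripts are all installed.

```shell
# Hypothetical helper: print a two-record, single-sample VCF.
# usage: make_vcf SAMPLE_NAME SECOND_POSITION > file.vcf
make_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t%s\n' "$1"
    printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n'
    printf '1\t%s\t.\tC\tT\t50\tPASS\t.\tGT\t1/1\n' "$2"
}
make_vcf A 200 > a.vcf   # both files share the site 1:100
make_vcf B 300 > b.vcf

# Check that every required tool is available before running anything.
have_all=yes
for tool in bgzip tabix vcf-validator vcf-merge vcf-isec vcf-query; do
    command -v "$tool" >/dev/null 2>&1 || have_all=no
done

if [ "$have_all" = yes ]; then
    for f in a b; do
        bgzip -c "$f.vcf" > "$f.vcf.gz"   # compress, keeping the plain file
        tabix -f -p vcf "$f.vcf.gz"       # index (overwrite any old index)
    done
    vcf-validator a.vcf.gz
    # Merge the two single-sample files into one two-sample file.
    vcf-merge a.vcf.gz b.vcf.gz | bgzip -c > merged.vcf.gz
    # Keep only the sites present in at least two of the input files.
    vcf-isec -n +2 a.vcf.gz b.vcf.gz | bgzip -c > shared.vcf.gz
    # Print position, alleles and every sample's genotype.
    vcf-query merged.vcf.gz -f '%CHROM:%POS %REF %ALT[ %GT]\n'
else
    echo "bgzip, tabix or the VCFtools Perl scripts are missing; skipping" >&2
fi
```

Piping the script output through bgzip -c keeps the results in the compressed form that the other Perl scripts expect.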
References:
- [1] Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A Albers, Eric Banks, Mark A DePristo, Robert E Handsaker, Gerton Lunter, Gabor T Marth, Stephen T Sherry, Gilean McVean, Richard Durbin, 1000 Genomes Project Analysis Group. The variant call format and VCFtools. Bioinformatics: 2011, 27(15);2156-8. [PubMed:21653522]
- [2] http://vcftools.sourceforge.net/
- [3] https://lists.sourceforge.net/lists/listinfo/vcftools-help
- [4] http://vcftools.sourceforge.net/index.html