Varscan2
Why Use VarScan? Most of the published variant callers for next-generation sequencing data employ a probabilistic framework, such as Bayesian statistics, to detect variants and assess confidence in them. These approaches generally work quite well, but can be confounded by numerous factors such as extreme read depth, pooled samples, and contaminated or impure samples. In contrast, VarScan employs a robust heuristic/statistic approach to call variants that meet desired thresholds for read depth, base quality, variant allele frequency, and statistical significance.
varscan2: a non probabilistic variant caller
Varscan2[1][2] is coded in Java, and should be executed from the command line (Terminal, in Linux/UNIX/OSX, or Command Prompt in MS Windows). For variant calling, you will need a pileup file. See the How to Build A Pileup File section for details. Running VarScan with no arguments prints the usage information.
For detailed documentation of v2.3 and later, see http://varscan.sourceforge.net/using-varscan.html[3]; for the current version, refer to GitHub at https://github.com/dkoboldt/varscan [4].
Define an environment variable named VARSCAN that points to the directory containing VarScan.jar. This makes it easy to call VarScan from any working directory.
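As a minimal sketch of that setup in a Bourne-style shell: the install path below is purely hypothetical and must be replaced with the directory where VarScan.jar was actually placed. Adding the export line to a shell startup file would make it persistent.

```shell
# Hypothetical install location (an assumption, not a VarScan default):
# point VARSCAN at the directory that holds VarScan.jar.
export VARSCAN="$HOME/tools/varscan"

# Sanity check that the jar is where the variable says it is.
if [ -f "$VARSCAN/VarScan.jar" ]; then
    echo "VarScan.jar found in $VARSCAN"
else
    echo "VarScan.jar not found in $VARSCAN" >&2
fi
```

Once the variable is set, java -jar $VARSCAN/VarScan.jar with no further arguments should print the usage information.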
basic usage to call variants from samtools pileup
VarScan takes a samtools pileup (or the more recent mpileup) as input. Such a file is easily produced with the command below. Some functions expect paired tumor and normal files to perform pairwise analysis and filtering.
samtools mpileup -f [reference sequence] [BAM file(s)] >myData.mpileup
The samtools mpileup can be piped directly to varscan2 to save IO and disk space.
samtools mpileup -f [reference sequence] [BAM file(s)] | java -jar $VARSCAN/VarScan.jar pileup2snp ...
The most useful varscan2 functions are presented below; the others can be reviewed by adding -h after the command.
***NON-COMMERCIAL VERSION***
USAGE: java -jar $VARSCAN/VarScan.jar [COMMAND] [OPTIONS]
COMMANDS:
pileup2snp Identify SNPs from a pileup file
pileup2indel Identify indels a pileup file
pileup2cns Call consensus and variants from a pileup file
mpileup2snp Identify SNPs from an mpileup file
mpileup2indel Identify indels an mpileup file
mpileup2cns Call consensus and variants from an mpileup file
somatic Call germline/somatic variants from tumor-normal pileups
copynumber Determine relative tumor copy number from tumor-normal pileups
readcounts Obtain read counts for a list of variants from a pileup file
filter Filter SNPs by coverage, frequency, p-value, etc.
somaticFilter Filter somatic variants for clusters/indels
fpfilter Apply the false-positive filter
processSomatic Isolate Germline/LOH/Somatic calls from output
copyCaller GC-adjust and process copy number changes from VarScan copynumber output
compare Compare two lists of positions/variants
limit Restrict pileup/snps/indels to ROI positions
calling SNVs
For simple SNP calls with mpileup2snp, the following options set the stringency of the varscan2 prediction.
mpileup file - The SAMtools mpileup file
OPTIONS:
--min-coverage Minimum read depth at a position to make a call [8]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-avg-qual Minimum base quality at a position to count a read [15]
--min-var-freq Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom Minimum frequency to call homozygote [0.75]
--p-value Default p-value threshold for calling variants [99e-02]
--strand-filter Ignore variants with >90% support on one strand [1]
--output-vcf If set to 1, outputs in VCF format
--variants Report only variant (SNP/indel) positions (mpileup2cns only) [0]
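As a rough sketch of how these options might be combined, the function below pipes samtools mpileup into mpileup2snp. The function name, the file names in the example call, and the threshold values are illustrative assumptions, not recommended settings; only the option names come from the list above.

```shell
# Sketch: call SNPs from one BAM file with stricter-than-default thresholds.
# Threshold values are placeholders chosen for illustration only.
call_snps() {
    ref="$1"; bam="$2"; out="$3"
    samtools mpileup -f "$ref" "$bam" |
        java -jar "$VARSCAN/VarScan.jar" mpileup2snp \
            --min-coverage 10 \
            --min-reads2 4 \
            --min-var-freq 0.05 \
            --p-value 0.05 \
            --output-vcf 1 > "$out"
}

# Example call (hypothetical file names):
# call_snps ref.fa sample.bam sample.snp.vcf
```

Wrapping the pipeline in a function keeps the thresholds in one place when several samples are processed the same way.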
calling small InDels
For indels, use the mpileup2indel command, which takes the same input and options:
mpileup file - The SAMtools mpileup file
OPTIONS:
--min-coverage Minimum read depth at a position to make a call [8]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-avg-qual Minimum base quality at a position to count a read [15]
--min-var-freq Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom Minimum frequency to call homozygote [0.75]
--p-value Default p-value threshold for calling variants [99e-02]
--strand-filter Ignore variants with >90% support on one strand [1]
--output-vcf If set to 1, outputs in VCF format
--variants Report only variant (SNP/indel) positions (mpileup2cns only) [0]
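When an mpileup file has already been written to disk, it can be passed as the first argument instead of being piped. The following is a sketch under that assumption; the function name, the file names, and the threshold values are hypothetical.

```shell
# Sketch: call indels from an existing mpileup file.
# Threshold values are illustrative, not recommendations.
call_indels() {
    mpileup="$1"; out="$2"
    java -jar "$VARSCAN/VarScan.jar" mpileup2indel "$mpileup" \
        --min-coverage 10 \
        --min-reads2 4 \
        --min-var-freq 0.10 \
        --output-vcf 1 > "$out"
}

# Example call (hypothetical file names):
# call_indels myData.mpileup sample.indel.vcf
```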
calling both together
For SNVs and indels together, use the mpileup2cns command:
mpileup file - The SAMtools mpileup file
OPTIONS:
--min-coverage Minimum read depth at a position to make a call [8]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-avg-qual Minimum base quality at a position to count a read [15]
--min-var-freq Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom Minimum frequency to call homozygote [0.75]
--p-value Default p-value threshold for calling variants [99e-02]
--strand-filter Ignore variants with >90% support on one strand [1]
--output-vcf If set to 1, outputs in VCF format
--vcf-sample-list For VCF output, a list of sample names in order, one per line
--variants Report only variant (SNP/indel) positions [0]
Without --variants, every position that passes the thresholds is reported (reference, SNV, or indel calls); adding --variants omits the reference calls and keeps only the variant positions.
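A possible multi-sample invocation is sketched below. The function name and the file names are hypothetical, and passing 1 as the value of --variants is an assumption based on the [0] default shown in the option list; check the usage output of your VarScan version before relying on it.

```shell
# Sketch: joint SNV/indel calling over several BAM files, variants only, as VCF.
# The sample list is a text file with one sample name per line,
# in the same order as the BAM files given to samtools.
call_variants() {
    ref="$1"; samples="$2"; out="$3"
    shift 3   # the remaining arguments are the BAM files
    samtools mpileup -f "$ref" "$@" |
        java -jar "$VARSCAN/VarScan.jar" mpileup2cns \
            --variants 1 \
            --output-vcf 1 \
            --vcf-sample-list "$samples" > "$out"
}

# Example call (hypothetical file names):
# call_variants ref.fa samples.txt cohort.vcf a.bam b.bam
```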
filtering results
The filter command filters variants in a file by coverage, supporting reads, variant frequency, or average base quality. It is for use with output from mpileup2snp or mpileup2indel.
variants file - A file of SNP or indel calls from VarScan pileup2snp or pileup2indel
OPTIONS:
--min-coverage Minimum read depth at a position to make a call [10]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-strands2 Minimum # of strands on which variant observed (1 or 2) [1]
--min-avg-qual Minimum average base quality for variant-supporting reads [20]
--min-var-freq Minimum variant allele frequency threshold [0.20]
--p-value Default p-value threshold for calling variants [1e-01]
--indel-file File of indels for filtering nearby SNPs, from pileup2indel command
--output-file File to contain variants passing filters
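The options above could be put together as in the following sketch, which filters a SNP file and also removes SNPs near indels listed in a second file. The function name, the file names, and the chosen values are hypothetical.

```shell
# Sketch: filter SNP calls, using an indel file to drop SNPs near indels.
# Threshold values are illustrative, not recommendations.
filter_snps() {
    snps="$1"; indels="$2"; out="$3"
    java -jar "$VARSCAN/VarScan.jar" filter "$snps" \
        --min-coverage 10 \
        --min-reads2 4 \
        --min-strands2 2 \
        --min-var-freq 0.20 \
        --indel-file "$indels" \
        --output-file "$out"
}

# Example call (hypothetical file names):
# filter_snps sample.snp sample.indel sample.snp.filter
```

Setting --min-strands2 to 2 requires the variant to be seen on both strands, which is stricter than the default of 1.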
more commands for tumor / normal sample pairs
Additional commands are available for somatic calls and somatic CNVs. Please refer to the varscan2 Wiki for detailed somatic detection information and examples.
References:
[1] Daniel C Koboldt, Qunyuan Zhang, David E Larson, Dong Shen, Michael D McLellan, Ling Lin, Christopher A Miller, Elaine R Mardis, Li Ding, Richard K Wilson. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res: 2012, 22(3);568-76 [PubMed:22300766]
Daniel C Koboldt, Ken Chen, Todd Wylie, David E Larson, Michael D McLellan, Elaine R Mardis, George M Weinstock, Richard K Wilson, Li Ding. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics: 2009, 25(17);2283-5 [PubMed:19542151]
[2] http://varscan.sourceforge.net
[3] http://varscan.sourceforge.net/using-varscan.html
[4] https://github.com/dkoboldt/varscan