Varscan2

From BITS wiki



Why Use VarScan? Most of the published variant callers for next-generation sequencing data employ a probabilistic framework, such as Bayesian statistics, to detect variants and assess confidence in them. These approaches generally work quite well, but can be confounded by numerous factors such as extreme read depth, pooled samples, and contaminated or impure samples. In contrast, VarScan employs a robust heuristic/statistic approach to call variants that meet desired thresholds for read depth, base quality, variant allele frequency, and statistical significance.




varscan2: a non-probabilistic variant caller

Varscan2[1][2] is written in Java and is run from the command line (Terminal in Linux/UNIX/OSX, or Command Prompt in MS Windows). For variant calling, you will need a pileup file. See the How to Build A Pileup File section for details. Running VarScan with no arguments prints the usage information.

For detailed documentation of v2.3 and later, see http://varscan.sourceforge.net/using-varscan.html [3]. For the current version, refer to GitHub at https://github.com/dkoboldt/varscan [4].

Tip: Define an environment variable named VARSCAN that points to the directory containing VarScan.jar. This makes it easy to call VarScan from any location.
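
A minimal sketch of how the variable could be defined in a Bourne-style shell. The directory /opt/varscan is a hypothetical install location, not one given on this page, and varscan_status is a helper name introduced only for this illustration.

```shell
# Hypothetical install directory: replace it with the folder that holds VarScan.jar.
# An existing VARSCAN value is kept if one is already set.
VARSCAN="${VARSCAN:-/opt/varscan}"
export VARSCAN

# Confirm the jar is where the variable says it is before relying on it.
if [ -f "$VARSCAN/VarScan.jar" ]; then
    varscan_status="found"
    # With no arguments, VarScan prints its usage information.
    java -jar "$VARSCAN/VarScan.jar"
else
    varscan_status="missing"
    echo "VarScan.jar not found in $VARSCAN" >&2
fi
```

To make the setting permanent, the two assignment lines can be added to a shell start-up file such as ~/.profile or ~/.bashrc.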

basic usage to call variants from samtools pileup

Varscan takes a samtools pileup (or the more recent mpileup) as input. Such data is easily obtained with the command below. Some functions expect paired tumor and normal files to perform pairwise analysis/filtering.

samtools mpileup -f [reference sequence] [BAM file(s)] >myData.mpileup

The samtools mpileup output can be piped directly to varscan2 to save IO and disk space.

samtools mpileup -f [reference sequence] [BAM file(s)] | java -jar $VARSCAN/VarScan.jar pileup2snp ...

The most useful varscan2 functions are presented below; the others can be reviewed by adding -h after the command line.

VarScan v2.4

***NON-COMMERCIAL VERSION***

USAGE: java -jar $VARSCAN/VarScan.jar [COMMAND] [OPTIONS]

COMMANDS:
        pileup2snp              Identify SNPs from a pileup file
        pileup2indel            Identify indels a pileup file
        pileup2cns              Call consensus and variants from a pileup file
        mpileup2snp             Identify SNPs from an mpileup file
        mpileup2indel           Identify indels an mpileup file
        mpileup2cns             Call consensus and variants from an mpileup file

        somatic                 Call germline/somatic variants from tumor-normal pileups
        copynumber              Determine relative tumor copy number from tumor-normal pileups
        readcounts              Obtain read counts for a list of variants from a pileup file

        filter                  Filter SNPs by coverage, frequency, p-value, etc.
        somaticFilter           Filter somatic variants for clusters/indels
        fpfilter                Apply the false-positive filter

        processSomatic          Isolate Germline/LOH/Somatic calls from output
        copyCaller              GC-adjust and process copy number changes from VarScan copynumber output
        compare                 Compare two lists of positions/variants
        limit                   Restrict pileup/snps/indels to ROI positions

calling SNVs

For simple SNP calls, several options allow setting the stringency of the varscan2 prediction.

USAGE: java -jar $VARSCAN/VarScan.jar mpileup2snp [mpileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --variants      Report only variant (SNP/indel) positions (mpileup2cns only) [0]
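
As an illustration of how these options combine, the sketch below pipes an mpileup into mpileup2snp with thresholds stricter than the defaults and writes VCF output. The file names (reference.fa, sample.bam, sample.snp.vcf), the helper call_snps and the threshold values are assumptions chosen for the example, not recommendations from this page.

```shell
# Hypothetical file names, used for illustration only.
REF="reference.fa"
BAM="sample.bam"
OUT="sample.snp.vcf"

# Falls back to the current directory if VARSCAN is not set.
JAR="${VARSCAN:-.}/VarScan.jar"

# Illustrative thresholds, stricter than the defaults listed above.
SNP_OPTS="--min-coverage 10 --min-reads2 4 --min-avg-qual 20 --min-var-freq 0.05 --p-value 0.05 --output-vcf 1"

call_snps() {
    # $1 = reference FASTA, $2 = BAM file, $3 = output file.
    # The mpileup is streamed into VarScan, so no intermediate file is written.
    # $SNP_OPTS is left unquoted on purpose so that it splits into separate options.
    samtools mpileup -f "$1" "$2" \
        | java -jar "$JAR" mpileup2snp $SNP_OPTS > "$3"
}

# Run only when the inputs and the jar are actually present.
if [ -f "$REF" ] && [ -f "$BAM" ] && [ -f "$JAR" ]; then
    call_snps "$REF" "$BAM" "$OUT"
else
    echo "Skipping: $REF, $BAM or $JAR not found" >&2
fi
```

Raising --min-coverage, --min-reads2 and --min-var-freq, or lowering --p-value, makes the calls more stringent at the cost of sensitivity.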

calling small InDels

For indels, use the following command:

USAGE: java -jar $VARSCAN/VarScan.jar mpileup2indel [mpileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --variants      Report only variant (SNP/indel) positions (mpileup2cns only) [0]

calling both together

To call SNVs and indels in one pass, use the following command:

USAGE: java -jar $VARSCAN/VarScan.jar mpileup2cns [pileup file] OPTIONS
        mpileup file - The SAMtools mpileup file

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [8]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-avg-qual  Minimum base quality at a position to count a read [15]
        --min-var-freq  Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom      Minimum frequency to call homozygote [0.75]
        --p-value       Default p-value threshold for calling variants [99e-02]
        --strand-filter Ignore variants with >90% support on one strand [1]
        --output-vcf    If set to 1, outputs in VCF format
        --vcf-sample-list       For VCF output, a list of sample names in order, one per line
        --variants      Report only variant (SNP/indel) positions [0]

Note: Without --variants, the returned calls include reference, SNV and indel calls; adding --variants omits the reference calls.
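
The following sketch shows one way to call SNVs and indels together for two samples, keeping only variant positions and naming the VCF columns through --vcf-sample-list. The sample names, file names and the helper call_variants are hypothetical. Passing --variants 1 assumes the option is switched on with the value 1, as the [0] default in the usage text suggests.

```shell
# Hypothetical sample names, one per line, in the same order as the BAM files
# passed to samtools mpileup.
printf '%s\n' "sampleA" "sampleB" > samples.txt

# Falls back to the current directory if VARSCAN is not set.
JAR="${VARSCAN:-.}/VarScan.jar"

call_variants() {
    # $1 = reference FASTA, $2 and $3 = BAM files.
    # --variants 1 drops the reference calls (assumed syntax, see the text above).
    samtools mpileup -f "$1" "$2" "$3" \
        | java -jar "$JAR" mpileup2cns \
            --variants 1 \
            --output-vcf 1 \
            --vcf-sample-list samples.txt \
            > cohort.variants.vcf
}

# Run only when the inputs and the jar are actually present.
if [ -f reference.fa ] && [ -f sampleA.bam ] && [ -f sampleB.bam ] && [ -f "$JAR" ]; then
    call_variants reference.fa sampleA.bam sampleB.bam
fi
```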

filtering results

The filter command filters variants in a file by coverage, supporting reads, variant frequency, or average base quality. It is for use with output from mpileup2snp or mpileup2indel.

USAGE: java -jar $VARSCAN/VarScan.jar filter [variants file] OPTIONS
        variants file - A file of SNP or indel calls from VarScan pileup2snp or pileup2indel

        OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [10]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-strands2  Minimum # of strands on which variant observed (1 or 2) [1]
        --min-avg-qual  Minimum average base quality for variant-supporting reads [20]
        --min-var-freq  Minimum variant allele frequency threshold [0.20]
        --p-value       Default p-value threshold for calling variants [1e-01]
        --indel-file    File of indels for filtering nearby SNPs, from pileup2indel command
        --output-file   File to contain variants passing filters
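
A short sketch of the filter step, assuming SNP and indel calls already exist in two files. The names sample.snp, sample.indel and sample.snp.filtered, the helper filter_snps and the threshold values are all hypothetical.

```shell
# Falls back to the current directory if VARSCAN is not set.
JAR="${VARSCAN:-.}/VarScan.jar"

# Illustrative thresholds: require support on both strands and a stricter p-value.
FILTER_OPTS="--min-coverage 10 --min-reads2 4 --min-strands2 2 --min-var-freq 0.20 --p-value 0.05"

filter_snps() {
    # $1 = SNP calls, $2 = indel calls used to filter nearby SNPs,
    # $3 = file that receives the variants passing the filters.
    # $FILTER_OPTS is left unquoted on purpose so that it splits into separate options.
    java -jar "$JAR" filter "$1" $FILTER_OPTS --indel-file "$2" --output-file "$3"
}

# Run only when the inputs and the jar are actually present.
if [ -f sample.snp ] && [ -f sample.indel ] && [ -f "$JAR" ]; then
    filter_snps sample.snp sample.indel sample.snp.filtered
fi
```

Supplying --indel-file lets the filter take nearby indels into account when deciding which SNPs to keep, as described in the option list above.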

more commands for tumor / normal sample pairs

Additional commands are available for somatic calls and somatic CNVs. Please refer to the varscan2 Wiki for detailed somatic detection information and examples.


References:
  1. Daniel C Koboldt, Qunyuan Zhang, David E Larson, Dong Shen, Michael D McLellan, Ling Lin, Christopher A Miller, Elaine R Mardis, Li Ding, Richard K Wilson
    VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing.
    Genome Res.: 2012, 22(3);568-76
    [PubMed:22300766]

    Daniel C Koboldt, Ken Chen, Todd Wylie, David E Larson, Michael D McLellan, Elaine R Mardis, George M Weinstock, Richard K Wilson, Li Ding
    VarScan: variant detection in massively parallel sequencing of individual and pooled samples.
    Bioinformatics: 2009, 25(17);2283-5
    [PubMed:19542151]

  2. http://varscan.sourceforge.net
  3. http://varscan.sourceforge.net/using-varscan.html
  4. https://github.com/dkoboldt/varscan


