NGS-Var Exercise.6
[ Main_Page | Hands-on_introduction_to_NGS_variant_analysis-2016 | NGS-Var Exercise.5 | NGS-Var Exercise.7 ]
Annotate and filter VCF variant lists with SnpEff and SnpSift
Choose the right tool to enrich your VCF data
A growing number of tools are available to annotate and select from VCF files. The choice of the best tool for your application depends on several factors.
- when you simply need the job done and flexibility is not a concern, we advise using SnpEff and its companion SnpSift, which are both easy-to-use Java programs.
- if you wish to add annotations from third-party databases that are not present in the other tools, or if you work on an organism absent from the above tools, you may consider using Annovar, which was included in our former training session ([1]).
- when you only need to annotate a few VCF rows, you are welcome to use public servers. Be aware that submitting 'patentable' information to the web infringes the novelty clause and exposes patient information to the internet, and that the size of the input is limited to a few hundred lines.
- other tools, such as vcfCodingSnps ([5]), have also been used with success
Install and configure SnpEff
snpEff comes with default settings that need to be edited, and you also have to choose the relevant database for your genome of interest. Since we mapped using hg19, we should match this choice in snpEff and download the corresponding annotations. Please read more about setting up this program on the official snpEff pages.
snpEff configuration
# download and install the software according to the instructions
# http://snpeff.sourceforge.net/SnpEff_manual.html#install

# edit the snpEff.config to point to the correct place for the data to be saved
# default is './data' in the program folder

# create a variable to point to the SnpEff installation folder in your .profile (or .bashrc)
## export SNPEFF=/path/to/snpeff
## export SNPEFFDB=/path/to/snpeff/data

# identify the name of the reference data corresponding to your assembly
java -jar $SNPEFF/snpEff.jar databases | grep "Homo_sapiens"

GRCh37.70 Homo_sapiens http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_GRCh37.70.zip
GRCh37.75 Homo_sapiens (OK) http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_GRCh37.75.zip
GRCh37.GTEX Homo_sapiens,Gencode 12,GTEX project http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_GRCh37.GTEX.zip
GRCh38.81 Homo_sapiens http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_GRCh38.81.zip
GRCh38.82 Homo_sapiens http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_GRCh38.82.zip
hg19 Homo_sapiens (UCSC OK) http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_hg19.zip
hg19kg Homo_sapiens (UCSC KnownGenes) http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_hg19kg.zip
hg38 Homo_sapiens (UCSC) http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_hg38.zip
hg38kg Homo_sapiens (UCSC KnownGenes) http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_hg38kg.zip
testHg19ChrM Homo_sapiens (UCSC) http://downloads.sourceforge.net/project/snpeff/databases/v4_2/snpEff_v4_2_testHg19ChrM.zip

# download and install the hg19 database (DO NOT RUN - was done for you already)
# java -jar $SNPEFF/snpEff.jar download -v hg19
Add annotations to VCF data with snpEff
SnpEff & SnpSift [1][2] were developed by Pablo Cingolani after vcfCodingSnps (Yanming Li, Goncalo Abecasis)[3] to annotate VCF data directly and to filter calls in many different ways. Both programs combine the richness of Annovar[4] annotations with the advantage of manipulating the VCF data directly, without changing format. This session only provides a starter to snpEff. Please refer to the SnpEff manual pages[5] and SnpSift manual pages[6] for more information.
- Please read SnpEff usage in the full GATK GuideBook and how SnpEff annotations can be added to GATK VCF data using the GATK VariantAnnotator tool (regularly check the GATK pages for more recent versions of these documents).
snpEff full command list
Usage: snpEff [command] [options] [files]

Run 'java -jar snpEff.jar command' for help on each specific command

Available commands:
   [eff|ann]     : Annotate variants / calculate effects (you can use either 'ann' or 'eff', they mean the same). Default: ann (no command or 'ann').
   build         : Build a SnpEff database.
   buildNextProt : Build a SnpEff for NextProt (using NextProt's XML files).
   cds           : Compare CDS sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
   closest       : Annotate the closest genomic region.
   count         : Count how many intervals (from a BAM, BED or VCF file) overlap with each genomic interval.
   databases     : Show currently available databases (from local config file).
   download      : Download a SnpEff database.
   dump          : Dump to STDOUT a SnpEff database (mostly used for debugging).
   genes2bed     : Create a bed file from a genes list.
   len           : Calculate total genomic length for each marker type.
   pdb           : Build interaction database (based on PDB data).
   protein       : Compare protein sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
   show          : Show a text representation of genes or transcripts coordiantes, DNA sequence and protein sequence.

Generic options:
   -c , -config                 : Specify config file
   -configOption name=value     : Override a config file option
   -d , -debug                  : Debug mode (very verbose).
   -dataDir <path>              : Override data_dir parameter from config file.
   -download                    : Download a SnpEff database, if not available locally. Default: true
   -nodownload                  : Do not download a SnpEff database, if not available locally.
   -h , -help                   : Show this help and exit
   -noLog                       : Do not report usage statistics to server
   -t                           : Use multiple threads (implies '-noStats'). Default 'off'
   -q , -quiet                  : Quiet mode (do not show any messages or errors)
   -v , -verbose                : Verbose mode
   -version                     : Show version number and exit

Database options:
   -canon                       : Only use canonical transcripts.
   -interaction                 : Annotate using inteactions (requires interaciton database). Default: true
   -interval <file>             : Use a custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
   -maxTSL <TSL_number>         : Only use transcripts having Transcript Support Level lower than <TSL_number>.
   -motif                       : Annotate using motifs (requires Motif database). Default: true
   -nextProt                    : Annotate using NextProt (requires NextProt database).
   -noGenome                    : Do not load any genomic database (e.g. annotate using custom files).
   -noInteraction               : Disable inteaction annotations
   -noMotif                     : Disable motif annotations.
   -noNextProt                  : Disable NextProt annotations.
   -onlyReg                     : Only use regulation tracks.
   -onlyProtein                 : Only use protein coding transcripts. Default: false
   -onlyTr <file.txt>           : Only use the transcripts in this file. Format: One transcript ID per line.
   -reg <name>                  : Regulation track to use (this option can be used add several times).
   -ss , -spliceSiteSize <int>  : Set size for splice sites (donor and acceptor) in bases. Default: 2
   -spliceRegionExonSize <int>  : Set size for splice site region within exons. Default: 3 bases
   -spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
   -spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases
   -strict                      : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
   -ud , -upDownStreamLen <int> : Set upstream downstream interval length (in bases)
annotate the snpEff demo file with hg19
A test file is provided with the package; we use it here to annotate variants against the human hg19 build.
# this demo command annotates a demo file provided with the software
# we use ##reference=hg19
outfolder=snpEff-test
mkdir -p $BASE/${outfolder}

# take a small sample to save time
head -11 $SNPEFF/examples/test.1KG.vcf > $BASE/${outfolder}/test.1KG.vcf

java -jar $SNPEFF/snpEff.jar hg19 \
  $BASE/${outfolder}/test.1KG.vcf \
  > $BASE/${outfolder}/test.1KG_hg19.vcf

# inspect input
cat $BASE/${outfolder}/test.1KG.vcf

#CHROM POS ID REF ALT QUAL FILTER INFO
1 10291 . C T 2373.79 . AC=149
1 10303 . C T 294.20 . AC=32
1 10309 . C T 164.52 . AC=23
1 10315 . C T 394.78 . AC=47
1 10457 . A C 217.73 . AC=16
1 10469 rs117577454 C G 365.78 . AC=30
1 10492 rs55998931 C T 1309.47 . AC=72
1 10575 . C G 7.23 . AC=1
1 10583 rs58108140 G A 2817.71 . AC=154
1 10611 . C G 200.55 . AC=17

# the output became (until first record for the sake of space)
head -7 $BASE/${outfolder}/test.1KG_hg19.vcf

##SnpEffVersion="4.2 (build 2015-12-05), by Pablo Cingolani"
##SnpEffCmd="SnpEff hg19 /home/bits/NGS/Variant/test.1KG.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected' ">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10291 . C T 2373.79 . AC=149;ANN=T|upstream_gene_variant|MODIFIER|DDX11L1|DDX11L1|transcript|NR_046018.2|pseudogene||n.-1583C>T|||||1583|,T|downstream_gene_variant|MODIFIER|WASH7P|WASH7P|transcript|NR_024540.1|pseudogene||n.*4071G>A|||||4071|,T|intergenic_region|MODIFIER|DDX11L1|DDX11L1|intergenic_region|DDX11L1|||n.10291C>T||||||
Scroll to the right in the results above to see why you do not want to read VCF files as they come.
The same output with lines cut at 80 characters (using the handy GNU command-line tool 'fold'):
head -7 $BASE/${outfolder}/test.1KG_hg19.vcf | fold -w 80

##SnpEffVersion="4.2 (build 2015-12-05), by Pablo Cingolani"
##SnpEffCmd="SnpEff hg19 /home/bits/NGS/Variant/test.1KG.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele
 | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature
_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS
.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effe
cts for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_ge
ne | Percent_of_transcripts_affected' ">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated dec
ay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcript
s_in_gene | Percent_of_transcripts_affected' ">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10291 . C T 2373.79 . AC=149;ANN=T|upstream_ge
ne_variant|MODIFIER|DDX11L1|DDX11L1|transcript|NR_046018.2|pseudogene||n.-1583C>
T|||||1583|,T|downstream_gene_variant|MODIFIER|WASH7P|WASH7P|transcript|NR_02454
0.1|pseudogene||n.*4071G>A|||||4071|,T|intergenic_region|MODIFIER|DDX11L1|DDX11L
1|intergenic_region|DDX11L1|||n.10291C>T||||||
Next to the annotated VCF, two more files are generated:
- a text file reporting genes with one transcript model per row and variant type counts (saved here)
- an HTML report with much more information that you can review and print when needed (saved here)
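Because the ANN field packs several pipe-separated annotations into one INFO entry, it can help to unpack it into one row per annotation before reading it. The following is a minimal sketch using only awk, not a snpEff feature: the function name `ann_to_table` is hypothetical, and the column positions (Annotation, Annotation_Impact, Gene_Name, Feature_ID) are taken from the ANN header description shown above. The demo record is the first annotated record of the test file, trimmed to two annotations.

```shell
#!/bin/sh
# Hypothetical helper (not part of snpEff): read a snpEff-annotated VCF on stdin
# and print one tab-separated row per ANN entry:
# CHROM POS REF ALT Annotation Impact Gene_Name Feature_ID
ann_to_table() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }                                  # skip header lines
    {
      n = split($8, info, ";")                     # INFO is column 8
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^ANN=/) continue           # only the ANN entry
        m = split(substr(info[i], 5), ann, ",")    # one annotation per feature
        for (j = 1; j <= m; j++) {
          split(ann[j], f, "|")
          # f[2]=Annotation f[3]=Annotation_Impact f[4]=Gene_Name f[7]=Feature_ID
          print $1, $2, $4, $5, f[2], f[3], f[4], f[7]
        }
      }
    }'
}

# demo: first annotated record from the test file, trimmed to two annotations
printf '1\t10291\t.\tC\tT\t2373.79\t.\tAC=149;ANN=T|upstream_gene_variant|MODIFIER|DDX11L1|DDX11L1|transcript|NR_046018.2|pseudogene||n.-1583C>T|||||1583|,T|downstream_gene_variant|MODIFIER|WASH7P|WASH7P|transcript|NR_024540.1|pseudogene||n.*4071G>A|||||4071|\n' | ann_to_table
```

On a real file the same function would be fed with something like `cat annotated.vcf | ann_to_table`. SnpSift also offers an `extractFields` command (listed in its command list) for this kind of flattening.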
annotate the Varscan2 and bcftools calls with the content of the snpEff hg19 database
Always match the annotation database to the reference used for mapping; when the two reference builds differ, annotations risk being attached to the wrong positions.
## varscan2 calls
invcf=varscan2_variants/chr21_NA18507_varscan.vcf.gz
outfolder=VCF_annotation
outvcf=chr21_NA18507_varscan-hg19.vcf
build=hg19
mkdir -p ${outfolder}

# annotate and save all results to $outfolder
java -jar $SNPEFF/snpEff.jar \
  -htmlStats ${outfolder}/varscan_snpEff_summary.html \
  ${build} ${invcf} > $outfolder/${outvcf}

# create .gz version and index
vcf2index $outfolder/${outvcf}

## bcftools calls
invcf=bcftools_htslib_variants/chr21_NA18507_var_bcftools.flt-D1000.vcf.gz
outfolder=VCF_annotation
outvcf=chr21_NA18507_bcftools-hg19.vcf
build=hg19

# annotate and save all results to $outfolder
java -jar $SNPEFF/snpEff.jar \
  -htmlStats ${outfolder}/bcftools_snpEff_summary.html \
  ${build} ${invcf} > $outfolder/${outvcf}

# create .gz version and index
vcf2index $outfolder/${outvcf}
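Before annotating, one quick way to check which reference a VCF was called against is to look at its header. The sketch below is only an illustration: the function name `vcf_reference_hints` is hypothetical, and it assumes the variant caller wrote `##reference=` and/or `##contig=` header lines, which not every caller does. The header values in the demo file are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the header lines of a VCF (plain or .gz) that
# hint at the reference build used for calling.
vcf_reference_hints() {
  case $1 in
    *.gz) gzip -dc "$1" ;;      # compressed input
    *)    cat "$1" ;;           # plain text input
  esac | grep -E '^##(reference|contig)='
}

# demo on a tiny VCF header with illustrative values
tmp=$(mktemp)
printf '%s\n' '##fileformat=VCFv4.1' '##reference=hg19' \
  '##contig=<ID=chr21,length=48129895>' > "$tmp"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmp"
vcf_reference_hints "$tmp"
rm -f "$tmp"
```

If nothing is printed, the header carries no such hint and the build has to be confirmed from the mapping step instead. Note that this simple version reads the whole file, which is slow on large VCFs.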
The results of this command were copied to the BITS server:
- the annotated VCF can be reviewed here for varscan and here for bcftools
- the gene table here for varscan and here for bcftools
- the HTML report here for varscan and here for bcftools
As you can see from the report, the annotation added very valuable information to an otherwise quite flat list of genomic coordinates. The user's next wish will be to isolate damaging variants or variants with non-synonymous effects. This is very pertinent and will be done in the next part using SnpSift.
Filter and select relevant data from a VCF file with SnpSift
SnpSift is a toolbox that allows you to filter and manipulate snpEff-annotated files.
SnpSift full command list
SnpSift version 4.2 (build 2015-12-05), by Pablo Cingolani

Usage: java -jar SnpSift.jar [command] params...

Command is one of:
   alleleMat     : Create an allele matrix output.
   annotate      : Annotate 'ID' from a database (e.g. dbSnp). Assumes entries are sorted.
   annMem        : Annotate 'ID' from a database (e.g. dbSnp). Loads db in memory. Does not assume sorted entries.
   caseControl   : Compare how many variants are in 'case' and in 'control' groups; calculate p-values.
   ccs           : Case control summary. Case and control summaries by region, allele frequency and variant's functional effect.
   concordance   : Concordance metrics between two VCF files.
   covMat        : Create an covariance matrix output (allele matrix as input).
   dbnsfp        : Annotate with multiple entries from dbNSFP.
   extractFields : Extract fields from VCF file into tab separated format.
   filter        : Filter using arbitrary expressions
   geneSets      : Annotate using MSigDb gene sets (MSigDb includes: GO, KEGG, Reactome, BioCarta, etc.)
   gt            : Add Genotype to INFO fields and remove genotype fields when possible.
   gtfilter      : Filter genotype using arbitrary expressions.
   gwasCat       : Annotate using GWAS catalog
   hwe           : Calculate Hardy-Weimberg parameters and perform a godness of fit test.
   intersect     : Intersect intervals (genomic regions).
   intervals     : Keep variants that intersect with intervals.
   intIdx        : Keep variants that intersect with intervals. Index-based method: Used for large VCF file and a few intervals to retrieve
   join          : Join files by genomic region.
   phastCons     : Annotate using conservation scores (phastCons).
   private       : Annotate if a variant is private to a family or group.
   rmRefGen      : Remove reference genotypes.
   rmInfo        : Remove INFO fields.
   split         : Split VCF by chromosome.
   tstv          : Calculate transiton to transversion ratio.
   varType       : Annotate variant type (SNP,MNP,INS,DEL or MIXED).
   vcfCheck      : Check that VCF file is well formed.
   vcf2tped      : Convert VCF to TPED.

Options common to all SnpSift commands:
   -d          : Debug.
   -download   : Download database, if not available locally. Default: true.
   -noDownload : Do not download a database, if not available locally.
   -noLog      : Do not report usage statistics to server.
   -h          : Help.
   -v          : Verbose.
find variants with impact
The first question when it comes to variants is which of them have a strong impact on the gene product (the protein).
infolder=VCF_annotation
infile=chr21_NA18507_bcftools-hg19.vcf.gz
# infile=chr21_NA18507_varscan-hg19.vcf.gz
outfolder=VCF_filtering
mkdir -p ${outfolder}
outfile=filtered-${infile}

# filter 'STOP' variants and display results on screen
# (goody: in bunches of 80 characters with one blank line between two calls)
java -jar $SNPEFF/SnpSift.jar \
  filter "ANN[0].EFFECT has 'stop_gained'" ${infolder}/${infile} | \
  awk 1 ORS='\n\n' | fold -w 80

# only one variant is found by this command, in a region also reported as pseudogene

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GAIIx-chr21-BWA.mem

chr21 35334572 . C T 225.009 . ANN=T|stop_gained|HIGH|LINC00649|LINC00649|transc
ript|NM_001288961.1|protein_coding|2/2|c.283C>T|p.Gln95*|573/2263|283/405|95/134
||,T|intron_variant|MODIFIER|LINC00649|LINC00649|transcript|NR_038883.1|pseudoge
ne|2/2|n.618-6966C>T|||||| GT:PL 0/1:255,0,255
Which are the non-synonymous variants?
infolder=VCF_annotation
infile=chr21_NA18507_bcftools-hg19.vcf.gz
# infile=chr21_NA18507_varscan-hg19.vcf.gz
outfolder=VCF_filtering
mkdir -p ${outfolder}

# filter 'missense_variant' variants and save results to file
java -jar $SNPEFF/SnpSift.jar \
  filter "ANN[0].EFFECT has 'missense_variant'" ${infolder}/${infile} \
  > ${outfolder}/non-synonymous-${infile%%.gz}

# compress and index
vcf2index ${outfolder}/non-synonymous-${infile%%.gz}

# how many did we get?
grep -c -v "^#" ${outfolder}/non-synonymous-chr21_NA18507_bcftools-hg19.vcf
210
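Note that, as far as the SnpSift documentation describes it, `ANN[0]` looks only at the first annotation of each record, while `ANN[*]` matches any annotation. The HIGH impact results further down include records (DNMT3L, POFUT2) where a `missense_variant` annotation is listed after a `protein_protein_contact` one, so testing only the first annotation can miss them. The sketch below illustrates the difference with plain awk rather than SnpSift; the function name `count_effect` is hypothetical, the first demo record is a trimmed version of the DNMT3L call, and the second record is invented for the demo.

```shell
#!/bin/sh
# Hypothetical helper: count VCF records (stdin) carrying a given effect term,
# looking either at the first annotation only ("first") or at all of them ("any").
count_effect() {
  awk -F'\t' -v term="$1" -v mode="$2" '
    /^#/ { next }
    {
      hit = 0
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^ANN=/) continue
        m = split(substr(info[i], 5), ann, ",")
        if (mode == "first") m = 1               # inspect the first annotation only
        for (j = 1; j <= m; j++) {
          split(ann[j], f, "|")
          k = split(f[2], eff, "&")              # combined effects are joined by "&"
          for (e = 1; e <= k; e++) if (eff[e] == term) hit = 1
        }
      }
      if (hit) count++
    }
    END { print count + 0 }'
}

# demo data: trimmed DNMT3L record from the results, plus one invented record
demo=$(printf 'chr21\t45670770\t.\tT\tC\t54.0072\t.\tANN=C|protein_protein_contact|HIGH|DNMT3L|DNMT3L|interaction|NM_175867.2:4U7T_B:238_278|protein_coding|10/12|c.832A>G||||||,C|missense_variant|MODERATE|DNMT3L|DNMT3L|transcript|NM_013369.3|protein_coding|10/12|c.832A>G|p.Arg278Gly|1316/1706|832/1164|278/387||\nchr21\t100\t.\tA\tG\t50\t.\tANN=G|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|protein_coding|1/2|c.1A>G|p.Met1Val|1/10|1/9|1/3||\n')

printf '%s\n' "$demo" | count_effect missense_variant first   # prints 1
printf '%s\n' "$demo" | count_effect missense_variant any     # prints 2
```

With SnpSift itself, the corresponding change would presumably be to write the filter expression with `ANN[*].EFFECT` instead of `ANN[0].EFFECT`; the count of 210 reported above would then be a lower bound.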
How many HIGH-IMPACT variants are predicted?
infolder=VCF_annotation
infile=chr21_NA18507_bcftools-hg19.vcf.gz
# infile=chr21_NA18507_varscan-hg19.vcf.gz
outfolder=VCF_filtering
mkdir -p ${outfolder}

# filter 'HIGH impact' variants and save results to file
java -jar $SNPEFF/SnpSift.jar filter "EFF[*].IMPACT = 'HIGH'" ${infolder}/${infile} \
  > ${outfolder}/HIGH-${infile%%.gz}

# compress and index
vcf2index ${outfolder}/HIGH-${infile%%.gz}

# how many did we get?
grep -c -v "^#" ${outfolder}/HIGH-chr21_NA18507_bcftools-hg19.vcf
11

# review them
grep -v "^#" ${outfolder}/HIGH-chr21_NA18507_bcftools-hg19.vcf | \
  awk 1 ORS='\n\n' | fold -w 80
HIGH impact results (n=11)
chr21 11098723 . T C 225.009 . ANN=C|splice_donor_variant&intron_variant|HIGH|BA
GE4|BAGE4|transcript|NM_181704.1|protein_coding|1/8|c.14+1A>G||||||,C|splice_don
or_variant&intron_variant|HIGH|BAGE5|BAGE5|transcript|NM_182484.1|protein_coding
|1/8|c.14+1A>G||||||WARNING_TRANSCRIPT_INCOMPLETE,C|splice_donor_variant&intron_
variant|HIGH|BAGE|BAGE|transcript|NM_001187.1|protein_coding|1/4|c.14+1A>G||||||
WARNING_TRANSCRIPT_INCOMPLETE,C|5_prime_UTR_variant|MODIFIER|BAGE3|BAGE3|transcr
ipt|NM_182481.1|protein_coding|1/10|c.-6A>G|||||6|,C|5_prime_UTR_variant|MODIFIE
R|BAGE2|BAGE2|transcript|NM_182482.2|protein_coding|1/10|c.-6A>G|||||6|;LOF=(BAG
E|BAGE|1|1.00),(BAGE4|BAGE4|1|1.00),(BAGE5|BAGE5|1|1.00) GT:PL 0/1:255,0,255

chr21 14437495 . C G 143.032 . ANN=G|splice_acceptor_variant&intron_variant|HIGH
|ANKRD30BP2|ANKRD30BP2|transcript|NR_026916.1|pseudogene|8/11|n.2166-1C>G|||||| GT:PL 1/1:176,24,0

chr21 31913981 . AG A 214.458 . ANN=A|frameshift_variant|HIGH|KRTAP19-6|KRTAP19-
6|transcript|NM_181612.3|protein_coding|1/1|c.171delC|p.Tyr58fs|202/330|171/177|
57/58||,A|splice_acceptor_variant&splice_donor_variant&intron_variant|HIGH|KRTAP
19-6|KRTAP19-6|transcript|NM_001303120.1|protein_coding|1/1|c.170+1delC||||||;LO
F=(KRTAP19-6|KRTAP19-6|2|0.50) GT:PL 1/1:255,135,0

chr21 31971075 . TA TAA 217.468 . ANN=TAA|frameshift_variant|HIGH|KRTAP6-2|KRTAP
6-2|transcript|NM_181604.1|protein_coding|1/1|c.117_118insT|p.Tyr40fs|117/189|11
7/189|39/62||,TAA|upstream_gene_variant|MODIFIER|KRTAP22-1|KRTAP22-1|transcript|
NM_181620.1|protein_coding||c.-2364dupA|||||2363|;LOF=(KRTAP6-2|KRTAP6-2|1|1.00) GT:PL 0/1:255,0,255

chr21 32201969 . GAA GA 214.458 . ANN=GA|frameshift_variant&splice_region_varian
t|HIGH|KRTAP7-1|KRTAP7-1|transcript|NM_181606.2|protein_coding|1/2|c.47delT|p.Il
e16fs|81/693|47/264|16/87||;LOF=(KRTAP7-1|KRTAP7-1|1|1.00) GT:PL 1/1:255,120,0

chr21 34948684 . GA GAA 214.458 . ANN=GAA|frameshift_variant|HIGH|SON|SON|transc
ript|NM_138927.2|protein_coding|12/12|c.7236dupA|p.Ala2413fs|7292/8426|7237/7281
|2413/2426||,GAA|frameshift_variant|HIGH|SON|SON|transcript|NM_001291412.1|prote
in_coding|11/11|c.1320dupA|p.Ala441fs|1376/2510|1321/1365|441/454||,GAA|downstre
am_gene_variant|MODIFIER|DONSON|DONSON|transcript|NM_017613.3|protein_coding||c.
*1927_*1928insT|||||1173|,GAA|intron_variant|MODIFIER|SON|SON|transcript|NR_1037
97.1|pseudogene|12/12|n.7357-39dupA|||||| GT:PL 1/1:255,87,0

chr21 34948696 . GA G 139.457 . ANN=G|frameshift_variant|HIGH|SON|SON|transcript
|NM_138927.2|protein_coding|12/12|c.7248delA|p.Arg2416fs|7303/8426|7248/7281|241
6/2426||,G|frameshift_variant|HIGH|SON|SON|transcript|NM_001291412.1|protein_cod
ing|11/11|c.1332delA|p.Arg444fs|1387/2510|1332/1365|444/454||,G|downstream_gene_
variant|MODIFIER|DONSON|DONSON|transcript|NM_017613.3|protein_coding||c.*1916del
T|||||1162|,G|intron_variant|MODIFIER|SON|SON|transcript|NR_103797.1|pseudogene|
12/12|n.7357-27delA|||||| GT:PL 1/1:180,66,0

chr21 35334572 . C T 225.009 . ANN=T|stop_gained|HIGH|LINC00649|LINC00649|transc
ript|NM_001288961.1|protein_coding|2/2|c.283C>T|p.Gln95*|573/2263|283/405|95/134
||,T|intron_variant|MODIFIER|LINC00649|LINC00649|transcript|NR_038883.1|pseudoge
ne|2/2|n.618-6966C>T|||||| GT:PL 0/1:255,0,255

chr21 45670770 . T C 54.0072 . ANN=C|protein_protein_contact|HIGH|DNMT3L|DNMT3L|
interaction|NM_175867.2:4U7T_B:238_278|protein_coding|10/12|c.832A>G||||||,C|pro
tein_protein_contact|HIGH|DNMT3L|DNMT3L|interaction|NM_175867.2:4U7T_D:238_278|p
rotein_coding|10/12|c.832A>G||||||,C|missense_variant|MODERATE|DNMT3L|DNMT3L|tra
nscript|NM_013369.3|protein_coding|10/12|c.832A>G|p.Arg278Gly|1316/1706|832/1164
|278/387||,C|missense_variant|MODERATE|DNMT3L|DNMT3L|transcript|NM_175867.2|prot
ein_coding|10/12|c.832A>G|p.Arg278Gly|1316/1703|832/1161|278/386|| GT:PL 0/1:84,
0,97

chr21 45994841 . A C 124.008 . ANN=C|stop_lost|HIGH|KRTAP10-4|KRTAP10-4|transcri
pt|NM_198687.2|protein_coding|1/1|c.1206A>C|p.Ter402Cysext*?|1236/1643|1206/1206
|402/401||,C|downstream_gene_variant|MODIFIER|KRTAP10-5|KRTAP10-5|transcript|NM_
198694.3|protein_coding||c.*4799T>G|||||4491|,C|intron_variant|MODIFIER|TSPEAR|T
SPEAR|transcript|NM_144991.2|protein_coding|1/11|c.83-6952T>G||||||,C|intron_var
iant|MODIFIER|TSPEAR|TSPEAR|transcript|NM_001272037.1|protein_coding|2/12|c.-122
-6952T>G|||||| GT:PL 0/1:154,0,134

chr21 46703410 . C T 119.008 . ANN=T|protein_protein_contact|HIGH|POFUT2|POFUT2|
interaction|NM_133635.4:4AP5_A:139_191|protein_coding|3/9|c.415G>A||||||,T|prote
in_protein_contact|HIGH|POFUT2|POFUT2|interaction|NM_133635.4:4AP5_A:139_193|pro
tein_coding|3/9|c.415G>A||||||,T|protein_protein_contact|HIGH|POFUT2|POFUT2|inte
raction|NM_133635.4:4AP5_B:139_191|protein_coding|3/9|c.415G>A||||||,T|protein_p
rotein_contact|HIGH|POFUT2|POFUT2|interaction|NM_133635.4:4AP5_B:139_193|protein
_coding|3/9|c.415G>A||||||,T|missense_variant|MODERATE|POFUT2|POFUT2|transcript|
NM_133635.4|protein_coding|3/9|c.415G>A|p.Val139Ile|440/2869|415/1290|139/429||,
T|missense_variant|MODERATE|POFUT2|POFUT2|transcript|NM_015227.4|protein_coding|
3/8|c.415G>A|p.Val139Ile|440/4823|415/1275|139/424||,T|upstream_gene_variant|MOD
IFIER|LOC642852|LOC642852|transcript|NR_026943.1|pseudogene||n.-4557C>T|||||4557
|,T|non_coding_exon_variant|MODIFIER|POFUT2|POFUT2|transcript|NR_004858.1|pseudo
gene|3/10|n.440G>A|||||| GT:PL 0/1:149,0,136
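A folded listing like the one above is hard to scan for gene names. As a rough sketch, again with plain awk rather than a SnpSift command, the hypothetical function `genes_with_impact` below lists the unique gene names that carry at least one annotation of a given impact class (HIGH, MODERATE, LOW or MODIFIER). The demo input is the KRTAP6-2 record from the results, written on one line.

```shell
#!/bin/sh
# Hypothetical helper: list unique Gene_Name values (ANN sub-field 4) having
# at least one annotation with the requested impact (ANN sub-field 3).
genes_with_impact() {
  awk -F'\t' -v want="$1" '
    /^#/ { next }
    {
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^ANN=/) continue         # ignore LOF, NMD and other INFO keys
        m = split(substr(info[i], 5), ann, ",")
        for (j = 1; j <= m; j++) {
          split(ann[j], f, "|")
          if (f[3] == want) print f[4]
        }
      }
    }' | sort -u
}

# demo: the KRTAP6-2 frameshift record shown above, unfolded
printf 'chr21\t31971075\t.\tTA\tTAA\t217.468\t.\tANN=TAA|frameshift_variant|HIGH|KRTAP6-2|KRTAP6-2|transcript|NM_181604.1|protein_coding|1/1|c.117_118insT|p.Tyr40fs|117/189|117/189|39/62||,TAA|upstream_gene_variant|MODIFIER|KRTAP22-1|KRTAP22-1|transcript|NM_181620.1|protein_coding||c.-2364dupA|||||2363|;LOF=(KRTAP6-2|KRTAP6-2|1|1.00)\tGT:PL\t0/1:255,0,255\n' | genes_with_impact HIGH
```

The demo prints `KRTAP6-2`; the neighbouring gene KRTAP22-1 is left out because its only annotation is of MODIFIER impact.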
Many more and diverse operations (and complex combinations thereof) can be done using this tool.
Please read the full documentation here and try some commands using our VCF data.
download exercise files
Download exercise files here
References:
- [1] http://snpeff.sourceforge.net
- [2] Pablo Cingolani, Adrian Platts, Le Lily Wang, Melissa Coon, Tung Nguyen, Luan Wang, Susan J Land, Xiangyi Lu, Douglas M Ruden. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin): 2012, 6(2);80-92 [PubMed:22728672]
  Pablo Cingolani, Viral M Patel, Melissa Coon, Tung Nguyen, Susan J Land, Douglas M Ruden, Xiangyi Lu. Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift. Front Genet: 2012, 3;35 [PubMed:22435069]
- [3] http://www.sph.umich.edu/csg/liyanmin/vcfCodingSnps
- [4] http://www.openbioinformatics.org/annovar/
- [5] http://snpeff.sourceforge.net/SnpEff_manual.html
- [6] http://snpeff.sourceforge.net/SnpSift.html