NGS Exercise.7

From BITS wiki


[ Main_Page | Hands-on introduction to NGS variant analysis | NGS-formats |
| NGS Exercise.6 | NGS Exercise.7_SnpEff | NGS_Exercise.7_vcfCodingSnps | NGS Exercise.7_annovar | NGS Exercise.8 ]
# updated 2014 version


Add annotations to Variant Call Format (VCF) files


ex07_wf.png

Introduction

Variant lists are important but often long and hard to evaluate. To rank candidate variants for validation, we need to know where these variants occur and what effect they may have: on gene regulation when they lie close to or within a gene region, or on the protein product when they fall in exons.

Whatever software you apply to clean your variant calls, you will still need to validate what you get from NGS :o)

Several tools exist to annotate variant lists and predict variant effects; we present several popular ones here:

Web-based tools, which expose your data to the Internet

Command-line tools, which run on your local computer

Other variant annotation tools that we have not tested

Web-based variant annotation tools

Some users prefer a quick and easy analysis platform. For those in a hurry, web-based alternatives to Annovar exist and are briefly presented below. Before jumping to them, read the following warning.

Submitting your data on a web page means exposing it to the public; this may breach patient confidentiality and/or compromise the patentability of your findings at a later stage. Discuss this with your supervisor before doing it.

EnsEMBL VEP quick overview

Developed by the EnsEMBL team, this tool is written in Perl and queries the large EnsEMBL database to collect annotations and up-to-date genome information. Both web and standalone versions are available (info: <http://www.ensembl.org/info/docs/variation/vep/index.html#web>, web: <http://www.ensembl.org/tools.html>). Note that the web interface is limited to a few hundred variants (750) and that posting your variants on the Internet may violate confidentiality terms. Results are in VEP format, which can be read and filtered in your favorite spreadsheet application (alas!) or, better, in Google Refine if you have a huge file to work on.

the VEP submission page

ensembl_vep_1.png

UCSC VAI: a starter

Uploading a variant list requires converting it to VCF format and selecting the matching reference genome on the submission page.
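As a minimal sketch of that conversion step, assume the variant list is a plain table with chromosome, position, reference allele and alternate allele. This input layout and the function name `rows_to_vcf` are hypothetical and not something VAI prescribes. Only the eight fixed VCF columns are written, and unknown fields are set to '.'.

```python
def rows_to_vcf(rows):
    """Return minimal sites-only VCF text for (chrom, pos, ref, alt) tuples."""
    lines = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Sort by chromosome name (as plain text) and numeric position.
    # Assumption: a real file may need the chromosome order of the reference.
    for chrom, pos, ref, alt in sorted(rows, key=lambda r: (str(r[0]), int(r[1]))):
        fields = [str(chrom), str(int(pos)), ".", ref.upper(), alt.upper(), ".", ".", "."]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Two positions taken from the SeattleSeq example on this page, given out of order
example = [("21", 9467417, "A", "C"), ("21", 9467416, "C", "T")]
vcf_text = rows_to_vcf(example)
print(vcf_text, end="")
```

The chromosome naming ("21" versus "chr21") must match the reference genome selected on the submission page, so check it before uploading.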

the VAI submission page

vai_1.png

Most, if not all, annotation types shown in the screenshot above are also available in Annovar. Typical annotations created by VAI are as follows:

the VAI annotation types

vai_2.png

Results are in VEP format, similar to the output of the corresponding EnsEMBL tool.

The major drawbacks of web annotators are twofold: i) the lack of confidentiality when submitting your variants to the web, and ii) the size limit of a few hundred lines when submitting data to the web tools (except for SeattleSeq, which accepted our 80,000 rows). For these reasons, the standalone command-line tools described next are preferred by most advanced users.

SeattleSeq: a starter

The server can be found at http://snp.gs.washington.edu/ [10]. A VCF variant list can be uploaded; the analysis is queued and an e-mail is sent when the results are ready.

The SeattleSeq Annotation server provides annotation of SNVs (single-nucleotide variations) and small indels, both known and novel. This annotation includes dbSNP rs ID, gene names and accession numbers, variation functions (e.g. missense), protein positions and amino-acid changes, conservation scores, HapMap frequencies, PolyPhen predictions, and clinical association. Links to other annotation sites are also provided.

the SeattleSeq submission page

SeatleSeq.png

top five rows of results

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	GAIIx-chr21-BWA.mem
21	9467416	rs369392604	C	T	86	.	\
   SF=0,1;DN=138;DA=C/T;GM=none;\
   FG=intergenic;FD=unknown;CP=0.170;AA=C;DSP=440077	GT:PL:GQ	0/1:116,0,126:99
21	9467417	rs372306150	A	C	111	.	\
   SF=0,1,2;DN=138;DA=A/C;GM=none;\
   FG=intergenic;FD=unknown;CP=0.155;AA=A;DSP=440076	GT:PL:GQ	0/1:141,0,95:98
21	9471670	.	A	G	37.8	.	\
   SF=0,1,2;GM=none;FG=intergenic;FD=unknown;\
   CP=0.000;CG=0.202;AA=G;DSP=435823	GT:PL:GQ	1/1:69,6,0:10
21	9472931	.	T	G	52	.	\
   SF=0,1,2;GM=none;FG=intergenic;FD=unknown;\
   CP=0.002;CG=-0.931;AA=T;RM=MLT1D;DSP=434562	GT:PL:GQ	1/1:84,9,0:16
21	9473159	rs74477762	A	G	52	.	\
   SF=0,1,2;DN=131;DA=A/G;GM=none;\
   FG=intergenic;FD=unknown;CP=0.000;CG=-1.430;AA=G;DG;DV=by-frequency,by-cluster;\
   DSP=434334	GT:PL:GQ	1/1:84,9,0:16

variants with stop codons ('TRP/stop' in C21orf33; 'stop/CYS' in TSPEAR/KRTAP10-4)

21	45557227	rs74418161	G	A	216	.	SF=0,1,2;DN=132;DA=G/A;\
   GM=NM_004649.6,NM_198155.3,XM_005261183.1,XM_005261184.1,XM_005261185.1,\
   XM_005261186.1,XM_005261187.1;GL=C21orf33;FG=intron,intron,stop-gained,intron,intron,intron,intron;\
   FD=intron-variant,intron-variant,unknown,unknown,unknown,unknown,unknown;AAC=none,none,\
   TRP/stop,none,none,none,none;PP=NA,NA,159/296,NA,NA,NA,NA;CDP=NA,NA,477,NA,NA,NA,NA;\
   CP=0.000;CG=-3.700;AA=G;DG;DV=by-frequency,by-cluster,by-1000G;DSP=33;GESP=A:518/G:12488;\
   PAC=NA,NA,XP_005261240.1,NA,NA,NA,NA	GT:PL:GQ	0/1:246,0,252:99
21	45994841	rs7276273	A	C	124	.	SF=0,1,2;DN=116;DA=A/C;\
   GM=NM_001272037.1,NM_144991.2,NM_198687.1,XM_005261158.1;\
   GL=TSPEAR/KRTAP10-4;FG=intron,intron,stop-lost,intron;\
   FD=intron-variant,intron-variant,stop-lost,unknown;AAC=none,none,stop/CYS,none;\
   PP=NA,NA,402/402,NA;CDP=NA,NA,1206,NA;CP=0.962;CG=4.380;AA=C;DG;\
   DV=by-frequency,by-cluster,by-2hit-2allele,by-1000G;DSP=1185;GESP=C:970/A:11952;\
   PAC=NA,NA,NP_941960.1,NA	GT:PL:GQ	0/1:154,0,134:99
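To make such a record easier to read, the INFO column can be split into key/value pairs and the per-transcript lists (GM, FG, AAC) lined up. The Python sketch below uses the TSPEAR/KRTAP10-4 record shown above, with the line continuations joined and a few INFO keys left out for brevity. The helper names are illustrative, and pairing the GM, FG and AAC lists by position is inferred from the example rather than taken from the SeattleSeq documentation.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def per_transcript(fields, keys=("GM", "FG", "AAC")):
    """Zip the comma-separated lists of the given keys into one tuple per transcript."""
    columns = [fields[k].split(",") for k in keys]
    return list(zip(*columns))


# Record from this page, continuation lines joined, some INFO keys omitted
record = (
    "21\t45994841\trs7276273\tA\tC\t124\t.\t"
    "SF=0,1,2;DN=116;DA=A/C;"
    "GM=NM_001272037.1,NM_144991.2,NM_198687.1,XM_005261158.1;"
    "GL=TSPEAR/KRTAP10-4;FG=intron,intron,stop-lost,intron;"
    "AAC=none,none,stop/CYS,none;CP=0.962;CG=4.380;AA=C;DG"
    "\tGT:PL:GQ\t0/1:154,0,134:99"
)

columns = record.split("\t")
info = parse_info(columns[7])  # INFO is the 8th VCF column

# Keep only the transcripts whose function (FG) mentions a stop codon
hits = [row for row in per_transcript(info) if "stop" in row[1]]
for transcript, function, change in hits:
    print(columns[0], columns[1], info["GL"], transcript, function, change)
# prints: 21 45994841 TSPEAR/KRTAP10-4 NM_198687.1 stop-lost stop/CYS
```

Only one of the four transcripts carries the stop-lost change, which is hard to see in the raw INFO string.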


Annovar - not perfect but still very performant

Annovar is the historical tool for annotating large lists of variants. It was designed in the early days of the VCF format and, instead of adopting it, went its own way with its own tabular format. This makes Annovar less handy today, as most other tools accept VCF and/or BED, which are not native Annovar formats.

Please refer to the separate NGS_Exercise.7_annovar page for a quick start and to the full online documentation.

VCF inline annotation tools

The two tools described and briefly illustrated in the following pages ADD annotations to the VCF data instead of creating non-VCF annotated tables as Annovar does. Inline annotations are powerful because they complement the variant descriptions present in the VCF while keeping all other VCF annotations, BUT the resulting file is quite hard for a human to read and requires post-processing and filtering to become useful.
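Such post-processing can be as simple as streaming the annotated VCF, keeping the records whose INFO field meets a condition, and writing a few chosen keys as columns. The sketch below is one possible approach in plain Python. The function name `vcf_to_table`, the chosen keys (`GL` and `FG`, as seen in the SeattleSeq output earlier on this page) and the filter are illustrative assumptions; each annotation tool uses its own INFO keys. The demo input is a trimmed version of two records from this page.

```python
import io


def vcf_to_table(handle, keys, keep=lambda info: True):
    """Yield tab-separated rows (CHROM, POS, REF, ALT + chosen INFO keys) for kept records."""
    yield "\t".join(["CHROM", "POS", "REF", "ALT", *keys])
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the column header and blank lines
        cols = line.rstrip("\n").split("\t")
        info = {}
        for item in cols[7].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else "yes"  # flag keys carry no value
        if keep(info):
            picked = [info.get(k, ".") for k in keys]  # '.' when a key is absent
            yield "\t".join([cols[0], cols[1], cols[3], cols[4], *picked])


# Trimmed demo input; a real file would be opened with open(path) instead
demo = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "21\t9467416\trs369392604\tC\tT\t86\t.\tGM=none;FG=intergenic\n"
    "21\t45557227\trs74418161\tG\tA\t216\t.\tGL=C21orf33;FG=intron,intron,stop-gained\n"
)
rows = list(vcf_to_table(demo, ["GL", "FG"], keep=lambda info: "stop" in info.get("FG", "")))
print("\n".join(rows))
```

The resulting table opens directly in a spreadsheet, while the original VCF stays untouched for tools that need it.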

vcfCodingSnps

This software was released a few years ago but is still valid. We present it briefly on the following page: NGS_Exercise.7_vcfCodingSnps.

SnpEff & SnpSift

Last but not least, SnpEff is the rising star for VCF annotation and filtering. It is a very powerful toolset, co-developed with the Broad Institute, that will likely become the standard, as GATK already is for variant calling. Please refer to the separate NGS_Exercise.7_SnpEff page for a quick start and to the full online documentation.

download exercise files

Download exercise files here

Use the right application to open the files present in ex7-files

References:
  1. http://www.ensembl.org/info/docs/variation/vep/index.html
  2. http://genome.ucsc.edu/cgi-bin/hgVai
  3. http://www.openbioinformatics.org/annovar/
  4. Kai Wang, Mingyao Li, Hakon Hakonarson
    ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.
    Nucleic Acids Res: 2010, 38(16);e164
    [PubMed:20601685]

  5. http://www.sph.umich.edu/csg/liyanmin/vcfCodingSnps
  6. http://snpeff.sourceforge.net
  7. http://varianttools.sourceforge.net/Annotation/HomePage
  8. http://vat.gersteinlab.org/download.php
  9. http://www.bioconductor.org/packages/2.12/bioc/html/VariantAnnotation.html
  10. http://snp.gs.washington.edu/SeattleSeqAnnotation138/index.jsp
