Hands-on introduction to NGS variant analysis-2017

# one-day training 2017 session

This session is a GenePattern-based rewrite of the simplified 2016 version (Hands-on introduction to NGS variant analysis-2016), itself derived from a more complete and exploratory training given in 2013 and 2014 (Hands-on introduction to NGS variant analysis).

The most recent version of this training (2018) can be found at (Hands-on_introduction_to_NGS_variant_analysis-2018)

Aims of the NGS DNA variant analysis 1-day session

Using a full, publicly available chromosome read-set from one of the 1000 Genomes^[1] samples:

Use the graphical environment provided by the BITS GenePattern server to evaluate the minimal steps of a classical NGS variant workflow and get a feel for the complexity of the task.
Perform a simplified analysis workflow including read mapping and variant calling against the human reference genome.
Annotate and compare the variant calls obtained from two popular callers.
Get motivated to go to the next level and learn the command line and R.

More BITS Training Info

On the VIB website: http://www.vib.be/en/training/research-training/courses/Pages/Hands-On-introduction-to-NGS-variant-analysis.aspx
On the BITS website: https://www.bits.vib.be/index.php/training/201-ngs-variant-analysis

Summary

This training gives an introduction to the use of popular NGS analysis software packages through the GenePattern graphical interface. It reviews several interchangeable tools and provides hints for evaluating the quality and content of Genome-Seq data. Much more can be (and should be) done when working at the command line, and GenePattern will not replace advanced use of the terminal. However, this simplified workflow will allow inexperienced scientists to discover the practices involved in variant analysis. A recent review by Geraldine Van der Auwera et al. expands on many aspects of this theme ^[2].

We did not use the Genome Analysis Toolkit (GATK) in this training due to licensing limitations, but we strongly advise you to consider it in your work. GATK was compared to VarScan, which is used in this training, and is clearly superior in sensitivity and specificity ^[3].

The sequencing data used in this session were obtained from gDNA extracted from EBV-transformed B-lymphocytes of a healthy Nigerian individual (NA18507). More information about that sample is available from the Coriell repository, from which 1000 Genomes gDNA can be obtained ^[4].

This training does not cover all currently available methods. It does not aim to bring users to the level of a professional NGS analyst, but it provides enough information to allow motivated biologists to understand what DNA sequencing is in practice and, when necessary, to communicate knowledgeably with NGS experts about more in-depth needs.

Prerequisites

Skills required to follow this training:

Basic Linux command-line skills are required to review some of the long text results in a terminal (GenePattern is mainly a computing platform and is not convenient for reviewing data)
Basic knowledge of human genome structure and nomenclature is necessary to follow the training tasks
Basic knowledge of Illumina NGS read structure is also required for the same reason

Software used during this training:

All programs used in this training session were installed on the BITS GenePattern server, and specific modules were created for you by our BITS colleague Guy Bottu. If you plan to build your own GenePattern server (GP home page), you may ask Guy for copies of these modules (contact us).

Today's Hands-On Exercises

NGS-Var2017 Startup GenePattern: Locate data and tools in your training account

NGS-Var2017 Exercise.1: QC paired-end reads using FastQC
NGS-Var2017 Exercise.2: Map a sample of the original paired-end reads to the human reference genome hg19 using the Burrows-Wheeler Aligner (BWA)
NGS-Var2017 Exercise.3: BWA mapping QC using Samtools, Picard, and Qualimap
NGS-Var2017 Exercise.4: Call variants as compared to the human reference genome (hg19) with samtools|bcftools or samtools|varscan
NGS-Var2017 Exercise.5: Intersect VCF files using VCF_intersect
NGS-Var2017 Exercise.6: Annotate and filter VCF variant lists with SnpSift and SnpEff
NGS-Var2017 Exercise.7: Review variants and annotations in IGV
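
To illustrate the kind of comparison made in Exercise 5, the sketch below intersects two call sets on the key (chromosome, position, REF, ALT). It is a minimal Python sketch under stated assumptions and not the implementation of the VCF_intersect module: the function names and the toy records are hypothetical, and real data would usually need indels and multi-allelic sites to be normalised before such a comparison is meaningful.

```python
import io


def read_variant_keys(handle):
    """Return the set of (chrom, pos, ref, alt) keys found in a VCF text stream."""
    keys = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, column header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A multi-allelic record lists several ALT alleles separated by commas
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def intersect_calls(vcf_a, vcf_b):
    """Split the calls of two VCF streams into shared and caller-specific sets."""
    a = read_variant_keys(vcf_a)
    b = read_variant_keys(vcf_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical toy records, for illustration only (not taken from the training data)
HEADER = "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
CALLER_A = HEADER + "chr21\t100\t.\tA\tG\t50\t.\t.\n" + "chr21\t200\t.\tC\tT\t30\t.\t.\n"
CALLER_B = HEADER + "chr21\t100\t.\tA\tG\t45\t.\t.\n" + "chr21\t300\t.\tG\tA,C\t60\t.\t.\n"

if __name__ == "__main__":
    result = intersect_calls(io.StringIO(CALLER_A), io.StringIO(CALLER_B))
    for label in ("shared", "only_a", "only_b"):
        print(label, sorted(result[label]))
```

With real files, the two `io.StringIO` objects would be replaced by file handles opened on the uncompressed VCF outputs of the two callers.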

Answers to your requests

Some of you asked about the possibility of calling variants from structured experiments such as family trios or tumor-normal pairs. I found trio data for one of the public 1000 Genomes trios (CEPH CEU) as well as a tumor-normal pancreatic cancer dataset (WES), with which I prepared two walk-through tutorials for command-line analysis using VarScan 2. Both documents will be added to the webserver (links below) as they become available.

call variants from a family trio and identify inherited and potential non-inherited variants (de-novo) using Varscan 2 (link - at work)
call variants from a pair of samples (tumor and normal tissues of the same patient) and identify somatic variants and CNA (copy number abnormalities) in the tumor sample (link)

VarScan modules will soon be installed on the BITS GenePattern server to reproduce the main results presented in these reports, so that you can also perform the trio and somatic analyses without the command line.

Please contact us for more info and to report inconsistencies in these documents

Find more tools and answers

There are many tools out there; finding them is often the easiest part. You are welcome to try as many as you wish and to improve on the results obtained with our selected toolbox. When seeking advice, please consider using:

SeqAnswers^[5] for everything related to NGS
BioStar^[6] for questions about biocomputing and scripting for biologists
stackoverflow^[7] for questions related to coding

bioinformatics.ca directory^[8] to find bioinformatics tools sorted by category

The EBI online training offers many very good training sessions with slides and exercises.

explain SAM BAM flags^[9] (mirrored at BITS)
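
The page referenced above decodes the FLAG column of a SAM/BAM record, which is a bit field. As a rough offline equivalent, here is a minimal Python sketch that lists the bits set in a flag value; the function name `explain_flag` is hypothetical, and the bit values and labels follow the SAM format specification.

```python
# Bit values defined by the SAM format specification for the FLAG column
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the labels of all bits set in a SAM flag (hypothetical helper)."""
    if not 0 <= flag <= 0xFFF:
        raise ValueError("SAM flags defined here fit in 12 bits (0 to 4095)")
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # 99 = 1 + 2 + 32 + 64, a common value for the first read of a proper pair
    for value in (99, 147):
        print(value, explain_flag(value))
```

For example, flag 99 decodes to "read paired", "read mapped in proper pair", "mate reverse strand" and "first in pair", while its mate typically carries flag 147.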

Conclusion

Your feedback on this introductory NGS variant analysis session using GenePattern is very important to us and will be used to improve the content for later sessions. If you need more training of this kind, please contact us and we will organise additional hands-on sessions based on your requests. More advanced sessions will depend on the availability of expert users within VIB who are willing to prepare the specific material.

contact us

References:

[1] http://www.1000genomes.org
[2] Geraldine A Van der Auwera, Mauricio O Carneiro, Christopher Hartl, Ryan Poplin, Guillermo Del Angel, Ami Levy-Moonshine, Tadeusz Jordan, Khalid Shakir, David Roazen, Joel Thibault, Eric Banks, Kiran V Garimella, David Altshuler, Stacey Gabriel, Mark A DePristo. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics: 2013, 43(1110);11.10.1-11.10.33. [PubMed:25431634]
[3] Charles D Warden, Aaron W Adamson, Susan L Neuhausen, Xiwei Wu. Detailed comparison of two popular variant calling packages for exome and targeted exon studies. PeerJ: 2014, 2;e600. [PubMed:25289185]
[4] http://ccr.coriell.org/Sections/Search/Sample_Detail.aspx?Ref=GM18507
[5] http://seqanswers.com SeqAnswers
[6] http://www.biostars.org BioStar
[7] http://stackoverflow.com stackoverflow
[8] http://bioinformatics.ca/links_directory
[9] https://broadinstitute.github.io/picard/explain-flags.html

