NGS-Var2017 Exercise.2



[ Main_Page | Hands-on_introduction_to_NGS_variant_analysis-2017 | NGS-Var2017 Exercise.1 | NGS-Var2017 Exercise.3 ]


Align paired-end reads to the human reference genome hg19 using the Burrows-Wheeler Aligner (BWA)


ex02_wf.png

Introduction

Reference mapping is the process applied to NGS reads when a reference genome is available. Reads must be mapped (aligned) to the reference before the alignments can be piled up and every position where reads disagree with the reference can be searched for variants. In the mapping step, each read is aligned to the reference genome, and the genome coordinates of its best hit(s) are stored together with the read sequence and quality values in a SAM/BAM file. This is the most time-consuming step of NGS analysis, and its quality and completeness condition all downstream processes.
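
To make the SAM output concrete, here is a minimal Python sketch that splits one alignment line into the eleven mandatory SAM columns and reads back the mapping coordinate. The example line, the read name and the helper name parse_sam_line are made up for illustration; real files are normally handled with dedicated tools rather than by hand.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory columns of a SAM alignment line."""
    qname: str   # read name
    flag: int    # bitwise flag (paired, reverse strand, duplicate, ...)
    rname: str   # reference sequence name, e.g. chr21
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate ("=" means same as rname)
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; optional tags after column 11 are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    q, f, r, p, m, c, rn, pn, tl, s, qu = fields[:11]
    return SamRecord(q, int(f), r, int(p), int(m), c, rn, int(pn), int(tl), s, qu)


# Hypothetical record, invented for this example.
example = "read_001\t99\tchr21\t9411193\t60\t8M\t=\t9411300\t115\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)
is_paired = bool(rec.flag & 0x1)     # flag bit 0x1: read is part of a pair
is_reverse = bool(rec.flag & 0x10)   # flag bit 0x10: read maps to the reverse strand
print(rec.rname, rec.pos, rec.mapq, is_paired, is_reverse)
```

The reference name and position (rname, pos) are the genome coordinate that later steps use for sorting, subsetting and variant calling.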

Full mapping of an average human Illumina NGS dataset (100M read pairs) takes several days at full load on a 48-CPU computer with 48 GB RAM (values are indicative).

Prepare the reference genome for BWA alignment

BWA does not align reads against the raw reference sequence but against a precomputed index of it (an FM-index based on the Burrows-Wheeler transform), which allows short sequences to be looked up quickly. The index is built once for each new reference genome with the bwa index command. This step was performed for you and the reference index was saved to the GenePattern server under the name hg19.
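
As an illustration of why an index is built once and then reused, the following toy Python sketch builds a Burrows-Wheeler based index for a tiny made-up reference and looks up exact matches by backward search. It is a teaching sketch only: the function names are hypothetical, the construction is naive, and BWA's real index is far more compact and also handles mismatches and gaps.

```python
def build_index(reference):
    """Build a toy FM-index: suffix array, C table and occurrence counts."""
    text = reference + "$"                      # "$" marks the end and sorts first
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)      # character preceding each suffix
    alphabet = sorted(set(text))
    # c[ch] = number of characters in the text that sort before ch
    c, total = {}, 0
    for ch in alphabet:
        c[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c, occ


def find(pattern, index):
    """Return the sorted 0-based start positions of exact matches of pattern."""
    sa, c, occ = index
    lo, hi = 0, len(sa)                         # current suffix-array interval
    for ch in reversed(pattern):                # extend the match right to left
        if ch not in c:
            return []
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = build_index("GATTACAGATTACA")           # built once per reference
print(find("ATTA", index))
print(find("CAT", index))
```

Once the index exists, each lookup costs time proportional to the read length rather than to the genome length, which is what makes mapping millions of reads feasible.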

Align the reads in pairs to the reference genome using the bwa mem algorithm

We will run this step on the 10% sample rather than on the full data in order to speed up the process.
  • start the BWA mem module
  • in the 'input' parameter group, link the reference and the two 10% read files in the corresponding fields
ex2_01.png
  • review the optional settings but do not change the defaults
  • edit the last parameter group as shown in the picture
ex2_02.png
  • run and wait for the results; you should get a job as shown next
ex2_03.png
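
Outside GenePattern, the same alignment could be launched from the command line. The Python sketch below only assembles a plausible bwa mem command; the file names, thread count and read-group string are assumptions made for illustration and are not the values used by the module in this exercise.

```python
import shlex


def bwa_mem_command(index_prefix, reads_1, reads_2, threads=4, read_group=None):
    """Assemble a 'bwa mem' command line for one pair of FASTQ files."""
    cmd = ["bwa", "mem", "-t", str(threads)]    # -t: number of threads
    if read_group is not None:
        cmd += ["-R", read_group]               # -R: read group header line
    cmd += [index_prefix, reads_1, reads_2]     # index prefix, then both mates
    return cmd


# Hypothetical file names for the 10% sample.
cmd = bwa_mem_command(
    "hg19",
    "sample_10pc_1.fq.gz",
    "sample_10pc_2.fq.gz",
    threads=8,
    read_group=r"@RG\tID:sample\tSM:sample\tPL:ILLUMINA",
)
print(shlex.join(cmd))
```

bwa mem writes SAM to standard output, so an actual run would redirect that stream to a file, for example with subprocess.run(cmd, stdout=handle, check=True) on an opened output file.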

Sort results by coordinates using Picard.SortSam

This step is required to prepare for the next QC step. BWA writes the alignments in the order in which the reads were read from the input files, so its output is not sorted in any way. We will now reorder the alignments with Picard.SortSam so that they follow the coordinate order of the reference genome used for mapping.

  • start the Picard.SortSam module
  • in the 'input' parameter group, link the BWA 10% read-mapping files
ex2_04.png
  • set other parameters as shown
  • run and wait for the job to finish
ex2_05.png
Look at the stderr file: if we had not used the Lenient setting, this job would have failed!
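
What "sorted by coordinates" means can be sketched in a few lines of Python: alignments are ordered first by the position of their chromosome in the reference (not alphabetically), then by mapping position, with unmapped reads last. The record layout and the function name below are hypothetical simplifications, not Picard's implementation.

```python
def coordinate_sort(records, contig_order):
    """Sort (read_name, contig, position) tuples in reference coordinate order."""
    rank = {name: i for i, name in enumerate(contig_order)}

    def sort_key(record):
        _, contig, position = record
        if contig == "*":                 # unmapped reads go to the end
            return (len(rank), 0)
        return (rank[contig], position)

    return sorted(records, key=sort_key)


# Contig order as it would appear in the reference (shortened, illustrative).
contigs = ["chr1", "chr2", "chr21"]
unsorted_records = [
    ("r1", "chr21", 500),
    ("r2", "chr1", 900),
    ("r3", "*", 0),
    ("r4", "chr21", 100),
]
sorted_records = coordinate_sort(unsorted_records, contigs)
print([name for name, _, _ in sorted_records])
```

Sorting by the contig order of the reference rather than by name is why the sorted file "matches" the genome used for mapping.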

Extract chr21 mappings and sort the output BAM file in coordinate order

Since we used chr21 reads, one might expect them all to map to chr21. As usual with NGS, what you get is not necessarily what you expected, and we also get alignments to other chromosomes. We will here select all reads mapping to chr21 and store them in a new file before proceeding. Similarly, you may want to select all mappings to a list of genes (target panel) and make a corresponding subset for your own needs.

  • start the SAMtools.SamView module
  • select the mappings as input
ex2_06.png
  • fill additional parameters and run
ex2_07.png
  • wait for the results
ex2_08.png
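
The selection performed by this step can be pictured with the short Python sketch below, which keeps the header lines and only those alignments whose reference name is chr21. It works on SAM text lines invented for the example and uses a hypothetical function name; the SAMtools module does the equivalent on the real BAM file.

```python
def select_contig(sam_lines, contig):
    """Keep SAM header lines and alignments mapped to the given contig."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):          # header lines are always kept
            kept.append(line)
            continue
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] == contig:   # column 3 is RNAME
            kept.append(line)
    return kept


# Made-up SAM content: one header line and three alignments.
sam_lines = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read_a\t99\tchr1\t1500\t0\t8M\t=\t1600\t108\tACGTACGT\tIIIIIIII",
    "read_b\t99\tchr21\t9411193\t60\t8M\t=\t9411300\t115\tACGTACGT\tIIIIIIII",
    "read_c\t163\tchr21\t9500000\t60\t8M\t=\t9500100\t108\tTTGACCAG\tIIIIIIII",
]
chr21_lines = select_contig(sam_lines, "chr21")
print(len(chr21_lines) - 1, "alignments kept")
```

A target-panel subset follows the same idea, with a list of regions in place of a single chromosome name.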

Identify duplicate reads with Picard MarkDuplicates

The presence of PCR duplicates in NGS libraries can cause false positive variant calls at later stages. For this reason, the duplicates present after mapping to the reference genome must be marked using the dedicated Picard tool.

You can read more about read duplicates in our [Q&A section about read duplicates]

  • start the Picard.MarkDuplicates module
  • fill the first parameter group as shown next (if you performed the sorting, use your output as input SAM; otherwise, take the sorted BAM file from the store)
ex2_09.png
  • scroll down and fill the output parameters as shown
ex2_10.png
  • run and wait for the results
ex2_11.png
  • open the summary text file to get numbers
ex2_12.png
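
The idea behind duplicate marking can be sketched as follows: reads that start at the same position on the same strand are assumed to come from the same original fragment, the one with the best base qualities is kept as the representative, and the others receive the duplicate flag (0x400) without being removed. This is a deliberately simplified, single-read model with hypothetical field names; the Picard tool also takes the mate position and clipping into account.

```python
DUPLICATE_FLAG = 0x400   # SAM flag bit meaning "PCR or optical duplicate"


def mark_duplicates(reads):
    """Return copies of the reads with the duplicate flag set on redundant ones."""
    best = {}   # (contig, position, strand) -> index of the best read seen so far
    for i, read in enumerate(reads):
        key = (read["contig"], read["pos"], read["strand"])
        if key not in best or read["qual_sum"] > reads[best[key]]["qual_sum"]:
            best[key] = i
    representatives = set(best.values())
    marked = []
    for i, read in enumerate(reads):
        flag = read["flag"]
        if i not in representatives:
            flag |= DUPLICATE_FLAG        # mark, do not delete
        marked.append({**read, "flag": flag})
    return marked


# Made-up reads: the first two share contig, start position and strand.
reads = [
    {"name": "r1", "contig": "chr21", "pos": 100, "strand": "+", "qual_sum": 300, "flag": 0},
    {"name": "r2", "contig": "chr21", "pos": 100, "strand": "+", "qual_sum": 250, "flag": 0},
    {"name": "r3", "contig": "chr21", "pos": 200, "strand": "-", "qual_sum": 280, "flag": 16},
]
marked = mark_duplicates(reads)
n_dup = sum(1 for r in marked if r["flag"] & DUPLICATE_FLAG)
print(n_dup, "of", len(marked), "reads marked as duplicates")
```

Because duplicates are only flagged, downstream variant callers can choose to ignore them while the original data remain in the file.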

Download exercise files

Download exercise files here

Use the right application to open the files present in ex2-files
