
Align paired-end reads to the human reference genome hg19 using the Burrows-Wheeler Aligner (BWA)

1 Introduction
2 Prepare the reference genome for BWA alignment
3 Align the reads in pairs to the reference genome using the bwa mem algorithm
4 Sort results by coordinates using Picard.SortSam
5 Extract chr21 mappings and sort the output BAM file in coordinate order
6 Identify duplicate reads with Picard MarkDuplicates
7 Download exercise files

Introduction

Reference mapping is the process applied to NGS reads when a reference genome is available. Reads must be mapped (aligned) to the reference so that the alignments can later be piled up and every position where reads disagree with the reference can be searched for variants. In the mapping step, each read is aligned to the reference genome, and the genome coordinates of its best hit(s) are stored, together with the read sequence and base qualities, in a SAM/BAM file. This is the most time-consuming step of NGS analysis, and its quality and completeness condition all downstream processes.
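To make the content of a SAM file more concrete, here is a minimal Python sketch that splits one alignment line into the fields mentioned above (coordinate, read sequence, qualities). The record is made up for illustration, and the `SamRecord` and `parse_sam_line` names are hypothetical helpers, not part of any tool used in this exercise.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise flags (paired, reverse strand, duplicate, ...)
    rname: str  # reference sequence name, e.g. "chr21"
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # compact description of the alignment
    seq: str    # read sequence
    qual: str   # per-base qualities (Phred+33 encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-separated SAM alignment line into its main fields."""
    f = line.rstrip("\n").split("\t")
    # Columns 7-9 (RNEXT, PNEXT, TLEN) describe the mate and are skipped here.
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9], f[10])


# Made-up record, for illustration only.
example = "read1\t99\tchr21\t9411193\t60\t10M\t=\t9411350\t167\tACGTACGTAC\tIIIIIIIIII"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, rec.mapq)  # prints: chr21 9411193 60
```

A BAM file holds the same records in compressed binary form, so it has to be converted or read with a dedicated library before this kind of text parsing applies.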


Full mapping of an average human Illumina NGS dataset (100M read pairs) takes several days at full load on a 48-CPU computer with 48 GB RAM (indicative values).

Prepare the reference genome for BWA alignment

BWA aligns reads against an index of the reference genome. Strictly speaking, this index is an FM-index based on the Burrows-Wheeler transform rather than a hash table of short sequences. The index is built once for each new reference genome using the bwa index command. This step was performed for you, and the reference index was saved to the GenePattern server under the name hg19.
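BWA takes its name from the Burrows-Wheeler transform, on which its reference index is built. The toy Python sketch below illustrates the idea on a very short sequence: the index is built once, and exact matches are then found by backward search. It is a teaching example under simplifying assumptions (naive suffix sorting, uncompressed tables, exact matching only) and not BWA's actual implementation; `build_index` and `find` are hypothetical names.

```python
def build_index(reference: str):
    """Build a toy FM-index: suffix array plus the tables used by backward search."""
    text = reference + "$"  # "$" marks the end and sorts before A, C, G, T
    # Suffix array by naive sorting (fine for a toy, far too slow for a genome).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text that sort strictly before c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def find(pattern: str, index) -> list:
    """Return the sorted 0-based start positions of exact matches of pattern."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows still matching
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_index("ACGTACGTTACG")  # built once, reused for every read
print(find("ACG", index))  # prints: [0, 4, 9]
```

The point of the exercise step is the same as in this sketch: building the index is the expensive part, so it is done once per reference genome and reused for every dataset.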

Align the reads in pairs to the reference genome using the bwa mem algorithm


We will do this step using the 10% sample rather than the full data in order to speed up the process.

start the BWA mem module
in the 'input' parameter group, link the reference and the two 10% read files in the corresponding fields

review the optional settings but do not change the defaults
edit the last parameter group as shown in the picture

run and wait for the results; you should get a job as shown next
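Outside GenePattern, the steps above roughly correspond to a single bwa mem call on the command line. The Python sketch below only assembles that command and prints it; the file names and the thread count are hypothetical placeholders, not the names used on the course server.

```python
import shlex


def bwa_mem_command(reference, reads_1, reads_2, threads=4):
    """Return the argument list for a paired-end `bwa mem` run.

    `reference` is the prefix of an index previously built with `bwa index`.
    bwa writes the alignments in SAM format to standard output.
    """
    return ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2]


# Hypothetical file names, for illustration only.
cmd = bwa_mem_command("hg19.fa", "sample_10pc_1.fq.gz", "sample_10pc_2.fq.gz")
print(shlex.join(cmd))

# To actually run it (requires bwa to be installed and the reference indexed):
#   import subprocess
#   with open("sample_10pc.sam", "w") as out:
#       subprocess.run(cmd, stdout=out, check=True)
```

Passing the two read files in this order is what tells bwa mem to treat the reads as pairs.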

Sort results by coordinates using Picard.SortSam

This step is required to prepare for the next QC step. BWA writes the alignments in the order in which the reads came in, so they are not sorted in any way. We will now reorder them with Picard.SortSam so that they follow the coordinate order of the reference genome used for mapping.

start the Picard.SortSam module
in the 'input' parameter group, link the BWA 10% read-mapping files

set other parameters as shown
run and wait for job end


Look at the stderr file: had we not set the validation stringency to Lenient, this job would have failed!
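What "sorting by coordinates" means can be sketched in a few lines of Python: alignments are ordered first by the position of their chromosome in the reference sequence list, then by mapping position. This is an illustration of the ordering only, not of how Picard.SortSam is implemented; the records and the `coordinate_sort` helper are made up.

```python
def coordinate_sort(records, reference_order):
    """Sort (name, chrom, pos) tuples by reference order of chrom, then by pos."""
    rank = {chrom: i for i, chrom in enumerate(reference_order)}
    unmapped = len(rank)  # reads without a known chromosome ("*") go last
    return sorted(records, key=lambda r: (rank.get(r[1], unmapped), r[2]))


# Order of the sequences as listed for the reference (shortened, hypothetical).
reference_order = ["chr1", "chr2", "chr10", "chr21"]

# Made-up alignments in the order the aligner produced them.
records = [
    ("r1", "chr21", 500),
    ("r2", "chr2", 900),
    ("r3", "chr21", 120),
    ("r4", "*", 0),
    ("r5", "chr10", 42),
]

for rec in coordinate_sort(records, reference_order):
    print(rec)
```

Note that chr2 comes before chr10 here because the order follows the reference sequence list, not alphabetical order.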

Extract chr21 mappings and sort the output BAM file in coordinate order

Since we used chr21 reads, one could expect all of them to map to chr21. As usual with NGS, what you get is not necessarily what you expected, and we also get alignments to other chromosomes. Here we will select all reads mapping to chr21 and store them in a new file before proceeding. Similarly, you may want to select all mappings to a list of genes (target panel) and make a corresponding subset for your own needs.

start the SAMtools.SamView module
select the mappings as input

fill additional parameters and run

wait for the results
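The selection performed in this step can be sketched in Python on SAM text lines: header lines are kept, and alignments are kept only when their reference name (third column) is chr21. The records are made up and `select_chromosome` is a hypothetical helper; the SAMtools module works on the BAM file directly.

```python
def select_chromosome(sam_lines, chrom="chr21"):
    """Yield header lines and the alignments whose RNAME (3rd column) is chrom."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines always start with "@"
            yield line
        elif line.split("\t")[2] == chrom:
            yield line


# Made-up SAM content, for illustration only.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t1500\t12\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr21\t9411193\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t16\tchr21\t9411250\t60\t10M\t*\t0\t0\tTTGCATGCAA\tIIIIIIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGGGG\tIIIIIIIIII",
]

kept = list(select_chromosome(sam))
print(len(kept) - 1, "alignments kept")  # prints: 2 alignments kept
```

The same pattern extends to a target panel: replace the single chromosome test with a check of the position against a list of gene intervals.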

Identify duplicate reads with Picard MarkDuplicates

The presence of PCR duplicates in NGS libraries can cause false positive variant calls at later stages. For this reason, the duplicates present after mapping to the reference genome must be marked using the dedicated Picard tool.

You can read more about read duplicates in our [Q&A section about read duplicates]

start the Picard.MarkDuplicates module
fill the first parameter group as shown next (if you performed the sorting, use your output as input SAM; otherwise, take the sorted BAM file from the store)

scroll down and fill the output parameters as shown

run and wait for the results

open the summary text file to get numbers
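The principle behind duplicate marking can be sketched in Python as follows: reads that start at the same position on the same strand are grouped, the one with the highest summed base quality is kept as the representative, and the others receive the SAM duplicate flag (0x400). This is a deliberately simplified model with made-up reads; the real Picard tool works on read pairs and applies more refined rules, so treat the grouping key and the `mark_duplicates` helper as assumptions.

```python
from collections import defaultdict

DUPLICATE_FLAG = 0x400  # SAM flag bit: "PCR or optical duplicate"


def mark_duplicates(reads):
    """Set the duplicate bit on all but the best-scoring read of each group.

    Each read is a dict with keys: name, chrom, pos, strand, flag, qual_sum.
    Reads sharing (chrom, pos, strand) are treated as copies of one fragment.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["pos"], read["strand"])].append(read)
    for group in groups.values():
        best = max(group, key=lambda r: r["qual_sum"])
        for read in group:
            if read is not best:
                read["flag"] |= DUPLICATE_FLAG  # marked, not removed
    return reads


# Made-up reads, for illustration only.
reads = [
    {"name": "a", "chrom": "chr21", "pos": 1000, "strand": "+", "flag": 0, "qual_sum": 310},
    {"name": "b", "chrom": "chr21", "pos": 1000, "strand": "+", "flag": 0, "qual_sum": 355},
    {"name": "c", "chrom": "chr21", "pos": 1000, "strand": "+", "flag": 0, "qual_sum": 290},
    {"name": "d", "chrom": "chr21", "pos": 2000, "strand": "-", "flag": 0, "qual_sum": 300},
]

mark_duplicates(reads)
duplicates = [r["name"] for r in reads if r["flag"] & DUPLICATE_FLAG]
print(duplicates)  # prints: ['a', 'c']
```

Duplicates are flagged rather than deleted, so downstream tools can decide whether to ignore them.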

Download exercise files

Download exercise files here

Use the right application to open the files present in ex2-files

NGS-Var2017 Exercise.2
