NGS-Var2017 Exercise.2
[ Main_Page | Hands-on_introduction_to_NGS_variant_analysis-2017 | NGS-Var2017 Exercise.1 | NGS-Var2017 Exercise.3 ]
Align paired-end reads to the human reference genome hg19 using the Burrows-Wheeler Aligner (BWA)
Contents
- 1 Introduction
- 2 prepare the reference genome for BWA alignment
- 3 Align the reads in pairs to the reference genome using the bwa mem algorithm
- 4 Sort results by coordinates using Picard.SortSam
- 5 Extract chr21 mappings and sort the output BAM file in coordinate order
- 6 identify duplicate reads with Picard MarkDuplicates
- 7 download exercise files
Introduction
Reference mapping is the process applied to NGS reads when a reference genome is available. Mapping (aligning) the reads to the reference is required in order to later pile up all alignments and search for variants at each position where reads disagree with the reference. In the mapping step, each read is aligned to the reference genome, and the genome coordinates of the best hit(s) are stored together with the read sequence and base qualities in a SAM/BAM file. This is the most time-consuming step of NGS analysis, and its quality and completeness condition all downstream processes.
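To make the content of a SAM file more concrete, here is a minimal Python sketch that splits one alignment line into the 11 mandatory fields defined by the SAM format. The names SamRecord and parse_sam_line are ours, and the example record is made up for illustration; it does not come from the exercise data.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory fields of one SAM alignment line."""
    qname: str   # read name
    flag: int    # bitwise flag (paired, unmapped, reverse strand, duplicate, ...)
    rname: str   # reference sequence name, e.g. "chr21"
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description (matches, insertions, deletions, ...)
    rnext: str   # reference name of the mate ("=" means same as rname)
    pnext: int   # mapping position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a '@' header line) into a SamRecord."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line has at least 11 tab-separated fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


# Hypothetical record, made up for illustration only.
example = "read1\t99\tchr21\t9411193\t60\t5M\t=\t9411300\t112\tACGTA\tIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.mapq)
```

Optional tags that follow the 11th column are ignored by this sketch.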
prepare the reference genome for BWA alignment
BWA aligns reads against an index of the reference genome (an FM-index derived from the Burrows-Wheeler transform, not a hash table of short sequences). The index is built once for each new reference genome using the bwa index command. This step was performed for you and the reference index was saved to the GenePattern server under the name hg19.
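For readers who want to reproduce the indexing outside GenePattern, the following Python sketch assembles an equivalent bwa index command line. The FASTA file name hg19.fa is an assumption (only the index name hg19 is given above), and the helper bwa_index_command is ours. The sketch only prints the command; it does not run BWA.

```python
import shlex
from typing import List


def bwa_index_command(reference_fasta: str, prefix: str) -> List[str]:
    """Build the argument list for 'bwa index'.

    -a bwtsw selects the indexing algorithm intended for large genomes
    such as human; -p sets the prefix of the index files written by BWA.
    """
    return ["bwa", "index", "-a", "bwtsw", "-p", prefix, reference_fasta]


# 'hg19.fa' is a hypothetical file name; 'hg19' matches the index name above.
cmd = bwa_index_command("hg19.fa", "hg19")
print(shlex.join(cmd))
```

To actually build the index, the list could be passed to subprocess.run(cmd, check=True) on a machine where bwa is installed.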
Align the reads in pairs to the reference genome using the bwa mem algorithm
- start the BWA mem module
- in the 'input' parameter group, link the reference and the two 10% read files in the corresponding fields
- review the optional settings but do not change the defaults
- edit the last parameter group as shown in the picture
- run and wait for the results; you should get a job as shown next
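The module steps above correspond roughly to a bwa mem call on the command line, sketched here in Python. The read file names, the sample name and the thread count are placeholders of ours, and the exact options set by the GenePattern module are not known to us, so this should be read as an approximation rather than the module's actual command.

```python
import shlex
from typing import List


def bwa_mem_command(index_prefix: str, reads_1: str, reads_2: str,
                    sample: str, threads: int = 4) -> List[str]:
    """Build the argument list for a paired-end 'bwa mem' run.

    -M marks shorter split hits as secondary (for Picard compatibility),
    -t sets the number of threads, and -R adds a read group header line;
    bwa converts the literal two-character '\\t' sequences into tabs.
    """
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return ["bwa", "mem", "-M", "-t", str(threads), "-R", read_group,
            index_prefix, reads_1, reads_2]


# Hypothetical file and sample names, for illustration only.
cmd = bwa_mem_command("hg19", "reads_1.fq.gz", "reads_2.fq.gz", "sample1")
print(shlex.join(cmd))
# bwa mem writes SAM to standard output, so a real run would redirect it,
# e.g. subprocess.run(cmd, stdout=open("mappings.sam", "w"), check=True)
```

Giving both read files in one call is what makes BWA treat the reads as pairs.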
Sort results by coordinates using Picard.SortSam
This step is required to prepare for the next QC step. BWA writes the alignments in the order in which the reads were read from the input files, so they are not sorted by position. We will now sort them by coordinate with Picard.SortSam, that is, by reference sequence and then by position along that sequence.
- start the Picard.SortSam module
- in the 'input' parameter group, link the BWA 10% read-mapping files
- set other parameters as shown
- run and wait for job end
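Outside GenePattern, the sorting step could look like the Picard call assembled below. The path to picard.jar and the file names are assumptions of ours; the I=, O=, SORT_ORDER= and CREATE_INDEX= arguments follow Picard's classic KEY=VALUE syntax.

```python
import shlex
from typing import List


def picard_sortsam_command(picard_jar: str, input_sam: str,
                           output_bam: str) -> List[str]:
    """Build the argument list for a coordinate sort with Picard SortSam."""
    return ["java", "-jar", picard_jar, "SortSam",
            f"I={input_sam}",
            f"O={output_bam}",
            "SORT_ORDER=coordinate",   # sort by reference sequence, then position
            "CREATE_INDEX=true"]       # also write a BAM index next to the output


# Hypothetical paths, for illustration only.
cmd = picard_sortsam_command("picard.jar", "mappings.sam", "mappings.sorted.bam")
print(shlex.join(cmd))
```

Writing the output as BAM (by using a .bam extension) keeps the sorted file compressed and ready for indexing.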
Extract chr21 mappings and sort the output BAM file in coordinate order
Since we used chr21 reads, one could expect that they all map to chr21. As usual with NGS, what you get is not necessarily what you expected, and we also get alignments to other chromosomes. We will here select all reads mapping to chr21 and store them in a new file before proceeding. Similarly, you may want to select all mappings to a list of genes (target panel) and make a corresponding subset for your own needs.
- start the SAMtools.SamView module
- select the mappings as input
- fill additional parameters and run
- wait for the results
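As a rough command-line counterpart of this selection, the sketch below builds two samtools calls: one that indexes the coordinate-sorted BAM file and one that extracts the chr21 region. File names are placeholders of ours. Region queries with samtools view require a coordinate-sorted and indexed input, and the extracted records keep the input order.

```python
import shlex
from typing import List


def chr21_extract_commands(sorted_bam: str, output_bam: str,
                           region: str = "chr21") -> List[List[str]]:
    """Build the samtools calls that subset a sorted BAM file to one region."""
    index_cmd = ["samtools", "index", sorted_bam]
    # -b writes BAM output, -o names the output file; the region comes last.
    view_cmd = ["samtools", "view", "-b", "-o", output_bam, sorted_bam, region]
    return [index_cmd, view_cmd]


# Hypothetical file names, for illustration only.
commands = chr21_extract_commands("mappings.sorted.bam", "mappings.chr21.bam")
for c in commands:
    print(shlex.join(c))
```

For a target panel, the region argument could be replaced by several regions such as chr21:START-END given one after the other.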
identify duplicate reads with Picard MarkDuplicates
The presence of PCR duplicates in NGS libraries can cause false positive variant calls at later stages. For this reason, the duplicates present after mapping to the reference genome must be marked using the dedicated Picard tool.
You can read more about read duplicates in our [Q&A section about read duplicates]
- start the Picard.MarkDuplicates module
- fill the first parameter group as shown next (if you performed the sorting, use your output as input SAM; otherwise, take the sorted BAM file from the store)
- scroll down and fill the output parameters as shown
- run and wait for the results
- open the summary text file to get numbers
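The steps above can be approximated on the command line, and the summary file can be read programmatically. The sketch below builds a Picard MarkDuplicates call and parses the metrics table, assuming the usual Picard metrics layout: a line starting with "## METRICS CLASS", then a tab-separated header line, then data rows up to the first blank line. File names and helper names are ours.

```python
import shlex
from typing import Dict, List


def picard_markduplicates_command(picard_jar: str, input_bam: str,
                                  output_bam: str, metrics_txt: str) -> List[str]:
    """Build the argument list for Picard MarkDuplicates.

    Duplicates are flagged in the output, not removed (Picard's default).
    """
    return ["java", "-jar", picard_jar, "MarkDuplicates",
            f"I={input_bam}", f"O={output_bam}", f"M={metrics_txt}"]


def parse_picard_metrics(text: str) -> List[Dict[str, str]]:
    """Return the rows of the first metrics table as {column: value} dicts."""
    lines = text.splitlines()
    rows: List[Dict[str, str]] = []
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS") and i + 1 < len(lines):
            header = lines[i + 1].split("\t")
            for data in lines[i + 2:]:
                if not data.strip():   # a blank line ends the table
                    break
                rows.append(dict(zip(header, data.split("\t"))))
            break
    return rows


# Hypothetical file names, for illustration only.
cmd = picard_markduplicates_command("picard.jar", "mappings.chr21.bam",
                                    "mappings.chr21.mdup.bam", "mdup_metrics.txt")
print(shlex.join(cmd))
```

After a real run, parse_picard_metrics(open("mdup_metrics.txt").read()) would give one dictionary per library, from which columns such as PERCENT_DUPLICATION can be read.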
download exercise files
Download exercise files here