This wiki page is dedicated to the series of trainings that will lead you through the various workflows for the analysis of next generation sequencing data.
Have fun solving the exercises!

Because most of you have used or will use the Illumina platform to generate your data, we will use Illumina data sets in all exercises.

Training 1: Introduction to the analysis of NGS data

Periodically repeated Sessions (Janick Mathys)

Slides

Exercises

This training gives you the background knowledge you need to follow the more advanced trainings on variant analysis, RNA-Seq and ChIP-Seq.

Download the data sets for this training:

Now you can try the exercises.

Archive

FAQ

Q&A added during the intro to NGS data analysis

File formats

Training 2: NGS variant analysis

Session of November 2018 (Stéphane Plaisance)

Session of 2018 using GenePattern

Session of 2020 using GenePattern

Training archive

Introduction to NGS-formats used in classical NGS applications and used today in the hands-on
Remarks about NGS variant analysis: laptop configuration and files
Session of 2017 using GenePattern
Hands-on_introduction_to_NGS_variant_analysis-2016 - pages dedicated to the 2016 1-day session using the Linux command line.
Hands-on_introduction_to_NGS_variant_analysis - pages dedicated to the 2014 2-day session using the Linux command line.
Slides presented during the 2014 session: NGS_DNA-variants_2014-05-23_slides.pdf.

Q&A pages

Q&A_added_during_the_NGS_variant_analysis_training
Q&A added during the NGS variant analysis training2 (+ new Q&A's from May 2014)

HowTo Pages related to this training

Training 3: RNA-Seq analysis

Bulk RNA-Seq analysis for differential expression

Tools

Install the latest version of R and RStudio. List of R packages used in the training:

ggplot2
ggrepel
gplots
pheatmap
plyr
RColorBrewer
reshape2
Bioconductor
Bioconductor: airway
Bioconductor: DESeq2
Bioconductor: GenomicAlignments
Bioconductor: GenomicFeatures
Bioconductor: org.Hs.eg.db
Bioconductor: Rsamtools
Bioconductor: tximeta
Only for Mac users: Bioconductor: Rsubread
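The package list above can be turned into R install commands. The following is a minimal sketch that only prints the commands so you can paste them into an R or RStudio console; it does not run R itself. It assumes that the Bioconductor packages are installed through `BiocManager::install()` and that the BiocManager package is already present (both assumptions, not stated on this page). `quote_list` is a hypothetical helper name, and Rsubread (Mac users only) is left out.

```shell
# CRAN and Bioconductor packages taken from the list above
cran_pkgs="ggplot2 ggrepel gplots pheatmap plyr RColorBrewer reshape2"
bioc_pkgs="airway DESeq2 GenomicAlignments GenomicFeatures org.Hs.eg.db Rsamtools tximeta"

# quote_list (hypothetical helper): turns  a b c  into  "a", "b", "c"
quote_list() {
    out=""
    for p in "$@"; do
        out="${out:+$out, }\"$p\""
    done
    printf '%s' "$out"
}

# The package lists are left unquoted on purpose so that the shell splits them into words
cran_cmd="install.packages(c($(quote_list $cran_pkgs)))"
bioc_cmd="BiocManager::install(c($(quote_list $bioc_pkgs)))"   # assumes BiocManager is installed

# Print the two R commands, one per line
printf '%s\n' "$cran_cmd" "$bioc_cmd"
```

Mac users can add Rsubread to `bioc_pkgs` before running the snippet.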

Slides

Exercises

Solutions FASTQC analysis of Arabidopsis data
Solutions FASTQC analysis of human data
Trimmomatic manual
STAR manual
samtools tutorial
RSeQC manual
htseq-count tutorial
Solutions command line workflow
Bash script for automating command line RNASeq workflow
R script with solutions counting exercises
R script counting
R script with solutions DESeq2 analysis
R script DESeq2
DESeq2 tutorial
R script with solutions EdgeR analysis

Files

Extra links

QuantSeq data analysis
Slides on how cutadapt works
Cutadapt manual
featureCounts or htseq-count?
R Script for RNASeq variant analysis
Instruction for hands-on RNASeq variant analysis
presentation RNASeq variant analysis

Single cell RNA-Seq analysis

Tools

Install the latest version of R and RStudio. List of R packages used in the training:

dplyr
gridExtra
rgl
Seurat
stringr
Bioconductor: scater

Slides

Slides of the analysis of aggregated brain data sets of 2000 and 1000 cells

Exercises

simple R script for Seurat analysis of aggregated brain data sets of 2000 and 1000 cells
R script with extra functions for Seurat analysis of aggregated brain data sets of 2000 and 1000 cells
full R notebook for Seurat analysis of aggregated brain data sets of 2000 and 1000 cells
notebook on full Seurat analysis (open in web browser)

Files

aggregated data: output of CellRanger aggregate to be used as input of the script for Seurat analysis of aggregated brain data sets

Extra links

Slides introduction to 10xGenomics made by Mike Stubbington from 10xGenomics
bcl2fastq tutorial (the tool that was used as a basis for cellranger mkfastq)
Slides on trajectory analysis
Tutorial on trajectory analysis
R script for trajectory analysis
2000 brain cells mouse data set
1000 brain cells mouse data set

Summer school 2018

Scenic

Experimental design

Slides

Integration of omics data

What after the summer school ?

Bulk RNA-Seq - from raw reads to counts:

We have two GenePattern servers running that contain all the tools discussed in the training. Send an email to bits@vib.be to get an account.
We can provide a snapshot of the server you worked on during the training. You can then make your own server on Google cloud (it's easy starting from a snapshot). You will have to pay for that.

Bulk RNA-Seq - finding DE genes:

You can do the R analysis on your own computer: see this section for the list of packages you need to install.
We can provide a snapshot of the server you worked on during the training. You can then make your own server on Google cloud (it's easy starting from a snapshot). You will have to pay for that.

Single cell RNA-Seq:

You can do the Seurat analysis on your own computer: see this section for the list of packages you need to install.
We can provide a snapshot of the server you worked on during the training. You can then make your own server on Google cloud (it's easy starting from a snapshot). You will have to pay for that.
In the future you can get support from Niels and Liesbet. Contact scRNAseq@irc.vib-ugent.be for more information.
We will check whether Cell Ranger is installed on the KULeuven VSC (accessible to people from KULeuven and UHasselt).

A GitHub page has been started where you can post your issues and share them with us; you can reach it at https://github.com/BITS-VIB/Summer_school_2018

NGS_data_analysis_tools - a page listing tools encountered during the day that you may want to install on your computer

Archive

Session of March 20th and 23rd, 2015 (Stéphane Plaisance)

repeated September 25, 2015

Hands-on_introduction_to_NGS_RNASeq_DE_analysis - the pages of the actual training, containing a hands-on workflow of RNA-Seq analysis for differential expression using command line tools.

creating ENV variables for the training

Create a new file /etc/profile.d/bits.sh as superuser (for example with "sudo nano /etc/profile.d/bits.sh"; any text editor will do) and paste the following content:

# system wide ENV variables to ease path in training exercises
export SUMMER=/usr/summer
export SOFT=$SUMMER/software
export REFS=$SUMMER/refs
export DATA=/mnt/userdata/$(whoami)

Source (= execute) the file by typing ". /etc/profile.d/bits.sh".

You now have shortcuts (environment variables) that can be typed to reach the very long exercise locations, as follows:

$SUMMER leads to /usr/summer
$SOFT leads to $SUMMER/software
$REFS leads to $SUMMER/refs
$DATA leads to /mnt/userdata/<your username>
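To check what the shortcuts resolve to on your machine, the sketch below repeats the definitions from bits.sh and prints each variable together with a note on whether the directory exists. It assumes a POSIX shell; the loop and the names `name`, `value` and `status` are illustrative and not part of the training material.

```shell
# Same definitions as in /etc/profile.d/bits.sh
export SUMMER=/usr/summer
export SOFT=$SUMMER/software
export REFS=$SUMMER/refs
export DATA=/mnt/userdata/$(whoami)

# Print every shortcut with its value and whether the directory is present
for name in SUMMER SOFT REFS DATA; do
    eval "value=\$$name"          # look up the variable whose name is stored in $name
    if [ -d "$value" ]; then
        status="exists"
    else
        status="missing"
    fi
    printf '%-6s -> %s (%s)\n' "$name" "$value" "$status"
done
```

On a computer other than the training server, the directories will most likely be reported as missing; the variables themselves are still set correctly.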

edgeR / DESeq2

Exercises
Slides

Archive

Session of January 20th and 27th, 2014 using Galaxy (Joachim Jacob)

Training 4: ChIP-Seq analysis

slides of the presentation

Introduction

The aims of this session are to:

Have an understanding of the nature of ChIP-Seq data
Perform a complete analysis workflow including QC, read mapping, visualization in a genome browser and peak-calling
Use the GenePattern platform for each step of the workflow and get a feel for the complexity of the task
Have an overview of possible downstream analyses
Perform a motif analysis with online web programs

This training gives an introduction to ChIP-seq data analysis, covering the processing steps from the reads to the peaks. Among all possible downstream analyses, the practical part will focus on motif analyses. Particular emphasis will be put on deciding which downstream analyses to perform depending on the biological question. This training does not cover all methods available today. It does not aim to bring users to the level of a professional NGS analyst, but provides enough information to allow biologists to understand what DNA sequencing practically is and to communicate with NGS experts for more in-depth needs.

For this training, we will use a dataset produced by Myers et al. ^[1] on transcription factors involved in the regulation of gene expression under anaerobic conditions in bacteria. We will focus on one factor: FNR. The advantage of this dataset is its small size, which allows all steps of the workflow to be executed in real time.

Suggested reading:

Bailey et al. Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data. PLoS Comput Biol 9, e1003326 (2013) ^[2]. PDF
Thomas-Chollier et al. A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs. Nature Protocols 7, 1551–1568 (2012) ^[3]. PDF

Raw data:

all experiments: GEO: GSE41187
used subset: FNR IP ChIP-seq Anaerobic A (=> ENA/SRA: SRX189773 - SRR576933)
used control: anaerobic INPUT DNA (=> ENA/SRA: SRX189778 - SRR576938)

Additional files:

zip file containing the E. coli K12 genome, the .bam and the .bai file for the ChIP sample
link to the E. coli gene annotations in gff3 format (download this gff3 file and use it in IGV)
zip file containing the .bam and .bai file for the control sample (download this file and use it in IGV)
normalized mapping results in BigWig format for the ChIP sample (download this file and use it in IGV)
normalized mapping results in BigWig format for the control sample (download this file and use it in IGV)

Exercises

Same training in command line instead of GenePattern

Links

From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis
GEO database^[4]
EBI ENA^[5]
Bowtie manual^[6]
MACS manual
RSAT (European mirror)^[7]
HMCan ^[8] when working with cancer samples or cell lines (by V Boeva from Inst. Curie)
UCSC microbial genome browser
UCSC microbial genome tables

Archive

Session of June 1st, 2015 by Morgane Thomas-Chollier
Session of February 24th, 2014 by Morgane Thomas-Chollier

HowTo Pages related to this training

Training 5: metagenomics

Slides

Data files

Tools

Lotus pipeline
Download usearch version 8 and copy it into the /usr/bin/tools/ folder (you need to be superuser for this)
Make it executable:
```
sudo chmod +x /usr/bin/tools/usearch8.1.1861_i86linux32
```
Create a symbolic link into the folder where Lotus will search for it:
```
sudo ln -s /usr/bin/tools/usearch8.1.1861_i86linux32 /usr/bin/tools/lotus_pipeline/bin/usearch_bin
```
You also need R with the vegan package installed.
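If you want to rehearse the two usearch steps before touching /usr/bin/tools, the sketch below repeats them in a throw-away directory, so no superuser rights are needed. The placeholder script stands in for the downloaded usearch binary and is purely hypothetical, as are the variable names `workdir`, `binary` and `link`.

```shell
# Scratch area that mimics the /usr/bin/tools layout
workdir=$(mktemp -d)
mkdir -p "$workdir/tools/lotus_pipeline/bin"

binary="$workdir/tools/usearch8.1.1861_i86linux32"
link="$workdir/tools/lotus_pipeline/bin/usearch_bin"

# Hypothetical stand-in for the real usearch download
printf '#!/bin/sh\necho "usearch placeholder"\n' > "$binary"

# Step 1: make the file executable
chmod +x "$binary"

# Step 2: create the symbolic link in the folder where Lotus looks for it
ln -s "$binary" "$link"

# Check that the link exists and points to an executable file
if [ -L "$link" ] && [ -x "$link" ]; then
    echo "usearch_bin is a symbolic link to an executable file"
else
    echo "usearch_bin is not set up correctly"
fi
```

The same two checks (`-L` and `-x`) can be run on /usr/bin/tools/lotus_pipeline/bin/usearch_bin after the real installation.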

Exercises

References:

[1] Kevin S Myers, Huihuang Yan, Irene M Ong, Dongjun Chung, Kun Liang, Frances Tran, Sündüz Keleş, Robert Landick, Patricia J Kiley
Genome-scale analysis of Escherichia coli FNR reveals complex features of transcription factor binding.
PLoS Genet: 2013, 9(6);e1003565
[PubMed:23818864]

[2] Timothy Bailey, Pawel Krajewski, Istvan Ladunga, Celine Lefebvre, Qunhua Li, Tao Liu, Pedro Madrigal, Cenny Taslim, Jie Zhang
Practical guidelines for the comprehensive analysis of ChIP-seq data.
PLoS Comput Biol: 2013, 9(11);e1003326
[PubMed:24244136]

[3] Morgane Thomas-Chollier, Elodie Darbo, Carl Herrmann, Matthieu Defrance, Denis Thieffry, Jacques van Helden
A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs.
Nat Protoc: 2012, 7(8);1551-68
[PubMed:22836136]

[4] http://www.ncbi.nlm.nih.gov/geo/
[5] http://www.ebi.ac.uk/ena/
[6] http://bowtie-bio.sourceforge.net/
[7] http://rsat.eu
[8] http://www.cbrc.kaust.edu.sa/hmcan/

