High-throughput sequencers, also called 'next-generation' ('next-gen' or 'NGS') or sometimes 'second-generation' sequencers (as opposed to third-generation ones), are technologies that deliver 10⁵ to several 10⁶ DNA reads, covering millions of bases. They are used to (re)sequence genomes, determine the DNA-binding sites of proteins (ChIP-seq), and sequence transcriptomes (RNA-seq).
These technologies take the analysis of sequence information to another level, so rethinking how experiments are designed is crucial.
- 1 Manufacturers and technologies
- 2 Databases of reads
- 3 Data formats
- 4 Standard Analysis Workflow of HTS DNA reads
- 5 HTS data analysis packages
- 6 Visualisation
- 7 Purposes of HT sequencing
- 8 Other useful information sources
Manufacturers and technologies
- Solexa/Illumina - 1-3 Gigabase (Gb) reads of 36 or 150 bp
- video - click on Technology
- Roche/454 - 0.1 Gb reads of 400-700 bp
- ABI/SOLiD - 2-3 Gb reads of 35-75 bp
- Helicos - 8 Gb reads of 25-45 bp
- Complete Genomics - genome sequencing as a service; a partner of VIB for sequencing complete human genomes (check TechWatch; see the Complete Genomics BITS wiki page)
- Ion Torrent
Databases of reads
- NCBI Short Read Archive (see the Short Read Archive Overview for more information)
- European Nucleotide Archive - Reads
- DDBJ Trace/Short Read Archive, Submissions, Data Release
Data formats
When you have millions of reads, you want to reduce them to something more compact as soon as possible, since a single read on its own carries little relevant information. Merging overlapping reads (assembly) can already reduce the data size considerably. If a reference genome is available, you can align the reads to it (mapping) and store only the positions, the counts and the sequence deviations relative to that reference.
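The data reduction described above can be illustrated with a minimal Python sketch. It assumes the reads have already been mapped without gaps and are given as hypothetical `(start, read_sequence)` pairs with 0-based start positions; the function name `summarise_alignments` and the toy data are illustrative only and do not correspond to any of the formats or tools listed on this page.

```python
from collections import Counter, defaultdict


def summarise_alignments(reference, alignments):
    """Reduce ungapped alignments to per-position depth and mismatch counts.

    `alignments` is an iterable of (start, read_sequence) pairs, where
    `start` is the 0-based position of the read on `reference`.
    Insertions and deletions are not handled in this sketch.
    """
    depth = [0] * len(reference)
    deviations = defaultdict(Counter)  # position -> counts of non-reference bases
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos < 0 or pos >= len(reference):
                continue  # ignore bases hanging off either end of the reference
            depth[pos] += 1
            if base != reference[pos]:
                deviations[pos][base] += 1
    return depth, dict(deviations)


# Toy example (assumed data): four 5 bp reads on a 10 bp reference.
reference = "ACGTACGTAC"
alignments = [(0, "ACGTA"), (2, "GTACG"), (3, "TATGT"), (5, "CGTAC")]
depth, deviations = summarise_alignments(reference, alignments)
print(depth)
print(deviations)
```

After this step the individual reads are no longer needed: the depth vector and the small table of deviations hold the positions, the counts and the differences to the reference.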
- MAQ .map format (a compressed binary file specifically designed for short read alignment)
- AMOS (A Modular Open-Source Assembler) - assembly format used by Velvet
- SRF - Sequence Read Format (also called Short Read Format); see solid2srf, illumina2srf
- MINSEQE - Minimum Information about a high-throughput SeQuencing Experiment
- FASTQ - a common format for short reads with quality scores. It is supported as a sequence format in EMBOSS 6.1.0. Quality scores are also used if the format is named more explicitly in EMBOSS: fastqsanger or fastqillumina.
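To make the FASTQ entry concrete, here is a minimal Python sketch of reading records and decoding quality strings. It assumes the simple four-lines-per-record layout and Phred-scaled qualities: an ASCII offset of 33 for the Sanger variant and 64 for the Illumina 1.3+ variant. The older Solexa variant uses a different score scale and is not covered. The function names `read_fastq` and `phred_scores` are illustrative and are not part of EMBOSS.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) ends the iteration
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header)
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string; offset 33 = Sanger, 64 = Illumina 1.3+ (assumed)."""
    return [ord(char) - offset for char in quality]


# Toy record (assumed data) in the Sanger variant.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
name, sequence, quality = records[0]
scores = phred_scores(quality)
print(name, sequence, scores)
```

Decoding with the wrong offset silently shifts every score by 31, which is why naming the variant explicitly matters.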
Standard Analysis Workflow of HTS DNA reads
Depending on the sample, reads may be assembled before being mapped to a reference genome (if one exists). Assembly merges overlapping sequences into one sequence and is computationally very demanding.
- MIRA - also part of the EMBASSY package.
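As a toy illustration of what "merging overlapping sequences" means, the Python sketch below joins two reads when a suffix of the first exactly matches a prefix of the second. It is a deliberately simplified assumption: real assemblers must tolerate sequencing errors, handle both strands and scale to millions of reads, which is where the computational cost comes from. The names `overlap_length`, `merge_reads` and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    for length in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads on their exact suffix/prefix overlap, or return None."""
    length = overlap_length(left, right, min_overlap)
    if length == 0:
        return None
    return left + right[length:]


# Toy reads (assumed data) sharing the 4-base overlap "TGCA".
merged = merge_reads("ACGTTGCA", "TGCAGGT")
print(merged)
```

Two reads of 8 and 7 bases collapse into one 11-base sequence here; repeated over many overlapping reads, this is how assembly shrinks the data.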
High-throughput sequencing data contain a fair number of errors. To distinguish sequencing errors from genuine sequence differences, sufficient sequencing depth is needed, together with a good quality assessment of the reads.
See also this NBIC wiki page
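The interplay of depth and read quality can be sketched in Python as follows. The input is a hypothetical pileup mapping each 0-based reference position to the `(base, phred_quality)` pairs observed there. The thresholds `min_depth`, `min_quality` and `min_fraction` are arbitrary illustrative values, not recommendations, and this simple filter is not the method of any tool listed on this page.

```python
from collections import Counter


def call_variants(reference, pileup, min_depth=10, min_quality=20, min_fraction=0.2):
    """Report positions where a non-reference base is well supported.

    `pileup` maps a 0-based position to a list of (base, phred_quality)
    observations. Low-quality bases are discarded first; positions with too
    little remaining depth are skipped rather than called.
    """
    calls = {}
    for pos, observations in sorted(pileup.items()):
        trusted = [base for base, quality in observations if quality >= min_quality]
        if len(trusted) < min_depth:
            continue  # not enough reliable evidence at this position
        alternatives = Counter(base for base in trusted if base != reference[pos])
        if not alternatives:
            continue
        alt_base, alt_count = alternatives.most_common(1)[0]
        if alt_count / len(trusted) >= min_fraction:
            calls[pos] = (reference[pos], alt_base, alt_count, len(trusted))
    return calls


# Toy pileup (assumed data) over the reference "ACGT".
reference = "ACGT"
pileup = {
    1: [("C", 30)] * 9 + [("T", 30)] * 6,                # well-supported difference
    2: [("G", 30)] * 14 + [("A", 5)] * 3 + [("A", 30)],  # mostly low-quality mismatches
    3: [("A", 35)] * 4,                                  # too shallow to decide
}
calls = call_variants(reference, pileup)
print(calls)
```

Only position 1 is reported: position 2 loses most of its mismatches to the quality filter, and position 3 is left uncalled for lack of depth.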
HTS data analysis packages
- SAMtools, the program package distributed with the SAM format (Win,Linux,MacOS)
- BioConductor (R) packages for HTS data analysis
Using R packages
Manuals for HT Sequence Analysis with R and Bioconductor
- ShortRead - Quality assessment of the reads, finding duplicates, trimming, string pattern searches
- Biostrings - Reading sequences in R
- BSgenome - Reading in complete genomes and BioC annotation data
- DEGseq - Identify differentially expressed genes from RNA-Seq data.
- IRanges - infrastructure for positional data.
- biomaRt - interface to BioMart annotations.
- rtracklayer - interface to online and other genome browsers.
- chipseq & ChIPpeakAnno - ChIP-seq analysis.
Visualisation
Stand-alone viewing tools for high-throughput sequencing data
- Tablet, SCRI
- Integrative Genomics Viewer (IGV), a very powerful Java-based viewer (Win, Mac, Linux)
- EagleView, the Marth Lab
- AnnoJ genome browser Visualising deep sequencing data and other genome annotation data.
- MagicViewer - displays large-scale short-read data in a zoomable interface with a user-defined color scheme, independent of the operating system
- MapView - Visualising short-read alignments on a desktop computer
- JGI Genome Browser - Visual tool for viewing assembled genomes.
- Lightweight Genome Viewer - A small genome viewer
Purposes of HT sequencing
Transcriptome sequencing, also called RNA-seq
SNP discovery, sometimes referred to as "SNP-seq"
- SNPExpress, a database enabling researchers to input a SNP, gene, or a genomic region to investigate regions of interest for localized effects of SNPs on exon and gene level expression changes.
- PolyBayes - SNP discovery, from the Marth Lab
Copy Number Variation
- ChIP-seq - DNA-protein interaction
- BS-seq - bisulfite treatment and sequencing
Small RNA profiling
mRNA expression profiling
Digital gene expression (DGE)