A reference genome is a consensus sequence, usually derived from many individuals of the same species. Reference genomes are typically produced by large consortia. To build one, many types of sequencing data (BAC, YAC, cosmids, ESTs, RefSeq genes, NGS data, ...) are gathered and merged. A particular version of a reference genome is also referred to as an 'assembly' or a 'build'. Note: as a rule, a reference genome sequence is represented as a single haplotype.
- Many reference genomes are available from NCBI: ftp://ftp.ncbi.nih.gov/genomes/
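To make the idea of a consensus sequence concrete, here is a minimal Python sketch of a per-column majority vote over sequences that are assumed to be already aligned to the same length. The function name `majority_consensus`, the handling of `N` and `-`, and the alphabetical tie-break are illustrative choices only; this is not how any consortium actually builds a reference assembly, which involves many data types and manual curation. The output is one sequence, mirroring the single-haplotype representation of a reference.

```python
from collections import Counter


def majority_consensus(aligned_seqs: list[str]) -> str:
    """Per-column majority-vote consensus of pre-aligned, equal-length sequences."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        # Count informative bases only; gaps ('-') and unknown bases ('N') are ignored.
        counts = Counter(b.upper() for b in column if b.upper() not in "-N")
        if not counts:
            consensus.append("N")  # no informative base in this column
            continue
        # Most frequent base wins; ties go to the alphabetically first base.
        consensus.append(max(sorted(counts), key=counts.__getitem__))
    return "".join(consensus)
```

For example, `majority_consensus(["ACGT", "ACGA", "ATGT"])` returns `"ACGT"`: each column takes the base seen most often across the three inputs.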
Indexing reference genomes for NGS data analysis
- Indexing genomes for bowtie
- Indexing genomes for bwa
- Indexing genomes for samtools
- Indexing genomes for srma
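What an index contains differs per tool: bowtie and bwa build Burrows-Wheeler based indexes, which are not shown here. As one simpler example, the FASTA index written by `samtools faidx` (a `.fai` file) stores, per sequence, its name, length, the byte offset of its first base, the number of bases per line and the number of bytes per line. The Python sketch below builds records with those five fields and uses them to read a region directly. It assumes a well-formed FASTA file in which all sequence lines of a record, except possibly the last, have the same length; it does not check this, and the names `fasta_index` and `fetch` are hypothetical, not a reimplementation of samtools.

```python
def fasta_index(data: bytes) -> list[tuple[str, int, int, int, int]]:
    """Return (name, length, offset, linebases, linewidth) for each FASTA record."""
    records = []
    current = None  # [name, length, offset, linebases, linewidth] of the open record
    pos = 0  # byte offset of the start of the current line
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            if current is not None:
                records.append(tuple(current))
            # Name = header text up to the first whitespace.
            name = line[1:].split()[0].decode()
            # The first base starts right after the header line.
            current = [name, 0, pos + len(line), 0, 0]
        elif current is not None:
            bases = len(line.rstrip(b"\r\n"))
            if current[3] == 0:
                # Assumption: the first sequence line defines the layout of the record.
                current[3] = bases      # bases per line
                current[4] = len(line)  # bytes per line, newline included
            current[1] += bases
        pos += len(line)
    if current is not None:
        records.append(tuple(current))
    return records


def fetch(data: bytes, record: tuple[str, int, int, int, int], start: int, end: int) -> str:
    """Return bases [start, end) (0-based, half-open) of one record via its index entry."""
    _name, length, offset, linebases, linewidth = record
    out = []
    for i in range(start, min(end, length)):
        # Jump over whole lines (linewidth bytes each), then move within the line.
        byte = offset + (i // linebases) * linewidth + i % linebases
        out.append(chr(data[byte]))
    return "".join(out)
```

The point of the index is the arithmetic in `fetch`: base `i` of a sequence lies at byte `offset + (i // linebases) * linewidth + i % linebases`, so a region can be read without scanning the whole genome. For the input `b">chr1 test\nACGTA\nCGT\n>chr2\nGGGG\nCC\n"`, the records are `("chr1", 8, 11, 5, 6)` and `("chr2", 6, 27, 4, 5)`, and `fetch(data, records[0], 3, 7)` returns `"TACG"`.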
Personal genomics requires sequencing (part of) an individual's genome. Assembly of these sequencing data can be assisted by the reference genome of the species. Once this is done, the positions where the individual's genome differs from the reference can be extracted. These differences, such as SNPs and structural differences (e.g. large gaps, transpositions of large segments, inversions, ...), may help explain why the individual's phenotype differs from the average.
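As a minimal illustration of extracting where an individual's sequence differs from the reference, the sketch below reports single-base mismatches (candidate SNPs) between a reference segment and a sample sequence. It assumes the two are already aligned without gaps, so indels and the larger structural differences are out of scope, and it skips positions where either base is `N`. The function name `find_mismatches` and the `ref_start` coordinate convention are assumptions for illustration; real variant calling works from read alignments with base and mapping qualities rather than from a single finished sequence.

```python
def find_mismatches(reference: str, sample: str, ref_start: int = 1) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every single-base difference.

    Both sequences are assumed to be aligned without gaps, so index i in one
    corresponds to index i in the other. Positions are ref_start + i, so the
    default ref_start=1 gives 1-based coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for i, (ref_base, sample_base) in enumerate(zip(reference.upper(), sample.upper())):
        # Unknown bases ('N') carry no information, so they are never reported.
        if ref_base == "N" or sample_base == "N":
            continue
        if ref_base != sample_base:
            differences.append((ref_start + i, ref_base, sample_base))
    return differences
```

For example, `find_mismatches("ACGTACGT", "ACGAACNT")` returns `[(4, "T", "A")]`; the `N` at position 7 is skipped rather than reported as a difference.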