Ensembl

The Ensembl Genome browser

Ensembl is a joint project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute that annotates chordate genomes. Gene sets from model organisms such as yeast and fly are also imported for comparative analysis by the Ensembl ‘compara’ team. Most annotation is updated every two months, and each update is published as a new, numbered Ensembl version.

Go to Ensembl.

Exercise 1: human F9

Search for the human F9 (coagulation factor IX precursor) gene.

  • Select the Human genome to search in
  • Search for F9
  • Click Go
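The same search can also be scripted. The sketch below assumes the public Ensembl REST service at rest.ensembl.org and its lookup-by-symbol endpoint; the helper names lookup_url and lookup_gene are illustrative and not part of any Ensembl library.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the public Ensembl REST server is reachable at this address.
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(species: str, symbol: str) -> str:
    """Build the REST URL that looks up a gene by its symbol in one species."""
    return f"{REST_SERVER}/lookup/symbol/{quote(species)}/{quote(symbol)}"


def lookup_gene(species: str, symbol: str) -> dict:
    """Fetch the gene record as a dictionary (requires network access)."""
    request = urllib.request.Request(
        lookup_url(species, symbol),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling lookup_gene("homo_sapiens", "F9") should return a record describing the gene, typically including its stable ID and genomic location; check the field names against the response you actually receive.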

Click the F9 (Human gene) link to go to the gene page of F9.

Show the transcript table if this is not yet the case.

To download sequences choose Export data via the menu on the left.
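If you need sequences for many genes, exporting them by script may be more convenient than using the menu. The following is a minimal sketch that assumes the Ensembl REST sequence endpoint and FASTA output; sequence_url, fetch_fasta and parse_fasta are hypothetical helper names, and the stable ID has to be copied from the gene or transcript page.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: the public Ensembl REST server is reachable at this address.
REST_SERVER = "https://rest.ensembl.org"


def sequence_url(stable_id: str, seq_type: str = "genomic") -> str:
    """Build the URL for one sequence; seq_type is e.g. genomic, cdna, cds or protein."""
    return f"{REST_SERVER}/sequence/id/{stable_id}?{urlencode({'type': seq_type})}"


def fetch_fasta(stable_id: str, seq_type: str = "genomic") -> str:
    """Download the sequence as FASTA text (requires network access)."""
    request = urllib.request.Request(
        sequence_url(stable_id, seq_type),
        headers={"Content-Type": "text/x-fasta"},  # ask for FASTA output
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> dict:
    """Turn FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

For example, parse_fasta(fetch_fasta(stable_id, "cdna")) would give the cDNA sequence keyed by its FASTA header.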

Click on the Transcript ID of transcript F9-201 to go to the Transcript page.

The transcript page also has a menu on the left, with a similar look but different content.

You can use Configure this page to change the graphical properties of the information that is shown.

The Ensembl genome browser strives to display many layers of genome annotation in a simplified view. Click the Location tab to go to the genome browser page.

The location page has three views of the gene and its genomic surroundings, the one at the bottom being the most detailed view.

Each type of information on the figures is displayed in a track. In the Genes track in the bottom view you can see the structure of the three F9 transcripts. The one coloured in gold is a golden transcript: both Ensembl and HAVANA have predicted this transcript. In the transcripts you can see the location of:

  • coding exons: filled boxes
  • introns: lines
  • UTRs: empty boxes

Transcripts coloured in red are protein-coding transcripts predicted by Ensembl or HAVANA, but not by both. Transcripts coloured in blue are transcripts without a CDS.
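The relation between exons, introns, UTRs and the CDS can be made concrete with a small calculation. The sketch below uses made-up coordinates (not the real F9 coordinates) and assumes a forward-strand transcript; the function names are illustrative.

```python
def introns(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def split_exon(exon, cds_start, cds_end):
    """Split one exon into its UTR (empty box) and coding (filled box) parts."""
    start, end = exon
    parts = []
    if start < cds_start:  # part of the exon lies before the CDS
        parts.append(("UTR", start, min(end, cds_start - 1)))
    coding_start, coding_end = max(start, cds_start), min(end, cds_end)
    if coding_start <= coding_end:  # the exon overlaps the CDS
        parts.append(("coding", coding_start, coding_end))
    if end > cds_end:  # part of the exon lies after the CDS
        parts.append(("UTR", max(start, cds_end + 1), end))
    return parts


# Hypothetical transcript: three exons, CDS running from 150 to 600.
example_exons = [(100, 200), (300, 400), (500, 650)]
example_introns = introns(example_exons)
first_exon_parts = split_exon(example_exons[0], 150, 600)
```

Here the first exon is drawn partly as an empty box (UTR, 100 to 149) and partly as a filled box (coding, 150 to 200), while the lines between the boxes correspond to the two introns.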

In the most detailed view you can click any component to show extra information and links.

The CCDS set track shows the CCDS for this gene. Zoom in on the sixth exon:

  • draw a red box around the exon with your mouse
  • select Jump to region
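Jumping to a region amounts to asking the browser for a location written as chromosome:start-end. As a hedged illustration, the sketch below formats such a region with optional flanking sequence and builds a link to the Location view; it assumes the r parameter used in Ensembl location URLs, and the coordinates in the usage lines are made up.

```python
def region_string(chromosome, start, end, flank=0):
    """Format a region as chromosome:start-end, optionally padded on both sides."""
    if start > end:
        raise ValueError("start must not be larger than end")
    padded_start = max(1, start - flank)  # coordinates cannot go below 1
    return f"{chromosome}:{padded_start}-{end + flank}"


def location_url(species, region):
    """Build a link to the Location view for a region (URL layout is an assumption)."""
    return f"https://www.ensembl.org/{species}/Location/View?r={region}"


# Hypothetical exon coordinates with 50 bp of flanking sequence on each side.
example_region = region_string("X", 1000, 1200, flank=50)
example_link = location_url("Homo_sapiens", example_region)
```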


You can add tracks to the figure by clicking Configure this page in the left menu.

At the bottom of the left panel of the Configure this page window you can save and load track configurations. If you want to go back to the default track settings, click Reset configuration. You can delete a track by clicking its name in the figure and then clicking the X.

Go back to the Gene page. By default, a gene summary is shown on the Gene page, but much more information is available via the left menu.

You will also find links to external databases such as UniProt, NCBI's Gene (EntrezGene) and UniGene.

Ensembl Genomes

The Ensembl Genomes database contains the sequences and annotation of organisms that are not covered by Ensembl, such as bacteria, plants, fungi and more. When you go to the Ensembl Genomes website you can select the taxon that you're interested in.


Exercise 2: Ensembl Plants

Click the Plants taxon link to go to the EnsemblPlants database. The EnsemblPlants home page contains an overview of all supported plants with links to their corresponding genome pages. The genome pages show general information on the genome (e.g. ploidy, sequence length and assembly details) and let you download sequence and annotation.

We now go to the Arabidopsis home page.

Recently, our colleagues from PSB unraveled a molecular switch that controls stress response in Arabidopsis, opening the door to breeding plants with greater tolerance for stress. They found that transcription factor MYB29 is a negative regulator of mitochondrial stress response. Deletion of the gene leads to increased sensitivity to light and drought stress.

The user interface of EnsemblPlants is identical to the user interface of Ensembl, so it should be easy to search for the gene in EnsemblPlants.

EnsemblPlants Gene pages have a left menu that is identical to the one in Ensembl.

Of course, this knowledge is only useful when we can extrapolate it to crops. Arabidopsis is just a weed; we don't eat it, so improving stress tolerance in Arabidopsis is not what we want in the long run.

If you want to visualize the actual alignments you can click Genomic alignments in the section Plant compara.

Export is done in exactly the same way as in Ensembl.

Visualizations on the location page are also identical to those in Ensembl.

If the plant that you work on is not represented in Ensembl Genomes, you might consider taking a look in these alternative plant sequence databases: