Transcription-factor binding site discovery

From BITS wiki


Detection of transcription-factor binding sites tries to answer different questions:

  • given a transcription factor which recognizes a motif, which promoter sequences contain that motif?
  • given my promoter sequence, which transcription-factor binding motifs are present?
  • given a list of genes, which motifs are common and which are (potential) transcription-factor binding sites?

Below you can find tools to answer these questions. A separate part contains exercises in which you follow the analysis of a single gene: you extract the promoter region, analyse it for TFB sites, and retrieve genes from the same or other species with the same TFB sites. Further examples show how to analyse a list of genes (obtained from different analyses, such as pathway analysis or gene expression analysis) to detect TFB sites common to these genes.



Installing Amadeus

During the training session we will use the Allegro/Amadeus software package. Go to the Allegro/Amadeus download page and click on AmadeusPBM v1.0. A registration and license form appears; fill it in completely and accept the license. The file should then start downloading.

Unpack the zip file to a default location, and start the program by double-clicking 'run.bat' on Windows, or by running 'java -jar Amadeus_v1.1.jar' on Linux.


Part 1: How do I get from a list of genes to potential cis-regulatory elements?

Exercise 1.1:

Use the web version of MEME to detect three elements each for the datasets list1 and list2. Describe the results. Are there differences? Save the resulting elements for list2 as a consensus/regular expression.

Exercise 1.2:

Use list2 in Weeder. [This link is dead. You can try to install Weeder on your own machine to proceed. Jjacob (talk)] Compare the resulting elements to the ones found with MEME. Are they the same or do you see differences? Save the resulting elements.

Exercise 1.3:

Use the Windows version of Amadeus with list2 and compare the results to the previous two exercises. Select “Other” from the species menu and load the file masked_promoters.all as the sequences file. Use the file list2.list as the target set. Click Add and start Amadeus. Save the first two resulting elements as consensus sequences.

Exercise 1.4:

Use the STAMP website to compare the results for list2 of all three programs used so far. The website accepts a variety of formats for input elements. Use the fasta format to submit the elements found in the previous exercises.

To transform the degenerate positions in the detected elements, use the IUPAC definitions:

M = A or C
R = A or G
W = A or T
S = C or G
Y = C or T
K = G or T
V = not T
H = not G
D = not C
B = not A
N = A or C or G or T
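The IUPAC table above can also be applied programmatically. The following is a minimal Python sketch, not part of any of the tools used in the exercises, that turns a degenerate consensus into a regular expression. The function name `iupac_to_regex` and the example consensus `CAYGTGK` are made up for illustration.

```python
import re

# IUPAC nucleotide codes from the table above, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]",
    "S": "[CG]", "Y": "[CT]", "K": "[GT]",
    "V": "[ACG]",  # not T
    "H": "[ACT]",  # not G
    "D": "[AGT]",  # not C
    "B": "[CGT]",  # not A
    "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())

# Hypothetical consensus, used only to show the conversion.
pattern = iupac_to_regex("CAYGTGK")
print(pattern)
print(bool(re.fullmatch(pattern, "CACGTGG")))
```

An unknown letter raises a `KeyError`, which is a deliberate choice here: a typo in a consensus should fail loudly rather than silently match nothing.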

You can choose a variety of databases of known transcription factor binding sites to compare your results to. Use the AGRIS database for this exercise.
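If you have collected many elements, the fasta input for this exercise can be prepared with a few lines of Python. This is only a sketch: the element names and sequences below are placeholders, and `consensus_to_fasta` is a helper invented for this example, not a STAMP function.

```python
def consensus_to_fasta(elements):
    """Format {name: consensus} pairs as fasta text, one record per element."""
    lines = []
    for name, consensus in elements.items():
        lines.append(f">{name}")          # header line with the element name
        lines.append(consensus.upper())   # consensus sequence on its own line
    return "\n".join(lines) + "\n"

# Placeholder names and sequences; replace them with your own saved elements.
example = {"meme_1": "CCACGTGG", "amadeus_1": "AAATATCT"}
fasta_text = consensus_to_fasta(example)
print(fasta_text)
```

Naming each record after the program that produced it makes the STAMP comparison easier to read afterwards.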

Part 2: I know what my element looks like. How can I check whether my genes have it? Or which other genes have it too?

Exercise 2.1:

Use the RSA-tool dna-pattern (link in list on top) to identify the positions of the elements CCACGTGG, AAATATCT, TCTCTCTCT and TAGC in list 3. What do you notice?

Use the button at the end of the page to forward the results to the visualization tool feature map, and use it to create a picture of your results. Do you see a specific pattern? Use the provided option to view the separate elements. How would you explain the results?
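To check a result by hand, the kind of search done in this exercise can be approximated with a short Python sketch. It is not the RSA-tools implementation: it only looks for exact matches of an element and of its reverse complement, and the test sequence is invented for the example.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq, element):
    """Return sorted (start, strand) pairs, 0-based, overlapping matches included."""
    seq = seq.upper()
    hits = []
    for strand, pat in (("+", element), ("-", reverse_complement(element))):
        # A lookahead makes finditer report overlapping occurrences too.
        for match in re.finditer(f"(?={pat})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)

# Invented toy sequence, not taken from list 3.
toy = "TTCCACGTGGAATAGCTT"
print(find_sites(toy, "CCACGTGG"))
print(find_sites(toy, "TAGC"))
```

An element that equals its own reverse complement is reported once per strand at the same position, so its hit count is doubled compared with a non-palindromic element.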

Exercise 2.2:

List 4 is a subset of list2 only including genes with one of the elements detected in exercise 1. Use the elements found in exercise 1 and the 3 given elements (CCACGTGG, AAATATCT, TCTCTCTCT) with dna-pattern and feature map on list 4. For which element were the genes selected?

Note: RSA-tools also offers a genome-wide dna-pattern. Depending on the genome, the results can take a while to compute.
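When interpreting matches, it helps to know how many hits to expect by chance. The sketch below assumes a very simple background model (all four bases equally likely and independent) and a promoter length of 1000 bp; both are assumptions made for this example, and real genomes deviate from them.

```python
def expected_chance_hits(element_length, sequence_length, both_strands=True):
    """Expected number of exact matches of one fixed element in a random sequence."""
    positions = sequence_length - element_length + 1
    if positions <= 0:
        return 0.0
    per_position = 0.25 ** element_length   # probability of a match at one position
    strands = 2 if both_strands else 1      # use 1 for palindromic elements
    return positions * per_position * strands

# Assumed promoter length of 1000 bp.
short_element = expected_chance_hits(4, 1000)
long_palindrome = expected_chance_hits(8, 1000, both_strands=False)
print(short_element, long_palindrome)
```

Under these assumptions a 4-bp element is expected several times in every promoter, while an 8-bp element is expected far less than once, which is worth keeping in mind when comparing short and long elements.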

Part 3: Comparative analysis

Are the binding sites also present in other members of my gene family of interest?

Exercise 3.1:

Use RSA-tools to find the following elements in gene family 1.


Visualize the results. If necessary use the filter function. How reliable do you think the results are? How do you estimate the reliability based on the following frequencies in the genome?






14%  10%   8%  90%
12%  11%  10%  91%
10%   8%   4%  88%
13%   9%   5%  97%

I have a family. What are the conserved elements?

Exercise 3.2:

Use MEME and WeederH to detect elements in the given family. Use yeast as organism for WeederH.

Exercise 3.3:

The given family includes two genes from species A. Create fasta files for two smaller families, each containing one of the original genes and all other species. Use one of the programs on both smaller families. Is there a difference? What could this suggest?
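The two smaller fasta files can be made by hand in a text editor, or with a small script. The following Python sketch assumes that each fasta header is exactly the gene identifier; the identifiers and sequences shown are invented, and `parse_fasta` and `split_family` are helper names chosen for this example.

```python
def parse_fasta(text):
    """Parse fasta text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def split_family(records, paralog_ids):
    """Build one sub-family per paralog: that gene plus all non-paralog genes."""
    others = [rec for rec in records if rec[0] not in paralog_ids]
    return {
        pid: [rec for rec in records if rec[0] == pid] + others
        for pid in paralog_ids
    }

# Invented identifiers and sequences, for illustration only.
family = ">A_gene1\nACGT\n>A_gene2\nTTTT\n>B_gene\nGGGG\n>C_gene\nCCCC\n"
subfamilies = split_family(parse_fasta(family), ["A_gene1", "A_gene2"])
for pid, recs in subfamilies.items():
    print(pid, [name for name, _ in recs])
```

Each sub-family can then be written back to its own fasta file and submitted to the program of your choice.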

Exercise 3.4:

Compare all detected elements (in 3.2 and 3.3) with the given four elements (3.1). Use dna-pattern and feature-map to visualize the results on the whole family.

Part 4: Summary

Exercise 4.1:

Use one of the programs to detect elements for gene list list4.

Exercise 4.2:

Look for conserved elements in family F2 using one of the websites mentioned above.

Exercise 4.3:

Compare the elements detected in 4.1 with those detected in 4.2 in family F2. Are the elements from 4.1 also conserved?

Exercise 4.4:

Check the conserved elements in gene list list4.