Predicting biological properties based on protein sequences

Predicting TM helices

Go to the UniProt record of a short fly protein of unknown function.

We will identify transmembrane (TM) regions in the sequence using Phobius.

The tool contains two HMMs, one for TM helices and one for signal peptides. TM regions and signal peptides have very similar amino acid compositions, and TM proteins often have a signal peptide that targets them to the membrane. It is therefore difficult to distinguish signal peptides from TM helices. This is the strength of Phobius: it combines signal peptide and TM predictions in one run. When you use other tools for TM prediction, you have to combine them with a signal peptide prediction tool such as SignalP. If SignalP, and preferably a few other signal peptide prediction tools, agree on predicting a signal peptide, you should remove the signal peptide from the full protein sequence, since it is not part of the mature protein.
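The signal peptide removal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sequence, the tool names used as dictionary keys and the cleavage positions are made up, and the helper names `consensus_cleavage_site` and `remove_signal_peptide` are hypothetical, not part of SignalP or Phobius. The cleavage position is taken to be the 1-based position of the last signal peptide residue, copied by hand from each tool's output.

```python
from collections import Counter


def consensus_cleavage_site(sites, min_agree=2):
    """Return the cleavage position reported by at least `min_agree` tools.

    `sites` maps a tool name to the 1-based position of the last signal
    peptide residue, or to None if that tool predicts no signal peptide.
    Returns None when the tools do not agree often enough.
    """
    counts = Counter(pos for pos in sites.values() if pos is not None)
    if not counts:
        return None
    position, votes = counts.most_common(1)[0]
    return position if votes >= min_agree else None


def remove_signal_peptide(sequence, last_sp_residue):
    """Return the mature sequence: everything after the signal peptide."""
    if not 0 < last_sp_residue < len(sequence):
        raise ValueError("cleavage position lies outside the sequence")
    return sequence[last_sp_residue:]


# Made-up example data, NOT real predictions for the fly protein.
toy_sequence = "MKLLVLLFALLAVSAQAEETKRDW"
predicted_sites = {"SignalP": 17, "Phobius": 17, "other_tool": None}

site = consensus_cleavage_site(predicted_sites)
mature = remove_signal_peptide(toy_sequence, site) if site else toy_sequence
print(site, mature)
```

Requiring agreement between at least two tools mirrors the advice above; the threshold of two is an arbitrary choice for the sketch.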

The TM HMM was trained on a data set of proteins with experimentally determined topology.

Phobius makes a binary prediction on the existence of a TM region in the fly sequence. Remember that it is advisable to use multiple prediction tools and compare their results. We will therefore repeat the prediction with MEMSAT, the TM prediction tool of PSIPRED.

PSIPRED will run both MEMSAT and MEMSAT-SVM: the former is a neural-network-based and the latter an SVM-based TM helix prediction tool.

As you can see, the predictions of the different tools agree. In this case you can be reasonably confident in the prediction.
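If you want to make such a comparison less subjective, you can check how many residues the predicted helices share. The snippet below is a minimal sketch, assuming you have copied the start and end positions of each predicted TM helix from the tool outputs by hand. The coordinates shown are invented placeholders, not the actual results for the fly protein, and the `min_overlap` threshold of 10 residues is an arbitrary assumption.

```python
def overlap(seg_a, seg_b):
    """Number of residues shared by two inclusive, 1-based (start, end) segments."""
    return max(0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]) + 1)


def agreeing_helices(pred_a, pred_b, min_overlap=10):
    """Return pairs of helices from two tools that share at least `min_overlap` residues."""
    return [
        (a, b)
        for a in pred_a
        for b in pred_b
        if overlap(a, b) >= min_overlap
    ]


# Invented placeholder coordinates, NOT real tool output.
tool_1_helices = [(12, 34)]
tool_2_helices = [(15, 33)]

pairs = agreeing_helices(tool_1_helices, tool_2_helices)
print(pairs)
```

An empty list would mean that no helix from the first tool is supported by the second one, which is a reason to treat the prediction with more caution.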