Protein motifs, patterns and profiles.


When aligning protein sequences, it is often apparent that certain regions, or specific amino acids, are more conserved than others. These regions are often conserved because they encode a part of the protein that is functionally important. The term motif is used to refer to a part of a protein sequence that is associated with a particular biological function.

For example, a region of a protein that binds ATP is called an ATP-binding motif. Because such regions are conserved, they may be recognisable by the presence of a particular sequence of amino acids called a pattern. A pattern is thus a qualitative description of a motif in terms of amino acid sequence: a given stretch of sequence either matches it or does not.
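To make the idea of a pattern concrete, here is a minimal Python sketch, assuming PROSITE-style pattern syntax: elements are separated by "-", x means any amino acid, [..] lists the allowed residues, {..} lists forbidden ones, and (n) or (n,m) gives a repeat count. The helper names prosite_to_regex and find_motifs are illustrative only, not part of any library, and the translation deliberately ignores the terminus anchors < and >. The example pattern is the PROSITE P-loop (ATP/GTP-binding) pattern [AG]-x(4)-G-K-[ST]; the test sequence is a short one chosen for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as '[AG]-x(4)-G-K-[ST]' into a Python regex.

    Sketch only: handles x, [..], {..} and (n) / (n,m) repeats; the < and > anchors are not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat such as (4) or (2,3)
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            part = "."                       # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            part = "[^" + core[1:-1] + "]"   # any amino acid except those listed
        else:
            part = core                      # a single residue or a [..] set of alternatives
        if repeat:
            part += "{" + repeat + "}"
        parts.append(part)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every occurrence, including overlapping ones."""
    # Wrapping the regex in a lookahead makes each match zero-width, so overlapping hits are not skipped
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    p_loop = "[AG]-x(4)-G-K-[ST]"
    print(prosite_to_regex(p_loop))                        # → [AG].{4}GK[ST]
    print(find_motifs("MTEYKLVVVGAGGVGKSALTIQ", p_loop))   # → [(10, 'GAGGVGKS')]
```

The answer is purely yes or no at each position, which is what makes a pattern qualitative. The lookahead is a design choice so that overlapping occurrences, such as the two candidate N-glycosylation sites in NNSTS for N-{P}-[ST]-{P}, are both reported.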

A profile extends this idea by giving a quantitative description of a motif: it assigns a probability to the occurrence of each amino acid at each position of the motif. Because a profile scores how well a sequence fits rather than demanding an exact match, profiles can be used to describe very divergent motifs.
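A profile can be illustrated with a position-specific scoring matrix, one of the simplest kinds of profile. The Python sketch below assumes an ungapped alignment of motif occurrences, a uniform background frequency of 1/20 for every amino acid and a pseudocount of 1; real profile databases use more elaborate models (PROSITE profiles allow gaps, and Pfam uses hidden Markov models). The aligned segments are invented for illustration, and build_profile and best_hit are hypothetical helper names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_motifs: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Build a position-specific scoring matrix from equal-length, ungapped motif occurrences.

    Each column maps an amino acid to log2(P(aa at this position) / P(aa in background)),
    assuming a uniform background of 1/20 per amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    n_seqs = len(aligned_motifs)
    profile = []
    for i in range(len(aligned_motifs[0])):
        counts = Counter(seq[i] for seq in aligned_motifs)
        # Pseudocounts give residues never seen at this position a small, non-zero probability
        total = n_seqs + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(sequence: str, profile: list[dict[str, float]]) -> tuple[int, str, float]:
    """Slide the profile along the sequence and return (1-based start, window, score) of the best window.

    Assumes the sequence uses only the 20 standard amino acids and is at least as long as the profile.
    """
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[aa] for column, aa in zip(profile, window))
        if best is None or score > best[2]:
            best = (start + 1, window, score)
    return best


if __name__ == "__main__":
    # Invented, ungapped occurrences of a P-loop-like motif (illustration only)
    training = ["GAGGVGKS", "GPPGSGKT", "GGAGTGKT", "AESGSGKS"]
    profile = build_profile(training)
    start, window, score = best_hit("MTEYKLVVVGAGGVGKSALTIQ", profile)
    print(start, window, round(score, 2))   # → 10 GAGGVGKS 11.45
```

The pseudocount is what lets a profile tolerate divergence: a residue never seen at some position in the training alignment lowers the score instead of ruling the window out entirely, whereas a pattern would simply fail to match.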

The presence of a particular motif within a protein sequence can be used to suggest functions for uncharacterised proteins.

A number of databases have been constructed that attempt to describe particular protein motifs in terms of patterns and profiles. They allow you to search for patterns or profiles that are indicative of particular functional motifs within a query protein.
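Searching a query against such a database amounts to checking it against every entry and collecting the hits, each of which suggests a possible function. The Python sketch below uses a hypothetical three-entry database holding well-known PROSITE patterns (N-glycosylation site, RGD cell attachment sequence and the P-loop), translated by hand into regular expressions; check them against the current PROSITE entries before reusing them. The query sequence is invented, and scan_query is a hypothetical helper, not the interface of PROSITE or any other database.

```python
import re

# Hypothetical miniature motif database: (PROSITE accession, description, regex).
# The regexes are hand translations of the PROSITE patterns shown in the comments.
MOTIF_DB = [
    ("PS00001", "N-glycosylation site", r"N[^P][ST][^P]"),                   # N-{P}-[ST]-{P}
    ("PS00016", "Cell attachment sequence", r"RGD"),                          # R-G-D
    ("PS00017", "ATP/GTP-binding site motif A (P-loop)", r"[AG].{4}GK[ST]"),  # [AG]-x(4)-G-K-[ST]
]


def scan_query(query: str, motif_db=MOTIF_DB) -> list[tuple[str, str, int, str]]:
    """Return (accession, description, 1-based start, matched segment) for every hit in the query."""
    hits = []
    for accession, description, regex in motif_db:
        # The lookahead allows overlapping occurrences of the same motif to be reported
        for m in re.finditer("(?=(" + regex + "))", query):
            hits.append((accession, description, m.start() + 1, m.group(1)))
    return hits


if __name__ == "__main__":
    query = "MTEYKLVVVGAGGVGKSALTIQRGDNKTA"   # invented query sequence
    for accession, description, start, segment in scan_query(query):
        print(f"{accession}  {start:>3}  {segment:<10} {description}")
```

Very short patterns such as R-G-D match by chance in many unrelated proteins, so a hit is a suggestion to be weighed alongside other evidence rather than proof of function.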

Some examples of such databases include:

            PROSITE - a collection of patterns and profiles

            Pfam - a collection of profiles generated using hidden Markov models

            PRINTS - provider of fingerprints (groups of aligned, unweighted motifs)

            BLOCKS - a database of weighted profiles or blocks

These databases each have different areas of optimum application, so it is difficult to tell in advance which one will give the best results. They all have particular strengths and weaknesses, so for a thorough analysis you really need to search them all.

However, a database called InterPro has recently been established that combines information from PRINTS, PROSITE, ProDom and Pfam (see the reference describing InterPro). Using InterPro saves a lot of work, since we can essentially search many databases in one go.

The following exercise will guide you through the use of InterPro to look for motifs in some example protein sequences.

