This section of the tutorial is split into two parts. The first part will show you how to perform multiple alignments of DNA and protein sequences using ClustalW and how to format the output for inclusion in a Word document. This process allows you to identify the regions of conserved sequence by eye.
The second part will describe how to search for patterns and motifs in a protein sequence, comparing them to motif entries in a database for functional analysis purposes.
In this part, we will perform a multiple sequence alignment using ClustalW, which is probably the best-known multiple sequence alignment program.
When aligning protein sequences, it is often apparent that certain regions, or specific amino acids, are more conserved than others. Such regions are often conserved because they encode a part of the protein that is functionally important. The term motif is used to refer to a part of a protein sequence that is associated with a particular biological function.
For example, a region of a protein that binds ATP is characterised by an ATP binding motif in the protein sequence. Since these regions are conserved, they may be recognisable by the presence of a particular sequence of amino acids called a pattern. A pattern is thus a qualitative description of a motif in terms of amino acid sequence.
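To make the idea of a pattern concrete, the sketch below searches a sequence for the PROSITE pattern for the ATP/GTP-binding P-loop, [AG]-x(4)-G-K-[ST], by translating it into a regular expression. The helper names prosite_to_regex and find_motifs are made up for this illustration, the converter handles only the basic pattern syntax (no anchors), and the test sequence is a short illustrative example rather than one of the exercise sequences.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex.

    Handles x (any residue), [..] (allowed residues), {..} (forbidden
    residues) and repeat counts such as x(4) or x(2,3). Anchors (< and >)
    are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the element from an optional repeat count.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            regex = "."                       # any amino acid
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any amino acid except these
        else:
            regex = core                      # literal residue or [..] set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_motifs(sequence, pattern):
    """Return (1-based start position, matched residues) for each hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Illustrative sequence only; not one of the tutorial's mystery sequences.
P_LOOP = "[AG]-x(4)-G-K-[ST]"
hits = find_motifs("MTEYKLVVVGAGGVGKSALTIQ", P_LOOP)
print(hits)
```

A match only says that the pattern occurs in the sequence. Short patterns like this one also turn up by chance, so a hit is a suggestion of function rather than proof.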
A profile extends this idea by allowing a quantitative description of a motif: it assigns a probability to the occurrence of each amino acid at each position of the motif. Profiles can therefore be used to describe very divergent motifs.
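As a rough illustration of how a profile differs from a pattern, the sketch below builds per-position amino acid probabilities from a few aligned motif instances and uses them to score every window of a query sequence. It is a simplified model, not the scoring scheme used by any particular database: the motif instances are invented for the example, the alignment is assumed to be ungapped, a pseudocount of 1 is added to every residue, and the background frequency of each amino acid is assumed to be uniform (1/20).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances, pseudocount=1.0):
    """Return one {amino acid: probability} dict per motif position.

    Assumes all instances have the same length (ungapped alignment).
    """
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for position in range(len(instances[0])):
        counts = Counter(seq[position] for seq in instances)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def score_window(profile, window, background=1 / 20):
    """Log-odds score (in bits) of a window against a uniform background."""
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, window)
    )


def best_hit(profile, sequence):
    """Return (score, 1-based start) of the highest-scoring window."""
    width = len(profile)
    return max(
        (score_window(profile, sequence[i:i + width]), i + 1)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical aligned motif instances, invented for this example.
instances = ["GAGGVGKS", "GPSGSGKT", "GDSGVGKS", "AHVDHGKT"]
profile = build_profile(instances)
score, start = best_hit(profile, "MTEYKLVVVGAGGVGKSALTIQ")
print(f"best window starts at residue {start}, score {score:.2f} bits")
```

Because every residue has a non-zero probability at every position, a window that differs from all of the known instances still receives a score. This is what lets a profile recognise divergent versions of a motif that a strict pattern would miss.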
The presence of a particular motif within a protein sequence can be used to suggest functions for uncharacterised proteins.
A number of databases have been constructed that attempt to describe particular protein motifs in terms of patterns and profiles. They allow you to search for patterns or profiles that are indicative of particular functional motifs within a query protein.
Some examples of such databases include:
These databases all have different areas of optimum application, and it is difficult to tell in advance which one will give the best results. Each has particular strengths and weaknesses, so ideally you need to use them all.
However, a database called InterPro has been recently established that combines information from PRINTS, PROSITE, ProDom and Pfam (see the reference describing InterPro). Using InterPro saves a lot of work since we can essentially search many databases in one go.
The following exercise will guide you through the use of InterPro to look for motifs in some example protein sequences.
In this exercise we will analyse some example protein sequences for motifs, to get the hang of using InterProScan.
These sequences are mystery sequences. You will use InterPro to see whether you can assign a putative function to them based on the motifs that you find. Start with the first sequence, then pick some of the others. Bring up the InterProScan web interface at EBI in a new window. Paste your sequence into the text box labelled "Enter or cut and paste protein sequence here". You may want to add your email address to the relevant box so that information can be sent back to you. When you are ready, click on the "Run" box. This can take quite a long time, so be prepared to wait.
You will be presented with the results in tabular form. If you don't know what they mean, try the tutorial, which explains them.
Although many biologists still use BLAST as their first port of call, InterPro is probably a better place to start; in most cases, it gives a more definitive answer about the function of a protein, with less effort.