There are quite a few sites offering BLAST services scattered across the Internet. There are also different implementations of the BLAST algorithm (e.g. NCBI BLAST2 and WU-BLAST).
How these sites are used varies a little from one to another. However, the basic interpretation of the results is the same, and it is usually the main source of confusion or problems for most people.
We will use the NCBI BLAST server. It is perhaps the most user-friendly, with lots of colours and added extras. The downside is that it can be very slow, especially in the afternoon. This exercise has two parts: first, we will work through some BLAST tutorials; then we will analyse some known and unknown sequences.
As there are already a number of excellent BLAST tutorials on the web, I'll use a couple of these for this practical.
Work through the BLAST tutorial here. There is also a more extensive BLAST guide which you may wish to look at. For the moment, you want to stick to BLAST rather than PSI-BLAST.
Let's try a number of different searches. First, let's try a nucleotide search with the PAX6 DNA sequence.
Find a DNA sequence for the PAX6 gene. Perform a nucleotide-nucleotide (blastn) search against the nr database.
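The web form is just a front end. The same search can be submitted through NCBI's BLAST Common URL API by sending a request with `CMD=Put`. The sketch below only builds that URL and does not send it. The function name `build_put_url` is made up for this example, and the database name `nt` is an assumption: it is the name the API is generally understood to use for the form's "Nucleotide collection (nr/nt)". The query is a placeholder, not a real PAX6 sequence.

```python
from urllib.parse import urlencode

# Base address of NCBI's BLAST Common URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_url(program: str, database: str, query: str) -> str:
    """Build (but do not send) a CMD=Put request that would submit a BLAST search.

    Whitespace is stripped from the query so that a sequence pasted over
    several lines is sent as one continuous string.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,  # assumption: "nt" = the web form's nr/nt nucleotide collection
        "QUERY": "".join(query.split()),
    }
    return f"{BLAST_URL}?{urlencode(params)}"


# Placeholder query: paste the PAX6 DNA sequence you found here instead.
url = build_put_url("blastn", "nt", "ACGTACGT\nACGTACGT")
print(url)
```

If such a request were sent, the server would reply with a request ID (RID) that you then fetch with `CMD=Get`. NCBI asks users not to poll for results too frequently, so for a practical like this the web form is the easier option.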
You should see that most of the matches are at the 3' end of the DNA sequence. Take a look at some of the alignments: most of them are very good. This is characteristic of a BLAST search these days; it's quite likely (although far from a given) that you will get some near-exact matches.
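"Very good" can be made concrete. Each BLAST alignment comes with an Identities line (and a Gaps line) that counts identical positions against the alignment length. The sketch below computes roughly the same figures from the two aligned rows of an HSP. It assumes gaps are written as `-` and that the alignment length includes gap columns. The function name `alignment_stats` and the example rows are invented for illustration.

```python
def alignment_stats(query_aln: str, subject_aln: str) -> dict:
    """Count identities and gaps in a pairwise alignment ('-' marks a gap).

    Percent identity is identities divided by the full alignment length,
    gap columns included.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must be the same length")
    pairs = list(zip(query_aln.upper(), subject_aln.upper()))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    length = len(pairs)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "pct_identity": 100.0 * identities / length,
    }


# Made-up aligned rows: one mismatch (T/A) and one gap.
stats = alignment_stats("ACGTTGCA-GT", "ACGATGCAAGT")
print(stats["identities"], "/", stats["length"], "identities;", stats["gaps"], "gap")
```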
BLAST produces a lot of output, which leads some people to think that they should paste raw BLAST results into their coursework and/or dissertations. This is a bad idea: BLAST results need to be interpreted, analysed and, critically, summarised.
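One way to start summarising is from tabular output rather than the full report. The sketch below assumes tab-separated rows in the 12-column layout that BLAST+ writes with `-outfmt 6` (query, subject, % identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). Check the column order of any hit table you download before you rely on it. The sketch keeps the best HSP per subject, drops hits above an E-value cut-off, and ranks the rest by bit score. The function name, the cut-off of 1e-5 and the toy rows (`hitA`, `hitB`, `hitC`) are all made up for illustration. They are not real results.

```python
import csv
import io

# Assumed column order: BLAST+ tabular output (-outfmt 6) with its default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def summarise_hits(tabular_text: str, max_evalue: float = 1e-5, top: int = 10):
    """Return (subject, %identity, bitscore, evalue) tuples, best first.

    Only the highest-scoring HSP per subject is kept, and hits with an
    E-value above max_evalue are discarded.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(hit["sseqid"])
        if current is None or bitscore > current[2]:
            best[hit["sseqid"]] = (hit["sseqid"], float(hit["pident"]), bitscore, evalue)
    return sorted(best.values(), key=lambda h: h[2], reverse=True)[:top]


# Toy rows, invented for illustration only.
toy = (
    "q1\thitA\t99.5\t200\t1\t0\t1\t200\t1\t200\t1e-100\t370\n"
    "q1\thitA\t90.0\t50\t5\t0\t300\t350\t300\t350\t1e-10\t80\n"
    "q1\thitB\t45.0\t120\t60\t2\t10\t130\t5\t125\t1e-20\t95\n"
    "q1\thitC\t30.0\t40\t28\t1\t50\t90\t60\t100\t0.5\t25\n"
)
for subject, pident, bits, ev in summarise_hits(toy):
    print(f"{subject}\t{pident:.1f}%\t{bits:.0f} bits\tE={ev:g}")
```

A short ranked table like this, plus a sentence or two on what the top hits have in common, is far more useful in a write-up than pages of raw output.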
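In general, you want to search with a protein sequence rather than DNA if you can; it tends to give better results. The genetic code is redundant, so many DNA changes (especially at the third codon position) leave the protein unchanged. Comparing 20 amino acids with a substitution matrix also picks up distant relationships that get lost in the noise of a 4-letter DNA comparison.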
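A tiny example shows the redundancy point. The two coding sequences below are made up so that they differ only at synonymous sites. At the DNA level they look noticeably different, but they translate to the same protein. The sketch builds the standard genetic code from the usual TCAG-ordered string and translates in frame 1 only. It is a toy identity comparison, not how BLAST actually scores alignments.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in reading frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, ungapped sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up coding sequences that differ only at synonymous positions.
dna1 = "ATGGCTCTGAAA"
dna2 = "ATGGCCTTAAAG"
print("DNA identity:", round(percent_identity(dna1, dna2), 1))
print(translate(dna1), translate(dna2))  # MALK MALK
print("Protein identity:", percent_identity(translate(dna1), translate(dna2)))
```

Four of the twelve bases differ, yet the proteins are identical. That is exactly the kind of signal a DNA search throws away and a protein search keeps.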
Here is a mystery protein sequence that has been isolated by sequencing a clone from a library in your lab. Use BLAST with the non-redundant (nr) database to see if you can find its identity:
MRKMSSEEFYLFKNISSVGPWDGPQYHIAPVWAFYLQAAFMGTVFL
IFFPLNAMVLVATLRKLRQPLNYILVNVSFGGFLLCIFSVFPVFVA
SCNGYFVFGRHYCALEGFLGTVAGLVTWWSLAFLAFERYIVICKPF
GNFRFSSKHALTVVLATWTIGIGVSIPPFFGWWWFIPEGSCGPDDW
YTVGTKYRSESYTWFLFIFCFIVPLSLICFSYTQLLRALKARIRSF
FPYFLICQGYLQLYLSLQFQACIMKMVCGKAMTDESDTCSSVKTEV
STVSSTQVGPN
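The BLAST form will accept the sequence pasted as it is. If you want to keep a clean copy for your notes or for other tools, though, it helps to strip the whitespace, check for stray characters and save it as FASTA. The sketch below does that. The allowed-letter set (the 20 standard amino acids plus the IUPAC extras B, Z, X, U, O and `*` for stop), the function names and the record name `mystery` are illustrative choices, not anything BLAST requires.

```python
import re
import textwrap

# Standard amino acids plus IUPAC extras (B, Z, X, U, O) and '*' for a stop.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBZXUO*")


def clean_protein(pasted: str) -> str:
    """Remove whitespace and digits (e.g. from a numbered listing) and upper-case."""
    seq = re.sub(r"[\s\d]", "", pasted).upper()
    bad = sorted(set(seq) - ALLOWED)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    return seq


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record wrapped at `width` residues per line."""
    return f">{name}\n" + "\n".join(textwrap.wrap(seq, width)) + "\n"


# Hypothetical record name; paste the full mystery sequence in place of this fragment.
fragment = clean_protein("MRKMSSEEFY LFKNISSVGP\n WDGPQYHIAP")
print(to_fasta("mystery", fragment))
```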
With a protein search, the NCBI server is clever enough to detect that part of the sequence probably belongs to a highly conserved domain, and it reports this alongside the BLAST results.
Finally, let's try this sequence:
MSPKQEEYEVERIVDEKLDRNGAVKLYRIRWLNYSSRSDTWEPPENLSGCSAVLAEWKRR
KRRLKGSNSDSDSPHHASNPHPNSRQKHQHQTSKSVPRSQRFSRELNVKKENKKVFSSQT
TKRQSRKQSTALTTNDTSIILDDSLHTNSKKLGKTRNEVKEESQKRELVSNSIKEATSPK
TSSILTKPRNPSKLDSYTHLSFYEKRELFRKKLREIEGPEVTLVNEVDDEPCPSLDFQFI
SQYRLTQGVIPPDPNFQSGCNCSSLGGCDLNNPSRCECLDDLDEPTHFAYDAQGRVRADT
GAVIYECNSFCSCSMECPNRVVQRGRTLPLEIFKTKEKGWGVRSLRFAPAGTFITCYLGE
VITSAEAAKRDKNYDDDGITYLFDLDMFDDASEYTVDAQNYGDVSRFFNHSCSPNIAIYS
AVRNHGFRTIYDLAFFAIKDIQPLEELTFDYAGAKDFSPVQSQKSQQNRISKLRRQCKCG
SANCRGWLFG
I've chosen this sequence for a reason.[1] This protein has a slightly more interesting similarity pattern: while a few proteins show similarity along the entire length of the protein, most show similarity at the N- and C-termini but little in the middle.
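A pattern like this is easy to see in the graphical overview, but it can also be quantified. The sketch below counts how many HSPs cover each slice of the query. It uses BLAST's 1-based, inclusive query start and end coordinates (the `q. start`/`q. end` columns of a hit table). A query that matches mostly at its ends gives high values in the first and last bins and a dip in between. The function name, the choice of ten bins and the example coordinates are invented for illustration.

```python
def coverage_profile(query_length: int, hsps, bins: int = 10):
    """Average number of HSPs covering each of `bins` equal slices of the query.

    `hsps` is an iterable of (qstart, qend) pairs in 1-based, inclusive
    query coordinates, as reported by BLAST.
    """
    depth = [0] * (query_length + 1)  # index 0 unused so positions stay 1-based
    for qstart, qend in hsps:
        lo, hi = sorted((qstart, qend))  # tolerate reversed coordinates
        for pos in range(lo, hi + 1):
            depth[pos] += 1
    profile = []
    for b in range(bins):
        start = b * query_length // bins + 1
        end = (b + 1) * query_length // bins
        window = depth[start:end + 1]
        profile.append(sum(window) / len(window) if window else 0.0)
    return profile


# Made-up HSPs on a 100-residue query: two terminal-only hits each end, one full-length hit.
example = [(1, 30), (1, 25), (70, 100), (75, 100), (1, 100)]
print([round(x, 1) for x in coverage_profile(100, example)])
```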
[1] Two reasons, actually. No prizes for finding out the other.