
CSC8312 -- Bioinformatics Theory and Applications

Using BLAST

There are quite a few sites offering BLAST services scattered across the Internet. There are also different implementations of the BLAST algorithm (e.g. NCBI BLAST2 and WU-BLAST).

Usage varies a little from site to site. However, the basic interpretation of the results is the same everywhere, and it is usually the main source of confusion or problems for most people.

We will use the NCBI BLAST server. It is perhaps the most user-friendly, with lots of colours and added extras. The downside is that it can be very slow, especially in the afternoon. This exercise has two parts. First, we will work through some BLAST tutorials. Then we will analyse some known and unknown sequences.

BLAST Tutorial

As there are already a number of excellent BLAST tutorials on the web, I'll use a couple of these for this practical.

act Work through the BLAST tutorial here. There is also a more extensive BLAST guide which you may wish to look at. For the moment, you want to stick to BLAST rather than PSI-BLAST.

BLASTing some sequences

Let's try a number of different searches. First, let's try a nucleotide search with the PAX6 DNA sequence.

act Find a DNA sequence for the PAX6 gene. Perform a nucleotide-nucleotide (blastn) search against the nr database.
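The same search can in principle be submitted programmatically rather than through the web form. The sketch below only builds the body of a submission request and does not send anything. The endpoint and the parameter names (CMD, PROGRAM, DATABASE, QUERY) are given as I recall them from NCBI's URL API documentation and should be checked there before use. The database name "nt" for the nucleotide collection is likewise an assumption, and the query is a made-up placeholder, not the PAX6 sequence.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; verify against NCBI's documentation.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_query(program, database, sequence):
    """Return the URL-encoded body of a search submission (nothing is sent)."""
    # Strip whitespace and line breaks so a pasted multi-line sequence is accepted.
    cleaned = "".join(sequence.split()).upper()
    params = {
        "CMD": "Put",          # assumed: "Put" submits a new search
        "PROGRAM": program,    # e.g. "blastn" for nucleotide-nucleotide
        "DATABASE": database,  # assumed database name, e.g. "nt"
        "QUERY": cleaned,
    }
    return urlencode(params)


# Hypothetical placeholder sequence, for illustration only.
placeholder = """acgtacgtac
gtacgt"""

body = build_put_query("blastn", "nt", placeholder)
print(BLAST_URL)
print(body)
```

Keeping the request construction separate from the network call makes it easy to inspect exactly what would be submitted before putting any load on a shared server.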

You should see that most of the matches are at the 3' end of the DNA sequence. Take a look at some of the alignments; most of them are very good. This is characteristic of a BLAST search these days: it is quite likely (although far from a given) that you will get some pretty exact matches.

exclaim BLAST searches produce a lot of data, which leads some people to think that they should paste raw BLAST results into their coursework and/or dissertations. This is a bad idea: BLAST results need to be interpreted, analysed and, critically, summarised.
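As one way of summarising rather than pasting raw output, the sketch below reduces a set of hits to the best-scoring alignment per subject. It assumes the results were saved in the twelve-column tabular format of the command-line BLAST+ tools (outfmt 6), which is not something the practical itself requires. The three rows in SAMPLE are invented purely to illustrate the parsing, and the function names are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Invented rows for illustration only; these are not real search results.
ROWS = [
    ["query1", "hitA", "98.5", "200", "3", "0", "1", "200", "5", "204", "1e-100", "360"],
    ["query1", "hitB", "75.0", "120", "30", "0", "10", "129", "40", "159", "2e-20", "95.5"],
    ["query1", "hitA", "90.0", "50", "5", "0", "220", "269", "300", "349", "1e-10", "60.2"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n"


def parse_hits(text):
    """Parse tabular BLAST output into a list of dicts with typed fields."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        hits.append({
            "subject": row["sseqid"],
            "identity": float(row["pident"]),
            "length": int(row["length"]),
            "evalue": float(row["evalue"]),
            "bitscore": float(row["bitscore"]),
        })
    return hits


def best_hit_per_subject(hits):
    """Keep the highest-scoring alignment for each subject, best first."""
    best = {}
    for hit in hits:
        current = best.get(hit["subject"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["subject"]] = hit
    return sorted(best.values(), key=lambda h: h["bitscore"], reverse=True)


summary = best_hit_per_subject(parse_hits(SAMPLE))
for hit in summary:
    print(f'{hit["subject"]}: {hit["identity"]}% identity over {hit["length"]} '
          f'positions, E-value {hit["evalue"]:g}')
```

A short table like this still needs interpreting in the text; it only removes the bulk.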

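In general, search with a protein sequence rather than DNA if you can. Protein comparisons tend to give better results: the genetic code is degenerate, so two DNA sequences can differ while encoding the same protein, and amino-acid scoring matrices can reward conservative substitutions.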
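A small sketch of why protein searches can be more sensitive: two DNA sequences that differ at many positions can still encode the same peptide. The two DNA strings below are made up for illustration (they are not taken from any database), and the translation uses the standard genetic code only.

```python
# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))


def translate(dna):
    """Translate a DNA string in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical DNA sequences that use different codons for the same residues.
dna_a = "ATGCGTAAAATGTCT"
dna_b = "ATGAGGAAGATGAGC"

protein_a = translate(dna_a)
protein_b = translate(dna_b)

print(f"DNA identity:     {identity(dna_a, dna_b):.0%}")
print(f"Protein identity: {identity(protein_a, protein_b):.0%}")
```

Here the nucleotide sequences differ at six of fifteen positions, yet the translated peptides are identical, which is the kind of signal a DNA-level search can miss.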

act

Here is a mystery protein sequence that has been isolated by sequencing a clone from a library in your lab. Use BLAST with the non-redundant database to see if you can find its identity.

MRKMSSEEFYLFKNISSVGPWDGPQYHIAPVWAFYLQAAFMGTVFL
IFFPLNAMVLVATLRKLRQPLNYILVNVSFGGFLLCIFSVFPVFVA
SCNGYFVFGRHYCALEGFLGTVAGLVTWWSLAFLAFERYIVICKPF
GNFRFSSKHALTVVLATWTIGIGVSIPPFFGWWWFIPEGSCGPDDW
YTVGTKYRSESYTWFLFIFCFIVPLSLICFSYTQLLRALKARIRSF
FPYFLICQGYLQLYLSLQFQACIMKMVCGKAMTDESDTCSSVKTEV
STVSSTQVGPN
exclaim With a protein search, the NCBI server is clever enough to detect that the sequence probably contains a highly conserved region before it runs the BLAST search itself.
act Finally, let's try this sequence:
MSPKQEEYEVERIVDEKLDRNGAVKLYRIRWLNYSSRSDTWEPPENLSGCSAVLAEWKRR
KRRLKGSNSDSDSPHHASNPHPNSRQKHQHQTSKSVPRSQRFSRELNVKKENKKVFSSQT
TKRQSRKQSTALTTNDTSIILDDSLHTNSKKLGKTRNEVKEESQKRELVSNSIKEATSPK
TSSILTKPRNPSKLDSYTHLSFYEKRELFRKKLREIEGPEVTLVNEVDDEPCPSLDFQFI
SQYRLTQGVIPPDPNFQSGCNCSSLGGCDLNNPSRCECLDDLDEPTHFAYDAQGRVRADT
GAVIYECNSFCSCSMECPNRVVQRGRTLPLEIFKTKEKGWGVRSLRFAPAGTFITCYLGE
VITSAEAAKRDKNYDDDGITYLFDLDMFDDASEYTVDAQNYGDVSRFFNHSCSPNIAIYS
AVRNHGFRTIYDLAFFAIKDIQPLEELTFDYAGAKDFSPVQSQKSQQNRISKLRRQCKCG
SANCRGWLFG

I've chosen this sequence for a reason (see footnote 1). This protein has a slightly more interesting similarity pattern. In this case, you should see that while a few proteins show similarity along the entire length of the protein, most show similarity at the N and C termini but little in the middle of the protein.
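A pattern like this can be made explicit by counting how many hits cover each position of the query. The sketch below is a minimal illustration only: the query length and the hit intervals are invented to mimic the shape described above, not taken from a real search, and the function names are hypothetical.

```python
def coverage_profile(query_length, intervals):
    """Count how many hits cover each query position.

    Intervals are (start, end) pairs in 1-based, inclusive query coordinates,
    the convention used in BLAST alignments.
    """
    depth = [0] * query_length
    for start, end in intervals:
        for pos in range(start - 1, end):
            depth[pos] += 1
    return depth


def mean_depth_by_third(depth):
    """Average coverage over the first, middle and last third of the query."""
    n = len(depth)
    bounds = [0, n // 3, 2 * n // 3, n]
    return [sum(depth[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


# Hypothetical values: one full-length hit, two N-terminal, two C-terminal.
QUERY_LENGTH = 300
HIT_INTERVALS = [(1, 300), (1, 90), (5, 100), (210, 300), (220, 295)]

depth = coverage_profile(QUERY_LENGTH, HIT_INTERVALS)
thirds = mean_depth_by_third(depth)

for label, value in zip(["N-terminal", "middle", "C-terminal"], thirds):
    print(f"{label:<11} mean coverage {value:.2f}")
```

With these invented intervals the middle third is covered only by the single full-length hit, while both ends are covered by several, which is the pattern to look for in the graphical overview of the real results.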


1. Two reasons actually — no prizes for finding out the other.

