CSC8309 -- Gene Expression and Proteomics
Mascot
Mass spectrometry experiments do not directly produce protein identifications. The spectrum produced
has to be interpreted to deduce a protein ID. The most common way of achieving this is by
Peptide Mass Fingerprint (PMF). A PMF is the pattern of peptide masses produced by digestion (usually
by trypsin) of a protein. This fingerprint can be compared to a database of such fingerprints to
produce a protein identification. There are a number of tools for searching PMF databases (see
ExPASy). We are going to use the most common tool, called
Mascot. This is a closed-source tool produced by a London-based company called Matrix Science.
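To make the idea of a fingerprint concrete, here is a minimal Python sketch of an in silico tryptic digest. It assumes the usual simplified trypsin rule (cleave after K or R, but not before P), no missed cleavages, no modifications, and singly protonated peptides. The function names and the 0.2 Da tolerance are illustrative choices. Mascot is closed-source and scores matches probabilistically, so the simple peak counting below is not its method.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added for the singly charged [M+H]+ ion


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint(sequence):
    """Theoretical [M+H]+ values for every fully cleaved tryptic peptide."""
    return [peptide_mass(p) + PROTON for p in tryptic_digest(sequence)]


def count_matches(observed_mz, theoretical_mz, tol=0.2):
    """Count observed peaks lying within tol Da of any theoretical peak."""
    return sum(
        any(abs(obs - theo) <= tol for theo in theoretical_mz)
        for obs in observed_mz
    )


# Toy sequence, not a real protein: the R-P bond is left uncleaved.
print(tryptic_digest("AKRPGK"))  # ['AK', 'RPGK']
```

A search tool repeats this kind of comparison against every sequence in the chosen database and ranks the candidates, which is why the choice of database and enzyme matters for the searches below.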
Introduction
You are provided with a list of 17 International Protein Index accession
numbers (see below), and 3 files containing mass spectrometry peak lists for
peptide mass fingerprinting. These proteins were identified in a real,
published experiment, and your task is to work out its aim.
IPI00553164
IPI00025491
IPI00298497
IPI00021891
IPI00414676
IPI00220327
IPI00009867
IPI00299145
IPI00217963
IPI00450768
IPI00219217
IPI00027444
IPI00009342
IPI00011957
IPI00010303
IPI00550900
IPI00745872
Peak List 1
Peak List 2
Peak List 3
- Using MASCOT (www.matrixscience.com), identify the three proteins for which
you only have a peak list.
- Select the appropriate search tool (Hint)
- Leave all the parameters of the MASCOT search at the default, apart from
the 'Decoy' checkbox (check it), and the Database (search SwissProt, not
MSDB).
- Use the Data file upload option, and point to each of the peak list files
you just downloaded, in turn.
See the Screenshot for more info.
- What are the names of the three proteins?
- What are their UniProt accession numbers?
- What species does the experimental data come from? Could you use this
information after your first search to improve the results of the other
two searches?
- What is a MASCOT decoy database, and why is its use recommended in high
throughput experiments?
- IPI accession numbers can provide a means of translating between different
databases, since they attach multiple accession numbers to a single,
unified protein ID. Beyond this, though, they are of little functional use. Use
the ID mapping service at UniProt to get UniProt accession numbers for the
other 17 proteins.
|
- What are the UniProt accession numbers you have found?
- Why are there so many of them?
- Create a text file containing the UniProt accession numbers you have found, one per line.
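If you would rather script this step than type the file by hand, the following Python sketch is one way to do it. It checks each entry against the accession number pattern published by UniProt, drops duplicates, and writes one accession per line. The function name, the file name, and the accessions in the usage lines are placeholders for illustration only; they are not the answers to this exercise.

```python
import re
from pathlib import Path

# Accession number pattern as documented by UniProt.
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def write_accessions(accessions, path):
    """Validate, de-duplicate (keeping order) and write one accession per line."""
    unique = []
    for acc in accessions:
        acc = acc.strip().upper()
        if not acc:
            continue  # ignore blank entries
        if not UNIPROT_ACC.fullmatch(acc):
            raise ValueError(f"not a UniProt accession: {acc!r}")
        if acc not in unique:
            unique.append(acc)
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


# Placeholder input: replace with the accessions you found yourself.
write_accessions(["P12345", " p12345 ", "A0A024R161"], "accessions.txt")
```

Validating before writing catches a stray IPI identifier or a truncated copy-and-paste early, before the file is used anywhere else.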