
CSC8309 -- Gene Expression and Proteomics

Mascot

Mass spectrometry experiments do not directly produce protein identifications; the spectrum has to be interpreted to deduce a protein ID. The most common way of doing this is peptide mass fingerprinting (PMF). A peptide mass fingerprint is the pattern of peptide masses produced by digesting a protein with a protease (usually trypsin). This fingerprint can be compared against a database of such fingerprints to produce a protein identification. There are a number of tools for searching PMF databases (see ExPASy). We are going to use the most common tool, Mascot, a closed-source tool produced by a London-based company called Matrix Science.
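
To make the idea of a fingerprint concrete, here is a minimal Python sketch of an in-silico tryptic digest. It is only an illustration under stated assumptions, not how Mascot itself works: it assumes the usual rule that trypsin cleaves after K or R unless the next residue is P, it allows no missed cleavages and no modifications, and it reports singly protonated monoisotopic masses. The function names and the toy sequence are made up for this example.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added to give the [M+H]+ value


def tryptic_digest(sequence):
    """Split after K or R unless followed by P (assumed rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fingerprint(sequence):
    """Sorted list of [M+H]+ masses: the theoretical peptide mass fingerprint."""
    return sorted(round(peptide_mh(p), 4) for p in tryptic_digest(sequence))


if __name__ == "__main__":
    toy = "AAKPGRCCK"  # toy sequence, not a real protein
    for pep in tryptic_digest(toy):
        print(pep, round(peptide_mh(pep), 4))
```

A search engine compares a list of masses like the one returned by `fingerprint` against the masses observed in the spectrum, within a mass tolerance, for every protein in the database.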

Introduction

You are provided with a list of 17 International Protein Index (IPI) accession numbers (see below) and three files containing mass spectrometry peak lists for peptide mass fingerprinting. These proteins were identified in a real, published experiment; your task is to work out the aim of that experiment.

IPI00553164
IPI00025491
IPI00298497
IPI00021891
IPI00414676
IPI00220327
IPI00009867
IPI00299145
IPI00217963
IPI00450768
IPI00219217
IPI00027444
IPI00009342
IPI00011957
IPI00010303
IPI00550900
IPI00745872

Peak List 1 Peak List 2 Peak List 3

act
  • Using Mascot (www.matrixscience.com), identify the three proteins for which you only have a peak list.
  • Select the appropriate search tool (Hint)
  • Leave all the parameters of the Mascot search at their defaults, with two exceptions: tick the 'Decoy' checkbox, and set the Database to SwissProt (not MSDB).
  • Use the Data file upload option, and point to each of the peak list files you just downloaded, in turn.

See the Screenshot for more info.
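
Before uploading, it can be useful to check what a peak list contains. The following Python sketch assumes the files are plain text with one peak per line: an m/z value, optionally followed by an intensity, separated by whitespace. That layout is an assumption about the files provided here, and `parse_peak_list` is a hypothetical helper name, so inspect a file in a text editor first.

```python
def parse_peak_list(text):
    """Return a list of (mz, intensity) tuples; intensity is None if absent.

    Assumes one peak per line, whitespace separated. Lines whose first
    field is not a number (headers, comments) are skipped.
    """
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            mz = float(fields[0])
        except ValueError:
            continue  # not a peak line
        intensity = None
        if len(fields) > 1:
            try:
                intensity = float(fields[1])
            except ValueError:
                intensity = None  # second column is not numeric
        peaks.append((mz, intensity))
    return peaks


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = "842.51 1200\n1045.56\n"
    print(parse_peak_list(example))
```

Counting the peaks and looking at the mass range gives a quick sanity check that the right file is being uploaded.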

quest
  1. What are the names of the three proteins?
  2. What are their UniProt accession numbers?
  3. What species does the experimental data come from? Could you use this information after your first search to improve the results of the other two searches?
  4. What is a Mascot decoy database, and why is its use recommended in high-throughput experiments?
act
  • IPI accession numbers provide a means of translating between different databases, since they attach multiple accession numbers to a single, unified protein ID. Beyond this, though, they are of little practical use. Use the ID mapping service at UniProt to get UniProt accession numbers for the 17 proteins in the list above.
quest
  1. What are the UniProt accession numbers you have found?
  2. Why are there so many of them?
act
  • Create a text file containing the UniProt accession numbers you have found, one per line.
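
If you prefer to script this step, a minimal Python sketch is given below. The function names and the output filename are hypothetical, and the accessions shown are placeholder strings, not results from the exercise; substitute the ones you found.

```python
from pathlib import Path


def format_accessions(accessions):
    """Return the accessions one per line, stripped and de-duplicated in order."""
    seen = set()
    unique = []
    for acc in accessions:
        acc = acc.strip()
        if acc and acc not in seen:
            seen.add(acc)
            unique.append(acc)
    return "".join(acc + "\n" for acc in unique)


def write_accessions(accessions, path):
    """Write the formatted accession list to a text file."""
    Path(path).write_text(format_accessions(accessions), encoding="utf-8")


if __name__ == "__main__":
    # Placeholder values only; replace with your own accession numbers.
    found = ["P00001", "P00002", "P00001 "]
    write_accessions(found, "uniprot_accessions.txt")
```

Removing duplicates matters here because several IPI entries may map to the same UniProt accession.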
