We've now navigated from the original DNA sequence, through the genome, to the protein. We've seen that some of the available information is highly organised; for example, there are very formal ways of describing the chromosomal location. Other information is much less structured: the CC line from UniProt is famous for its unfortunate name, because these "comment" lines contain most of the biology.
Two questions remain. First, how can you check the information that is being given to you? At the bottom of the UniProt record you should be able to find the references section. Originally, almost all the knowledge in UniProt came from primary research papers like these; nowadays most of the sequence information comes from large sequencing projects.
Follow the links back to PubMed for several of the papers. |
The problem with papers is that you have to read them. This can be tiresome for a biologist; the problem is much worse for a computer. One solution to this is to use ontologies. Find the Ontologies section of the record.
Follow the links for the Gene Ontology |
The terms here contain less knowledge than the papers, but they are also much more accessible to computers. For example, you can look at several proteins at once and ask which functions they share.
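To see why ontology terms are so convenient for a computer, here is a minimal Python sketch of that "what do they share" question. The protein names and their annotation sets are made up for illustration; the GO identifiers are real terms, but their assignment to these hypothetical proteins is not taken from any record. The function name shared_terms is likewise just a name chosen for this example.

```python
# Hypothetical annotations: protein name -> set of GO term identifiers.
# The proteins are invented; only the identifier format is realistic.
annotations = {
    "protein_A": {"GO:0005524", "GO:0004672", "GO:0005515"},
    "protein_B": {"GO:0005524", "GO:0004672"},
    "protein_C": {"GO:0005524", "GO:0005515"},
}


def shared_terms(annotations):
    """Return the GO terms annotated to every protein in the mapping."""
    term_sets = list(annotations.values())
    if not term_sets:
        # No proteins given, so there is nothing to share.
        return set()
    # Set intersection keeps only the terms present in all sets.
    return set.intersection(*term_sets)


print(sorted(shared_terms(annotations)))
```

Because each annotation is a fixed identifier rather than free text, the comparison reduces to a set intersection; doing the same with the comment lines or the papers would require reading them.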
Can you find an example of a tool that performs statistical analysis over the Gene Ontology? |
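As a hint about what such a tool might compute: one common approach (an assumption here, since the text does not name a method) is an over-representation test, which asks whether a GO term turns up in your list of proteins more often than chance would predict. The sketch below uses the hypergeometric distribution for this; the function name and all the counts are hypothetical, and real tools typically also correct for testing many terms at once.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, observed):
    """Probability of seeing at least `observed` annotated proteins.

    population: total number of proteins considered
    annotated:  how many of those carry the GO term
    sample:     size of the protein list being studied
    observed:   how many proteins in the list carry the term
    """
    total = comb(population, sample)
    upper = min(sample, annotated)
    # Sum the hypergeometric probabilities for observed, observed + 1, ...
    hits = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(observed, upper + 1)
    )
    return hits / total


# Hypothetical counts: 200 of 10000 proteins carry the term,
# and 8 of the 50 proteins in our list do.
print(enrichment_p_value(10000, 200, 50, 8))
```

A small probability suggests that the term is over-represented in the list, which is the kind of statement you should expect a statistical tool for the Gene Ontology to report.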