Michael Bell

Ph.D. Students Blog


Annotation maturity: Comparison of annotations in new and old sets of UniProtKB entries.

3 February, 2012 (14:57) | Uncategorized | By: mj_bell

Carrying on from the previous post, we now wish to look at annotation maturity in sets of UniProtKB entries. We have seen that the quality of annotations appears to be decreasing over time, for both Swiss-Prot and TrEMBL. A reasonable explanation for this would be that annotations are constantly being added for newly incorporated data, which in turn puts additional pressure on curators, meaning that over time the least effort has shifted from the reader to the annotator. Whilst we see a reduction in overall annotation quality, we suspect that a mature set of entries would improve with age.

To approach this analysis, we compare annotations from entries that are present in both Swiss-Prot version 9 (SP9) and UniProtKB/Swiss-Prot version 15 (UPSP15). We also show the alpha value for annotations from entries that are in UniProtKB/Swiss-Prot version 15 but not in Swiss-Prot version 9. The resulting graph is shown below:

Difference in Alpha value between SP9 and UPSP15
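For readers who want to reproduce the split between "old" and "new" entries, the comparison above boils down to simple set operations on entry identifiers. The following is only a minimal sketch, not the code used for the graph: the function name `partition_entries` and the identifiers in the usage example are made up for illustration, and it assumes that entries can be matched between versions by a stable identifier (it does not handle entries that were merged, split or renamed between releases).

```python
def partition_entries(old_accessions, new_accessions):
    """Split entry identifiers from two database versions into three groups.

    Assumption: identifiers are stable between versions, so a plain
    set comparison is enough to decide whether an entry is shared.
    """
    old = set(old_accessions)
    new = set(new_accessions)
    return {
        "in_both": old & new,    # entries present in both versions
        "new_only": new - old,   # entries only in the later version
        "dropped": old - new,    # entries only in the earlier version
    }


# Hypothetical identifiers, purely for illustration.
sp9 = ["ACC_A", "ACC_B", "ACC_C"]
upsp15 = ["ACC_B", "ACC_C", "ACC_D"]
parts = partition_entries(sp9, upsp15)
```

The alpha value would then be calculated separately for the annotations of the `in_both` entries (once per version) and for the `new_only` entries.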

Assuming that maturity is linked to age, we would expect the quality of annotations within a set of old entries to improve over time. Interestingly, this doesn’t appear to be the case. Whilst the alpha value for the entries present in both versions does decrease (by roughly 0.1), the alpha value for the remaining entries is significantly lower. This would suggest that annotation quality within the whole database is generally decreasing, although the rate of decrease depends on the initial age of the entry. Given this, it is of interest to see how the quality of annotations in only new entries changes over time. For this, we extract annotations for those entries that appeared for the first time in a given version. The resulting graph is shown below:

Alpha values for new annotations in new entries in various Swiss-Prot databases.
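Extracting the entries that appear for the first time in each version can be sketched in the same way. Again, this is an illustrative sketch under assumptions rather than the actual analysis code: `first_appearances` is a hypothetical helper, the versions must be supplied in chronological order, and entries are assumed to keep the same identifier across versions.

```python
def first_appearances(versions):
    """Map each version name to the identifiers first seen in that version.

    `versions` is a list of (name, identifiers) pairs, oldest first.
    Assumption: an entry keeps the same identifier in every version.
    """
    seen = set()
    new_per_version = {}
    for name, accessions in versions:
        current = set(accessions)
        # Only identifiers not seen in any earlier version count as new.
        new_per_version[name] = current - seen
        seen |= current
    return new_per_version


# Hypothetical version names and identifiers, purely for illustration.
history = [
    ("v1", ["ACC_A", "ACC_B"]),
    ("v2", ["ACC_A", "ACC_B", "ACC_C"]),
    ("v3", ["ACC_B", "ACC_C", "ACC_D", "ACC_E"]),
]
new_entries = first_appearances(history)
```

Note that an entry which disappears and later returns is only counted in the version where it was first seen; whether that is the right choice depends on how such cases should be treated in the analysis.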

This graph also shows a steady decrease over time, a similar pattern to most of our previous analyses. These results are interesting; it would appear that annotation of new data is getting worse over time. It also appears to have a detrimental effect on other annotations. We have discussed the increase in data in relation to annotation quality, but the impact of size alone would not explain the decrease of annotation quality in older entries. One possible explanation for this is the protocol used for annotation curation.

The curation of annotations is a clear process, consisting of six key steps. This process is detailed in (10.1093/database/bar009), with an overview of the process shown in the figure below (taken from the same paper).

Outline of the UniProtKB manual curation process, taken from paper: DOI:10.1093/database/bar009

Part of the protocol is, for a given sequence, to identify similar entries and then standardise and propagate annotation between these entries to ensure data consistency. Presumably, the curation process has undergone revisions over time (worthy of further investigation and another blog post!) due to changes in resources, the increase in data, and so on. It is possible that the curation process has been refined to deal with larger amounts of data and more frequent releases. Both have been true for Swiss-Prot over time: early versions of Swiss-Prot saw around 1000 new entries being added and later versions around 30,000, whilst the gap between releases has shortened by a couple of months. Although the increase in manually curated entries and the faster releases could be due to more curators rather than a change in annotation protocol (which will be investigated further), it is plausible that attempts to standardise annotation between similar entries are actually having a detrimental effect on overall annotation quality.



