Annotation maturity: Comparison of annotations in new and old sets of UniProtKB entries.
Carrying on from the previous post, we now wish to look at annotation maturity in sets of UniProtKB entries. We have seen that the quality of annotations appears to be decreasing over time, for both Swiss-Prot and TrEMBL. A reasonable explanation for this would be that annotations are constantly being added to newly incorporated data, which in turn puts additional pressure on curators, meaning that over time the least effort has shifted from the reader to the annotator. Whilst we see a reduction in overall annotation quality, we suspect that the annotations in a mature set of entries would improve with age.
To approach this analysis, we compare annotations from entries that are in both Swiss-Prot version 9 and UniProtKB/Swiss-Prot version 15. We also show the resulting alpha value for annotations from entries that are in UniProtKB/Swiss-Prot version 15 but not in Swiss-Prot version 9. The resulting graph for this is shown below:
With the assumption that maturity is linked to age, we would expect the quality of annotations within a set of old entries to improve over time. Interestingly, this does not appear to be the case. The alpha value for the old entries decreases between the two versions (by roughly 0.1), and the alpha value for the remaining entries, those found only in version 15, is markedly lower still. This would suggest that annotation quality across the whole database is generally decreasing, although the rate of decrease depends on the age of the entry. Given this, it is of interest to see how the quality of annotations in only new entries changes over time. For this, we extract annotations for those entries that appeared for the first time in a given version. The resulting graph for this is shown below:
This graph also shows a steady decrease over time, a similar pattern to most of our previous analyses. These results are interesting; it would appear that the annotation of new data is getting worse over time. It also appears to have a detrimental effect on other annotations. We have discussed the increase in data in relation to annotation quality, but the impact of size alone would not explain the decrease in annotation quality in older entries. One possible explanation for this is the protocol used for annotation curation.
The curation of annotations is a clear process, consisting of six key steps. This process is detailed in (http://dx.doi.org/10.1093/database/bar009), with an overview shown in the figure below (taken from the same paper).
Part of the protocol is, for a given sequence, to identify similar entries and then standardise and propagate annotation between these entries to ensure data consistency. Presumably, the curation process has undergone revisions over time (worthy of further investigation and another blog post!) due to changes in resources, increases in data, and so on. It is possible that the curation process has been refined to deal with larger amounts of data and quicker release dates. Both of these are true for Swiss-Prot: early versions saw around 1,000 new entries being added, with the later versions seeing around 30,000 new entries, whilst the release cycle has become more frequent by a couple of months. Although the increase in manually curated entries and the faster release dates could be due to more curators rather than a change in annotation protocol (which will be investigated further), it is plausible that attempts to standardise annotation between similar entries are actually having a detrimental effect on overall annotation quality.
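For anyone wishing to reproduce the entry sets used in the comparisons above, here is a minimal sketch in Python. It assumes each release has already been reduced to a set of entry accessions; the function names and the toy accessions are invented for illustration and are not taken from our actual pipeline. It also ignores accessions that are merged or changed between releases, which a real analysis would need to handle.

```python
from typing import Dict, Set, Tuple


def split_by_presence(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (entries present in both versions, entries only in the newer one)."""
    return old & new, new - old


def first_appearances(versions: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Map each version to the accessions not seen in any earlier version."""
    seen: Set[str] = set()
    first: Dict[int, Set[str]] = {}
    for version in sorted(versions):
        first[version] = versions[version] - seen
        seen |= versions[version]
    return first


# Toy accession sets, made up purely for illustration.
releases = {
    9: {"P00001", "P00002", "P00003"},
    12: {"P00001", "P00002", "P00004"},
    15: {"P00001", "P00003", "P00004", "P00005"},
}

# Old entries that survive into version 15, and entries found only in version 15.
mature, recent = split_by_presence(releases[9], releases[15])

# Entries grouped by the version in which they first appear.
new_per_version = first_appearances(releases)
```

The annotations belonging to each resulting set would then be extracted and passed to whatever alpha fitting is being used; that step is deliberately left out of this sketch.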