Michael Bell

Ph.D. Students Blog

Category: Sentence reuse

Most frequently occurring sentences in UniProtKB and their propagation through the Web

4 November, 2011 (16:16) | Sentence reuse, Uncategorized | By: mj_bell

As already established, sentence reuse is common within UniProtKB. Some sentences are naturally reused more than others, which gives an indication of their information content. Below we show the top 10 sentences for the first versions of Swiss-Prot and TrEMBL, and also for Version 15 of UniProtKB/TrEMBL and UniProtKB/Swiss-Prot. Top 10 sentences for Swiss-Prot […]

Analysing selenocysteine sentence reuse and flow.

31 August, 2011 (11:17) | Annotation Quality, Sentence reuse, Website | By: mj_bell

The sentence “Selenocysteine is encoded by the opal codon.” was previously discussed, having been identified through a network view of sentence usage within Swiss-Prot version 9. Our interest was mainly due to it being present in two entries in different interconnected groups. In that blog entry we looked at the similarities (or lack thereof) between […]

Producing web-based dynamic graphs

5 August, 2011 (13:05) | Sentence reuse, Uncategorized, Website | By: mj_bell

As part of my work on sentence reuse I have been investigating ways to visualise various sets of data on my website. An obvious requirement is that the graphs must be generated dynamically, with the resulting graph depending on a user's query. We also have to account for various types of data, not just […]

Levels of sentence reuse in UniProtKB

26 July, 2011 (12:35) | Sentence reuse, Uncategorized | By: mj_bell

As is frequently highlighted, the amount of data being added to biological databases is ever increasing, typically at an exponential rate. This is true for the number of entries added over time to both UniProtKB/Swiss-Prot and UniProtKB/TrEMBL, as illustrated below: UniProtKB offers a number of detailed statistics for each release of Swiss-Prot and TrEMBL, including the total […]

Selenocysteine is encoded by the opal codon.

6 May, 2011 (13:17) | Networks, Sentence reuse | By: mj_bell

As mentioned previously, sentence reuse is common in both manual and automated annotation curation. Although our analysis is in its early stages, we have already noticed that, in most cases, sentences are only shared between clusters of proteins. In Swiss-Prot version 9 we noticed one sentence, “THE ACTIVE-SITE SELENOCYSTEINE IS ENCODED BY THE OPAL CODON, […]

UniProtKB and Sentence Reuse

25 April, 2011 (12:26) | Annotation Quality, Sentence reuse | By: mj_bell

Previously we have looked at the occurrences of words within UniProtKB and seen varying degrees of reuse. What about whole sentences? When an annotation is created, either manually or automatically, sentences are frequently reused (i.e. copied and pasted from one entry to another), as detailed in the curation protocol. By investigating this reuse, […]