Michael Bell

Ph.D. Students Blog

UniProtKB and Benford’s Law

5 October, 2011 (14:36) | Annotation Quality, Miscellaneous | By: mj_bell

In the last blog post we looked at data parsing, and how Zipf’s law could possibly be used to detect parsing errors. Whilst reading a recent blog post by Ben Goldacre I was reminded of Benford’s law – which shares a number of similarities with Zipf’s – and considered how it may also be applicable […]

Have I parsed my data correctly?

22 September, 2011 (16:06) | Miscellaneous | By: mj_bell

The foundation of most of our work has been data parsed from text files. Extracted data has included single words and whole sentences from gigabytes of raw text. With such overwhelming amounts of data, how can we be confident that we have correctly parsed our data? Obviously some basic checking was performed. This […]

Word Clouds (Swiss-Prot and TrEMBL)

21 March, 2011 (16:37) | Annotation Quality, Miscellaneous, Website | By: mj_bell

During my analysis of the Swiss-Prot and TrEMBL datasets I have extracted all the words from each version of each dataset and counted their occurrences. A neat way of looking at this data is to create word clouds. I have done this for all versions of Swiss-Prot and TrEMBL. These can be seen with common words […]

Turing Lecture with Prof Donald Knuth

11 February, 2011 (15:24) | Lecture, Miscellaneous, String Searching | By: mj_bell

Yesterday I attended the BCS and IET Turing Lecture in Glasgow. This year the lecture was given by Donald Knuth. Saying it was a lecture was slightly misleading, as it was basically a question and answer session. This was the fourth in the series, Knuth having also given the lecture in Manchester, Cardiff and London. I’d […]