The Gene Ontology (GO) is fast becoming the de facto standard for the annotation of gene products. One of the claims made for it is that it should allow improved querying of databases: different resources queried with the same term should return all, and only, the entities conforming to that notion. One obvious way to query a database would be to ask for all proteins whose annotation is "semantically similar" to that of a query protein.

Recently, BioCreative announced that they would be using our similarity measure, or something similar, as part of the assessment for their text extraction evaluation. An implementation for microarray analysis has been produced as part of the BioConductor package.

We have used semantic similarity measurements developed for the WordNet electronic dictionary/thesaurus, and applied them to the Gene Ontology. We've validated these measures by comparing them with sequence similarity. This work leads us to believe that these measures are a biologically valid way of querying the Gene Ontology.
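To give a feel for what a WordNet-style measure looks like on an ontology, here is a minimal sketch of one well-known example, Resnik's information-content similarity. This page does not say which measures we used, so treat the choice of Resnik as an assumption made for illustration. The term names, the tiny graph and the annotation counts are all made up, and none of this is the GO Graph code itself.

```python
import math

# Hypothetical toy ontology: each term maps to its parents.
# These are invented names, not real GO identifiers.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 3,
    "dna_binding": 5,
    "rna_binding": 4,
    "kinase": 6,
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities():
    """Probability of each term, counting an annotation to a term
    as an annotation to all of its ancestors as well."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for ancestor in ancestors(term):
            totals[ancestor] += count
    grand_total = totals["root"]
    return {term: total / grand_total for term, total in totals.items()}


def resnik(term_a, term_b, probs):
    """Information content (-log p) of the most informative shared ancestor."""
    shared = ancestors(term_a) & ancestors(term_b)
    return max(-math.log(probs[t]) for t in shared if probs[t] > 0)


probs = term_probabilities()
close = resnik("dna_binding", "rna_binding", probs)  # share "binding"
far = resnik("dna_binding", "kinase", probs)         # share only "root"
print(close, far)
```

In this toy data the two binding terms share a fairly specific ancestor and so score higher than a binding term paired with a kinase term, which meet only at the root. Querying for proteins with "semantically similar" annotation then comes down to ranking proteins by a score of this kind over their annotated terms.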

From here you can download the GO Graph software, which we use to calculate semantic similarity scores. This software started out as a package for drawing graphs of GO. That functionality has now been moved to the GO Perl API and can be accessed directly from AmiGO, making the package name increasingly inaccurate.

This software is beta quality; that is, it might work, but it might not. It is also an increasingly long time since I wrote it, and the libraries that it depends on have changed a lot. The software is released under a free software license. See the tarball for details.

If you want to read about the measures and our validation of them, you can get information from my publications page. You may also be interested in some of the original papers on which we based our er... research.

These measures have also recently been applied to biological literature and expression levels.

