Highly Literate Ontology

  • What is an ontology and what is it for

  • The need for description

  • Ontologies and Programming

  • Literate Ontologies

  • Lenticular Text

  • The Future

Note

Introduction. Today is going to be a bit of a story, talking about the need for description in ontologies. I want to talk about a new paradigm for ontology development which enables this, and two pieces of technology that implement it.

There is a fair bit of technology involved, so breathe deeply, as parts of this will be a whistle-stop tour.

What is an ontology

  • A specification of a conceptualisation

  • A conceptualisation of something

  • Lots of disagreement on what "something" is

Note

An ontology is a specification of a conceptualisation. Clearly it’s a conceptualisation of something. Now there has been a lot of discussion and disagreement on exactly what that "something" should be. And a lot of documents written about it.

What is the problem?

  • Clear that "something" is complicated

  • We all agree that it’s worth writing about.

Note

But there are some sources of agreement at least. First, we all agree that the something is complicated, and we all agree that it is worth talking and writing about.

Story

  • From OBI

A supernatant role is a role which inheres in a material entity…
— OBI
A pellet is a material entity which results from the aggregation of cells…
— OBI
Note

As Christmas is approaching, we should start off with a story, a Christmas fable. A long, long time ago, I was working on an ontology called OBI at a meeting not far from here. We found these two definitions in OBI talking about the pellet and supernatant in a centrifuge tube.

Story

  • Pellet is a material entity?

  • Supernatant is a role?

  • Why the asymmetry?

  • Difficult to remember

Note

Now, why do we have this asymmetry in our definitions? It seemed strange and far from obvious, and after five minutes of discussion we all agreed that it was wrong. Till I remembered having exactly the same discussion six months before. And it’s because the supernatant is a liquid; once you pour it off, it’s not really a supernatant any more.

This was difficult to remember. And, while the documentation defined the terms, there was no justification.

Conclusions

  • Ontologies need a lot of description.

  • OWL doesn’t provide rich support for documentation

  • Annotations are un-ordered

  • Protege treats documentation as an add-on.

Note

My conclusion is this. Ontologies need a lot of description. But there is a problem: OWL does not really support this. Annotations are un-ordered, for instance, and Protege treats documentation as an add-on. We can’t add sections, bibliographic references and so forth.

Literate Programming

  • Some programs require lots of documentation

  • "Literate Programming"

  • Single source file, generate out programmatic and documentation source

  • Best Examples: LaTeX and Sweave

images/tangle.png
Note

Some programs require a lot of documentation also, so this is not a new issue. In fact, this problem gave rise to the idea of literate programming; the overall workflow for a literate program looks like this. We edit a single source file which contains both the code for execution and the text that is turned into documentation. Examples of this include LaTeX (which documents LaTeX, which is as confusing as it sounds) and Sweave, which combines LaTeX and R to generate the figures for a paper.
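To make the workflow concrete, here is a minimal sketch in Python of the two halves of a literate tool: "tangle" pulls out the executable code, "weave" produces the documentation. It assumes a toy chunk format, loosely in the style of Sweave, where `<<>>=` opens a code chunk and `@` closes it. The function names and the sample source are illustrative only, not the API of any real tool.

```python
# A toy literate source: prose, with code chunks between "<<>>=" and "@".
SOURCE = """\
A pellet results from the aggregation of cells.
<<>>=
pellet = "material entity"
@
A supernatant is a role: once poured off, it is just a liquid.
<<>>=
supernatant = "role"
@
"""


def split_chunks(source):
    """Split the source into (kind, text) pairs, kind is 'doc' or 'code'."""
    chunks = []
    kind, lines = "doc", []
    for line in source.splitlines():
        if kind == "doc" and line.strip() == "<<>>=":
            if lines:
                chunks.append(("doc", "\n".join(lines)))
            kind, lines = "code", []
        elif kind == "code" and line.strip() == "@":
            chunks.append(("code", "\n".join(lines)))
            kind, lines = "doc", []
        else:
            lines.append(line)
    if lines:
        chunks.append((kind, "\n".join(lines)))
    return chunks


def tangle(source):
    """Keep only the code chunks: this is what gets executed."""
    code = [text for kind, text in split_chunks(source) if kind == "code"]
    return "\n".join(code) + "\n"


def weave(source):
    """Keep everything, indenting code so it reads as a listing."""
    out = []
    for kind, text in split_chunks(source):
        if kind == "code":
            out.append("\n".join("    " + line for line in text.splitlines()))
        else:
            out.append(text)
    return "\n".join(out) + "\n"
```

Both outputs come from the same file, so the justification for a definition lives right next to the definition itself.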

Can we do Literate Ontology?

Note

This seemed like a good idea, so I thought that I would try this with ontologies. I tried a variety of solutions over the years, which you can read about on my blog here. All of them were based around OWL Manchester syntax, which is a clean syntax for editing.

My conclusions are these:

Conclusion

  • No one is going to write ontologies this way

  • Writing ontologies is hard

  • Only a crazy person would write an ontology in a flat-file

Note

Basically, no one is going to write an ontology in a flat file. It’s hard enough as it is, without having to get everything right in some obscure syntax. Only a crazy person would write an ontology in a flat file.

Conclusion

  • No one is going to write ontologies this way

  • Writing ontologies is hard

  • Only a crazy person would write an ontology in a flat-file

images/midori_harris.jpg images/mike_ashburner.jpg

  • Even GO has mostly stopped doing this now

  • But what about karyotypes?

Note

Here are two crazy people — in fact, many early versions of GO were written in a flat file, and there are some advantages to it. But, largely, even they have stopped now.

And that is what I thought, until we got to karyotypes.

A diversion: Karyotypes and Tawny-OWL

  • Humans (normally) have 46 chromosomes

  • Lost chromosomes or bits of chromosome are bad

  • Describing this ontologically seems useful

images/karyogram.png
Note

So, what is a karyotype? It’s a description of all the chromosomes in, normally, a human. It’s clinically important because losing part or all of a chromosome is generally bad. We want to describe this ontologically.

A diversion: Karyotypes and Tawny-OWL

  • Each of the 23 pairs has bands (visible structures)

  • That’s 1000 classes all very similar

  • Protege is not going to work

images/ChromXISCN09.jpg
Note

There are lots of chromosomes and lots of bands. It’s complicated, and repetitive. In the end, we realised we had to program it.
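As a rough illustration of why programming helps here, the sketch below generates one class per band in a loop and prints them as Manchester-syntax-style frames. It is written in Python rather than with Tawny-OWL, and everything in it is an assumption made for the example: the class names, the `isBandOf` property, and the uniform number of bands per arm (real chromosomes differ).

```python
# 22 autosomes plus X and Y: the distinct human chromosomes.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def band_frames(chromosome, bands_per_arm):
    """Return one class frame per band on the p and q arms of a chromosome.

    The names and the isBandOf property are hypothetical, and every arm
    is given the same number of bands purely to keep the sketch short.
    """
    frames = []
    for arm in ("p", "q"):
        for index in range(1, bands_per_arm + 1):
            name = f"HumanChromosome{chromosome}Band{arm}{index}"
            frames.append(
                f"Class: {name}\n"
                f"    SubClassOf: HumanChromosomeBand,\n"
                f"        isBandOf some HumanChromosome{chromosome}"
            )
    return frames


def all_frames(bands_per_arm):
    """Generate the frames for every chromosome."""
    return [
        frame
        for chromosome in CHROMOSOMES
        for frame in band_frames(chromosome, bands_per_arm)
    ]


if __name__ == "__main__":
    frames = all_frames(bands_per_arm=20)
    print(len(frames), "classes generated")
    print(frames[0])
```

A dozen lines of loop replace hundreds of near-identical classes that would otherwise be entered by hand, and a change to the pattern is made once rather than once per band.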

Tawny-OWL

  • Interactive environment, create new ontologies, classes

  • Built on Clojure, the JVM, OWL API and some reasoners!

Note

And, so I built Tawny-OWL — and some of you will have been at the tutorial yesterday.

Tawny-OWL

Allows us to repurpose software engineering tools.

  • version control

  • commit discussions

  • pull management

  • unit testing

  • continuous integration

  • collaborative editing

  • Integrated Development Environment

  • Power Editor

Note

We have all of these capabilities. WebProtege adds many of these, but they have been explicitly written and coded for WebProtege. We use off-the-shelf tooling, maintained by other people. And generally it’s very good.

Literate Ontology

  • We can write ontologies in text, using a rich programming environment

  • We can write documentation in text, also with a rich markup

images/tangle.png
Note

So, this gives us a powerful environment, and one that is competitive with, although very different from, Protege. But it also means that we can define our ontologies in a flat-file, using a rich programming environment.

And, of course, we can also write documentation in text, either latex or any of the more recent markup (markdown!) languages.

Problem

  • Do we use a rich environment for programming?

    • Eclipse, Emacs CIDER, Intellij

  • Do we use a rich environment for documentation?

    • Scientific Word, LyX, Emacs auctex

Note

But, there is still a problem. We can use a rich environment for programming. We can use Eclipse, or Emacs or IntelliJ. Or we can use a rich environment for documentation, tools like Scientific Word, LyX or AUCTeX. But we have to choose. And, if we choose one, the other will suffer.

Solution: Lenticular Text

  • Invented the notion of "Lenticular Text"

  • After "Lenticular Printing"

  • Text which changes depending on the way you look at it

  • Model-View-Controller for Text

Note

I worried about this for a while, so I invented a new notion of lenticular text. This is a little bit hard to describe, so I will show some pictures in a minute.

The name comes from "lenticular printing", which gives those images that change depending on the angle you look at them from. Same idea: I wanted text which changes depending on your point of view. Or, for those of you from a software engineering background, you can think of this as model-view-controller but for text.

Solution: Lenticular Text

  • View either representation

  • Edit either in a rich environment

images/lentic.png
Note

We should be able to have two representations, linked together. Then we can edit either and still use our downstream tools, like Tawny-OWL and asciidoc, to operate on it.
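The idea of two linked representations can be sketched as a pair of transformations that are inverses of each other. The Python below is only an illustration and not how lentic is implemented: it assumes a documentation view in which code sits between `----` fences, and a code view in which every non-code line is hidden behind a `;; ` comment prefix. The fence, the prefix and the function names are all assumptions made for the example.

```python
FENCE = "----"   # assumed code-block delimiter in the documentation view
PREFIX = ";; "   # assumed line-comment prefix in the code view


def doc_to_code(text):
    """Documentation view -> code view: comment out everything but code."""
    out, in_code = [], False
    for line in text.splitlines():
        if line.strip() == FENCE:
            in_code = not in_code
            out.append(PREFIX + line)
        elif in_code:
            out.append(line)
        else:
            # Blank prose lines become a bare ";;" with no trailing space.
            out.append(PREFIX + line if line else PREFIX.rstrip())
    return "\n".join(out)


def code_to_doc(text):
    """Code view -> documentation view: strip the comment prefix again."""
    out, in_code = [], False
    for line in text.splitlines():
        body = line[len(PREFIX):] if line.startswith(PREFIX) else None
        if body is not None and body.strip() == FENCE:
            in_code = not in_code
            out.append(body)
        elif in_code:
            out.append(line)
        elif body is not None:
            out.append(body)
        elif line == PREFIX.rstrip():
            out.append("")
        else:
            out.append(line)
    return "\n".join(out)


DOC_VIEW = """The pizza ontology starts with a base class.

----
(defclass Pizza)
----
Toppings come next."""

CODE_VIEW = doc_to_code(DOC_VIEW)
```

Because each view can be regenerated from the other, an edit made in either one can be carried across, and each view stays valid input for its own tools: the code view for the programming environment, the documentation view for the markup toolchain.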

Implementing Lenticular Text

  • Implemented this with Emacs "lentic" package

  • Why Emacs?

    • Strong support for Clojure

    • Strong support for several markup languages (latex, asciidoc, markdown).

Note

Now, to achieve this, we have to build the tool into an editor. I chose Emacs for a variety of reasons: it has strong support for Clojure, and also for writing documentation in LaTeX or other markup languages.

Implementing Lenticular Text

  • Implementation is for Emacs

  • Algorithms are portable

  • Solution is very low-level

  • So, (mostly) neutral to text

  • Currently works for

    • Clojure, Emacs-lisp, Scala, Haskell, python

    • Latex, Org-Mode, Asciidoc

  • New languages can be added

Note

While the implementation is for Emacs, the algorithms are portable. It plugs in at a fairly low level, so it’s mostly neutral to the text and can provide lenticular views for many different syntaxes. Currently we have this lot, in any combination. We can even do several languages at once. None of the tools in use have to be modified at all.

What it looks like

  • Harder to explain than show

  • In practice, works straight-forwardly

Note

So, this is how it looks. We start off with a piece of text which is syntactically valid LaTeX. It can compile, and within Emacs we can add references, sections, italics, tables, figures, whatever.

But as we move down, we also see embedded code snippets. Now we can change to another view; in this case the documentation is given a red background, and now we can see the code properly. This code is live: it can be formatted, refactored and run directly inside the editor.

What does it look like

  • Allows some fairly advanced editing

  • This is the pizza ontology

images/Arabic_out-crop.gif

  • Both the ontology and documentation being written in Arabic

  • Right-to-Left text.

Note

This also allows us some pretty advanced forms of editing. This, for example, is the Tawny version of the pizza ontology. The documentation in this case is literate but, more than that, both the ontology and the documentation are in two languages. In this case, we have made some Arabic pizza, with documentation written right-to-left.

What does it look like

  • Here used over non-ontological source code

  • Using "Org-Mode", which is a markup language

  • Multi-lingual

  • With web publishing

images/chinese-pyim-lentic-animated.gif
Note

This is another example, in this case over non-ontological source code — lentic is quite general. In this case, the text is using a different markup format called org — there is very advanced support for this — we can do tables, spreadsheets, timesheets, todo lists and web publishing. And, of course, multi-lingual — in this case, one of the Chinese languages which uses multiple keypresses for every character.

What are we using it for

  • Karyotype Ontology

  • Tawny-OWL documentation (including tutorial yesterday)

  • Non-ontological uses (including lentic documentation)

Note

So, what are we using it for? The karyotype ontology is now being converted to this form, which is good because it’s a complex ontology. The Tawny-OWL documentation, which is in many cases fully executable as an ontology; the tutorial yesterday is an example. And non-ontological uses also, including the lentic documentation (it is, of course, self-documenting).

Limitations

  • The tooling is complex (but powerful)

  • Different parts of the tooling have different limitations

    • LaTeX interacts poorly with web (but you can use asciidoc)

    • Asciidoc has weak referencing capabilities (but you can use latex)

    • Lentic is Emacs-specific

Note

In some senses, the tooling is quite young and it has some limitations. It’s complex (but powerful). And different parts of the tools have different limitations.

LaTeX is not good for the web, but asciidoc works fine. Asciidoc is good for the web, but bibliographies are not so easy. Lentic itself is currently Emacs-specific.

Summary

  • Both ontology and documentation should be considered first-class

  • This is possible with a text representation

  • Tawny-OWL makes ontology development in text possible

  • Many existing text formats for documentation

  • Lenticular Text allows editing both together

Note

So, to summarise.

Future

  • New workflows for ontology development

  • Can we integrate office software?

  • Karyotype ontology has excel-driven test suite

  • What do we actually need to write?

  • Semantic documents as well as literate ontology

Note

And for the future. We are interested in new workflows. Can we integrate this with Word, which domain users tend to like? We already have an Excel-driven test suite. We need to understand what we need to write in documentation. This is not easy: even after 20 years, still no one has produced a really good hypertext story. And finally, here I have been talking about literate ontologies, but the same technology enables semantic documents, where the ontological statements support the discourse.

Acknowledgements

  • Jennifer Warrender — The Karyotype Ontology

  • Aisha Blfgeh — Arabic Pizza

  • Feng Shu (Tumashu) — Chinese Screenshots

  • Phillip Lord — Tawny-OWL, Lentic

URLs