Introduction

  • Tawny-OWL is a fully programmatic environment for Ontology Development

  • Presents a different model of ontology development

  • Very powerful

  • Allows reuse of commodity tooling

Note

Good morning, my name is Jennifer Warrender and I will be your presenter for today’s Tawny-OWL tutorial.

Developed by Phillip Lord at Newcastle University and motivated by my karyotype work, Tawny is a fully programmatic environment for ontology development that presents a different approach to building ontologies.

It is a very powerful tool: as well as being simple to use, it is built on a programming environment and the OWL API.

In addition, it allows reuse of commodity tooling.

Today

  • Taught material

  • Plenty of documented examples

  • Allow the audience to "follow-by-leader"

  • Plenty of time for questions

  • A flexible tail

Note

Today will mostly be taught material, particularly at the beginning and the end, where I need to describe parts of the system.

However, the tutorial contains plenty of documented examples that allow you, the audience, to follow along. You will find the documentation for the Clojure code embedded in the Clojure files themselves. This "literate programming" was accomplished using lentic.

I have also included a reasonably flexible tail where I try to address questions that people have.

Also, there is a degree of guesswork in how much material I can get into the time available, which this helps to address.

I do not have time to describe all of the features of Tawny, nor give a full tutorial.

Today

  • Attempted to make decoupled

  • Aware that people will come in and out

  • Probably not be entirely successful at this

  • Gets more programmatic as we go on

  • Will not explain all of the programming

Note

Although the tutorial does "build", I am aware that people will come in and out. So I have tried to make it as loosely coupled as possible, although, truth be told, I have probably not been entirely successful at this.

The latter half requires more programming knowledge, and there is probably little that I can do about this. I will skip over what some of it means, in the hope that those who care can go back later and look it up. I do not want this to become a Clojure tutorial.

Knowledge Pre-requisites

  • Required

    • Basic knowledge of ontologies

    • Basic knowledge of OWL

    • Basic knowledge of amino acids

  • Not Required

    • Knowledge of Clojure

    • Knowledge of OWL API

    • Knowledge of Tawny-OWL

    • Be highly-experienced programmers

      • WOULD BENEFIT from some programming experience

    • Familiarity with an IDE/Editor with Clojure support

      • WOULD BENEFIT if they followed instructions to run Clojure "hello world" program in their IDE of choice

Note

As stated in the tutorial advert, the pre-requisites are knowledge of ontologies, OWL and amino acids, as the main focus of this tutorial is building an ontology of amino acids.

You are not required to have any knowledge of Clojure, the OWL API or Tawny, to be programmers, or to be familiar with IDEs with Clojure support. However, any of these would be of benefit to you.

Outcomes

  • Understand the motivation behind Tawny-OWL

  • Understand and use basic Clojure infrastructure

  • Build a basic ontology with Tawny-OWL

  • Understand pattern usage within Tawny-OWL

  • Implement a pattern, using Tawny-OWL

  • Understand the relationship to programmatic IDEs and related tools

Note

There are six outcomes:

  1. Understand why Tawny was built

  2. Understand the syntax of Tawny (Clojure)

  3. Build a basic (amino acid) ontology using Tawny

  4. Understand the usage of patterns in Tawny

  5. Implement patterns in Tawny

  6. Understand how Tawny relates to programmatic IDEs and related tools

Notes on Notes

Note

With regards to tutorial materials:

All materials are available @ this link

They are available in two formats: slide format and book format (slides with notes).

Motivation

  • Tawny-OWL was initially an accidental tool

  • We did not start off writing a new environment

  • We discovered we needed one while trying to build an ontology

Note

We didn’t start out to develop a new ontology engineering tool. It happened along the way, as we tried to address a specific use case, which was modelling human karyotypes.

Use Case (karyotypes)

  • Current string representation comes from ISCN

  • Written in a book, no computational representation

  • The human karyotype is complex to describe

Note

What is a karyotype? A karyotype is a description of the chromosomes of a cell.

The current string representation comes from the International System for Human Cytogenetic Nomenclature (ISCN). It is a composite of three pieces of information:

  1. the number of chromosomes

  2. the sex chromosomes present

  3. any abnormalities present

The current representation goes back to the days of typewriters. ISCN is published as a book. The specification is not machine interpretable, and neither are the ISCN strings themselves. In fact, you cannot even represent them fully in ASCII, because they use meaningful underlines (to describe homologous abnormalities).

In addition, we find that describing human karyotypes is complex.

Use Case (karyotypes)

  • Alterations, even more so

  • 46,XX

  • 47,XXY (aka Klinefelter’s Syndrome)

  • 46,Xc,+21

  • 46,XY,+21c,-21

  • 45,XY,-10,der(10)t(10;17)(q22;p12)

  • 46,XY,der(7)t(2;7)(q21;q22)ins(7;?)(q22;?)

  • 46,XX,der(8)t(8;17)(p23;q21)inv(8)(p22q13)t(8;22)(q22;q12)

  • 46,XX,der(9)del(9)(p12)t(9;22)(q34;q11.2),der(9)t(9;12)(p13;q22)inv(9)(q13;q22)

  • An ontological representation seems like a nice idea

Note

Especially when alterations are involved.

  1. The first ISCN string represents a female individual with no alterations

  2. The second is the ISCN string for a male individual with Klinefelter’s syndrome, who has an additional X chromosome

…and so on.

It is for these reasons an ontological representation seems like a nice idea.

A partonomy?

  • What you see is what you get?

  • There are 24 chromosomes

  • Around 1000 bands, at different resolutions

ChromXISCN09.jpg
Note

How would we go about building an ontology for karyotypes?

We could build it using a strict partonomy…

There are:

  • 24 chromosomes (chromosomes 1-22, X and Y)

  • more than 1000 chromosome bands (bands can be subdivided into sub-bands, depending on the resolution at which you karyotype, as shown in the picture)

And generally they are of similar structure.

Protege

  • Could do this in Protege

  • Technically, it scales well to an ontology of this size

protege-pizza.png
Note

We could build this in Protege. Technically, 1000 terms is not a problem for Protege; it can scale to this size (or, indeed, considerably larger) with relative ease.

Protege

  • But the user interface does not

  • Generating many similar classes is painful

  • Hard to know how an axiomatisation will perform at the start

  • Changing them afterwards even worse

click.gif
Note

But the UI doesn’t scale in this way. It involves an awful lot of clicking — one report I have heard suggests that Protege users spend up to 50% of their time expanding and closing the hierarchy.

With the karyotype ontology this problem would be profound.

Worse, with the karyotype ontology we have a specific computational use in mind, and we don’t know what the performance is going to be like — reasoners can change performance quite a lot with different axiomatisations.

If we want to change the axiomatisation after the ontology has been built then, in the worst case, we have to start again.

Protege

  • We end up more like this

click-fast.gif
Note

In practice, we are more likely to end up like this; 1000 classes is an awful lot of clicking, particularly when many of the classes are very similar.

Can we do this programmatically?

  • Yes, but painfully

  • OWL API — used by many, including Protege 4

  • Java and the OWL API are long-winded

  • Compile-Code-Test cycle

// Create the manager, data factory and ontology
// (example_iri is assumed to be an IRI defined elsewhere)
OWLOntologyManager m = OWLManager.createOWLOntologyManager();
OWLDataFactory df = m.getOWLDataFactory();
OWLOntology o = m.createOntology(example_iri);
// Add the OWL classes
OWLClass nucleus = df.getOWLClass(IRI.create(example_iri + "#Nucleus"));
OWLClass cell = df.getOWLClass(IRI.create(example_iri + "#Cell"));
// Add the OWL object property
OWLObjectProperty partOf = df.getOWLObjectProperty(IRI.create(example_iri + "#partOf"));
// Assert the axiom
// 1. Create the class expression
OWLClassExpression partOfSomeCell = df.getOWLObjectSomeValuesFrom(partOf, cell);
// 2. Now create the axiom
OWLAxiom axiom = df.getOWLSubClassOfAxiom(nucleus, partOfSomeCell);
// 3. Add the axiom to the ontology
AddAxiom addAxiom = new AddAxiom(o, axiom);
// 4. We now use the manager to apply the change
m.applyChange(addAxiom);
Note

Can we do this programmatically?

Yes we can. The main API out there is the OWL API which is used by many, including Protege 4

It’s nice, but it is long-winded and difficult to use. This is because of the complexity of the type system needed for OWL (from the Javadoc it is hard to work out which methods can be invoked on which type), the change object system (so, you can use AddAxiom to add an annotation to an ontology, but only if you don’t care about it working), and the factory layer. All of these are complex.

In addition, repeating the compile-code-test cycle over and over is time consuming.

Brain

  • Written by Samuel Croset, EBI

  • EL only

  • How does this script fit with Java’s OO design?

  • Compile-Run cycle

// Create ontology
Brain brain = new Brain();
// Add the OWL classes
brain.addClass("Nucleus");
brain.addClass("Cell");
// Add the OWL object property
brain.addObjectProperty("partOf");
// Assert the axiom
brain.subClassOf("Nucleus", "partOf some Cell");
Note

There are alternatives.

For example Brain, written by Samuel Croset. It is much lighter weight than the OWL API.

But it is only EL expressive, and it is unclear how what is essentially a script fits with Java’s OO design.

Lastly, any change requires a recompile and restart, which is slow.

The Paragon : R

  • R provides an interactive, exploratory environment for statistics

  • Command line shell, wrapped by several GUIs

  • Language is convenient to type and use

  • It’s not all good

  • The syntax can be bizarre

  • The language semantics are strange

Note

So what is the ideal?

Something like R, the statistical language. It is interactive, convenient to use. It can be used cleanly in batch. In general, very nice to type and use.

However, we are not saying that we wanted to copy all of its features. The syntax can be bizarre and the language semantics can be considered strange.

Constraints

  • Simple to do (structurally) simple ontologies

  • OWL API — too much code to rewrite

  • Java (JVM) — because of the OWL API

  • Pre-existing development tooling

Note

These are the limitations that we had to live within.

Most importantly of all, I wanted it to be as simple as possible to build structurally simple ontologies. It should largely be possible to type and write ontologies without feeling that you are programming.

It was going to be written using the OWL API because there is too much code there to rewrite, and no one would trust Phil to do that in a standards compliant way.

This required the use of the JVM.

Lastly I wanted access to pre-existing development tooling. I did not want to build a complete development environment, I needed something off-the-shelf, so that it was good.

Karyotype Ontology

  • What have we achieved?

  • Around 1000 classes in the karyotype ontology

  • Similar numbers of tests, structural and reasoner based

  • Models 10 events, with patterns for downstream use

  • Multiple levels of ploidy

  • Performance tested axiomatisation

Note

Before I move onto Tawny-OWL features, I want to briefly summarise what we achieved with the karyotype ontology.

Well, I think quite a lot. We now have a large, consistent (in both the formal and informal sense of the word) ontology that describes most levels of the ISCN.

Tawny-OWL Key Features

  • Ontology building tool

  • Unprogrammatic syntax

  • Evaluative

  • Broadcasting

  • Patternised

  • Fully Extensible

  • Integrated Reasoning

  • Built on commodity language

  • Access to a full programming toolchain

Note

Next, we move onto a formal walk-through of tawny-owl features. In this section I do not intend to describe all of the features in detail, but to give an overview, so that you will know what is coming up.

This is a literately programmed document, so we start with a namespace, but I have hidden this from the slides because we do not need so much baggage so early on. It will be introduced in detail later.

(ns tawny.tutorial.features
  (:use [tawny owl reasoner]))

Ontology Building Tool

  • Tawny-OWL is an Ontology Building Tool

  • A "Textual User Interface"

  • Usable as an API

  • But not designed as an API

  • Built with the users in mind

Note

First feature : Tawny is an ontology building tool

Unlike Protege (which has a GUI), Tawny-OWL has a textual user interface; changes to the ontology are made through the application of Clojure forms

For those of you from a functional programming background, tawny is not very functional.

It is usable as an API for manipulating ontologies, but it was not really designed for that purpose. It was built with users in mind: features have been implemented to help users build ontologies as quickly and simply as possible.

Unprogrammatic Syntax

  • Both OWL API and Brain carry Java baggage

  • You can never forget you are programming

  • Tawny-OWL aimed to avoid this

(defontology o)
Note

Second feature : Tawny has an unprogrammatic syntax

The problem with existing programmatic solutions to building ontologies (e.g. the OWL API and Brain) is that they carry a lot of Java baggage.

So, when using these you can never truly forget that you are programming.

Thus we have aimed, as far as possible, to make it simple to define simple ontologies in tawny.

Here, this statement defines a new ontology. Of course, our choice of programming language has implications, and the parentheses are the most obvious one to anyone from a lisp background.

Unprogrammatic Syntax

  • Many OWL syntaxes to choose from

  • Functional, Concrete, XML, RDF

  • Manchester syntax (OMN) is designed for typing

  • Frame-based, rather than axiom-based

    • i.e. all information about an entity is grouped

    • Entity-Frame-Value

Class: o:A
    Annotations:
        rdfs:label "A"@en,
        rdfs:comment "A is a kind of thing."@en
Note

Creating a new syntax for ontologies seemed unnecessary as there are really far too many of these already e.g. Functional, XML or RDF

Thus it was decided that Tawny would model an existing syntax.

But which syntax?

One which was built for the specific use case of typing was Manchester syntax (also known as "OWL Manchester Notation" or OMN). It is a relatively clean syntax, and can be used to define new classes easily.

Manchester is frame-based which means that all information about an entity is grouped into a single construct as shown in the example.

Generally a frame-based syntax follows the format: Entity-Frame-Value(s)

Unprogrammatic Syntax

  • Tawny-OWL modelled on OMN

  • Modified to use Clojure/Lisp syntax

(defclass A
  :label "A"
  :comment "A is a kind of thing.")
  • Entities need parentheses

  • No longer need commas

  • Blocks are explicit, so easier to parse

Note

Thus Tawny is modelled on Manchester Syntax…

…but modified to use valid Clojure/Lisp syntax

This is the equivalent class definition for class A in Tawny (recall the OMN definition shown on the previous slide)

Some things are easier, and some are harder. For example:

  • entities need parentheses

  • commas are no longer needed between values

  • blocks are explicit, which makes them easier to parse

Unprogrammatic Syntax

  • Frame name usage i.e. :frame-name and not frame-name:

    • Just for fit with Clojure

  • Some frame name changes e.g. :super rather than SubClassOf:

  • Some new "convenience" frames

    • :label and :comment

    • :sub meaning ontologies can be built bottom-up or top-down

  • Easy creation of new entities

(defclass B
  :super A
  :label "B")
Note

These are not the only differences between Manchester syntax and Tawny-OWL.

Usage of frame names has changed, so it is :frame-name rather than frame-name:. This was so that we could use Clojure keywords as frame names.

Some of the frame names have changed. For example :super rather than SubClassOf:. For more information on this see Phil’s blog.

For the sake of consistency the same keyword can be used with properties

Being a programming language rather than a format it is relatively easy to add new features with a clearly defined semantics.

So, for example:

  • :label and :comment expand to the relevant annotation axioms

  • the :sub keyword allows ontologies to be built bottom-up. In practice, this has not really been used so far. However, it ensures that the syntax does not dictate the ontology development style.

Together these provide an easy way of creating new entities, as shown in the class definition of B.

Unprogrammatic Syntax

  • Two Comment syntaxes

  • Explicit creation of new entities

    • Less error-prone

    • Define before use optional

  • General Concept Inclusion (GCI) fully supported

  • Same syntax for patterns

Note

There are some other differences. Firstly, tawny has (two) comment syntaxes. OMN is commentable too, but the parser doesn’t always work.

The other major one is the use of an "explicit definition" semantics. Classes must be defined before they are used by other classes. This is a semantics shared with Brain, and was chosen deliberately as it was too easy to make typos without it.

This is optional, though: entities can be referred to with Strings instead.

General Concept Inclusion (GCI) is fully supported, which OMN doesn’t do.

We can also build patterns in the same syntax and files as the ontology. You will see many examples of this through the tutorial.

Evaluative

  • Tawny-OWL is "evaluative"

  • Add new classes, new properties, new frames on-the-fly

  • Redefine patterns

  • Add new tests, and rerun

  • There is no compile cycle

Note

Feature 3 : Evaluative

We can type, change and add new entities and reprogram things as we go. For the non-programmers this is likely to be so obvious that you cannot see why I am describing it, but for programmers from a Java background it is hard to overestimate what a massive difference this makes to development styles.

There is no compile cycle — you can change things as you go, and you do not need to continually restart your application.

This is not entirely true, you do need to restart, but not that often.

Massive time saver
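
The redefinition workflow described above can be illustrated in plain Clojure, without Tawny-OWL. This is only a minimal sketch; the function name `greeting` is hypothetical and is not part of the tutorial code.

```clojure
;; First evaluation: define a function in the running session.
(defn greeting [] "Hello")

(greeting)

;; Re-evaluating the form replaces the definition in place;
;; no separate compile step or JVM restart is needed.
(defn greeting [] "Hello World")

(greeting)
```

The same applies to ontology forms: a changed form is simply evaluated again in the running session.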

Compiled

  • There is a compile cycle

  • But you won’t notice it

  • Tawny-OWL is performant

  • Tawnyised version of GO loads ~1 min

Note

Actually, there is a compile cycle. But you won’t notice it.

So those of you from a programming background may be thinking "hmm, an interpreted language running on top of the JVM". Actually, Clojure statements are compiled to bytecode on-the-fly and then run directly on the JVM (which in turn will JIT compile them). So, it’s fast enough.

The main thing here is that Tawny is performant; in fact, the tawnyised version of GO loads in about a minute.
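
One way to see the on-the-fly compilation is to ask a function for its JVM class. This is a plain Clojure sketch, independent of Tawny-OWL; `add-four` is a hypothetical name used only for illustration.

```clojure
;; Every Clojure function is an instance of a JVM class
;; generated by the Clojure compiler.
(.getName (class inc))
;; => "clojure.core$inc"

;; Functions defined interactively are compiled in the same way.
(defn add-four [x] (+ x 4))

(instance? clojure.lang.IFn add-four)
;; => true
```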

Broadcasting

  • An idea borrowed from R

  • R is very flexible with numbers and lists

  • Add number to list, adds the number to every element of the list

> c(1,2,3) + 4
[1] 5 6 7
Note

Feature 4 : Broadcasting

Broadcasting is a really very handy feature from R. You do not have to explicitly deal with the lists and numbers (ontology entities in the case of tawny).

In this R example, we are adding four to all the elements of the list.

Broadcasting

  • Tawny-OWL does something similar

  • owl-some expands to two existential restrictions

(defoproperty r)
(defclass C
  :super (owl-some r A B))
  • C some r A

  • C some r B

Class: o:C
    SubClassOf:
        o:r some o:B,
        o:r some o:A
Note

Tawny does something similar

One statement in tawny expands to two in OMN. Two calls to the OWL API also. It is also one of those things that makes tawny less like an API and more like a TUI. Although this is reasonably efficiently implemented, it does have a performance cost — more than made up for in saved typing for ontology developers.
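
The idea of broadcasting can be sketched in plain Clojure. This is not how Tawny-OWL implements it; `broadcast` and `some-restriction` are hypothetical names, and restrictions are modelled as plain vectors purely for illustration.

```clojure
;; The R expression c(1,2,3) + 4, written out explicitly.
(mapv #(+ % 4) [1 2 3])
;; => [5 6 7]

;; Hypothetical sketch: apply a two-argument constructor f to a
;; property and to each filler in turn, one result per filler.
(defn broadcast [f property & fillers]
  (mapv #(f property %) fillers))

;; A stand-in for an existential restriction, as plain data.
(defn some-restriction [property filler]
  [:some property filler])

(broadcast some-restriction :r :A :B)
;; => [[:some :r :A] [:some :r :B]]
```

The caller writes one form, and the mapping over the fillers happens inside the function.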

Patternised

  • Tawny-OWL allows patterns

  • Broadcasting works naturally with patterns

  • some-only the most common

(defclass D
  :super (some-only r A B))
  • Expands into three (or n+1) axioms

Class: o:D

    SubClassOf:
        o:r some o:A,
        o:r some o:B,
        o:r only
            (o:A or o:B)
Note

Feature 5 : Patternised

This is the first example of a pattern that we see (although broadcasting is a pattern also, in a sense). This is the "some-only" pattern, which is so common that it is often not seen as a pattern.

This was also the motivation for broadcasting as some-only makes little sense without broadcasting, although it might not be immediately obvious why this is the case.

In this class definition for D, the some-only statement expands into three axioms; two existential and one universal.
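
The shape of this expansion can be sketched as plain data. This is an illustration only, not Tawny-OWL's implementation; `some-only-sketch` is a hypothetical name and the vectors stand in for OWL restrictions.

```clojure
;; One existential restriction per filler, followed by a single
;; universal restriction over the union of all fillers.
(defn some-only-sketch [property & fillers]
  (conj (mapv (fn [filler] [:some property filler]) fillers)
        [:only property (into [:or] fillers)]))

(some-only-sketch :r :A :B)
;; => [[:some :r :A] [:some :r :B] [:only :r [:or :A :B]]]
```

Because the universal restriction needs every filler at once, the pattern only makes sense when the function receives all the fillers together.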

Single fully extensible syntax

  • Tawny-OWL is implemented in Clojure

  • Tawny-OWL patterns are implemented in Clojure

  • Tawny-OWL ontologies are written in Clojure

  • Therefore, adding new patterns is trivial

  • Here we introduce two new patterns and use them

(defn and-not [a b]
  (owl-and a (owl-not b)))

(defn some-and-not [r a b]
  (owl-some r (and-not a b)))

(defclass E
  :super (some-and-not r A B))
  • Which gives

Class: o:E
    SubClassOf:
        o:r some
            (o:A and (not (o:B)))
Note

Feature 6 : Single fully extensible syntax

In theory, tawny does nothing that it is not possible to do already. But the single syntax and environment is important. I can easily add new syntax even for a specific ontology. Doing this where half the ontology is built in protege and half outside is just intractable. With a single syntax it becomes so easy that it happens often and all the time.

Reasoned Over

  • Tawny-OWL fully supports reasoning

  • In this case using HermiT

  • Based on all the examples given so far, F has three subclasses

(defclass F
   :equivalent (owl-some r (owl-or A B)))

;; #{}
(subclasses F)

(reasoner-factory :hermit)

;; #{C D E}
(isubclasses F)
Note

Feature 7 : Integrated Reasoning

Tawny fully supports reasoning through two Maven-compliant reasoners: ELK and HermiT.

In this example we are using HermiT.

Based on the class definitions given so far, F has three subclasses; C, D and E.

Commodity Language

  • Built on Commodity Language

  • Full access to all APIs

    • Serialisation

    • Spreadsheet reading

    • Database access

    • Networking

    • Logic Programming

    • Test Library

    • Statistics and Plotting

    • Benchmarking

Note

Feature 8 : Commodity language

Most of this I am only going to show implicitly, but tawny is based on a commodity language. So it has access to many APIs which can do useful things for you. We have used quite a few of these either in the context of programming tawny or in developing OWL ontologies using tawny (in fact all of those given here).

The key point to remember here is that programming tawny and developing ontologies are not disjoint. You have the same power in using tawny as we do in developing it.

Commodity Toolchain

  • Editing

    • IDEs: Eclipse, IntelliJ, Netbeans

    • Power Editors: Emacs, Vim, Sublime

    • Web Editors: Catnip, Gorilla REPL

    • Novel: LightTable

  • Version Control: Any

  • Build and Dependency

    • Lein, Maven or Boot

  • Testing

    • Travis-CI, or any CI environment

  • Linters, Rewriters, Remote Evaluation (nREPL)

Note

Feature 9 : Commodity Toolchain

Finally, we have full access to a rich Toolchain, including a wide range of IDEs, power-editors or web editors, as well as some very novel environments (take a look at lighttable — implemented in Clojure and supporting it first).

We also make extensive use of version control — we’ve been using git, but you can use whatever you want. You can integrate your ontology development process and software development process.

Dependency — we’ll see later how to access ontologies using the maven dependency management system. Someone else can host your ontology without having to use their URIs!

Testing and Continuous Integration. Remote evaluation (actually, you will use this all the time even if it seems you are not).

Linters. Rewriters. We have only just got started with these.

That is the end of the features of Tawny-OWL.

Tawny-OWL Tutorial

Note

Now a quick detour i.e. pre-requisites.

All resources for this tutorial can be found on the git repository; link and command provided.

Other Technical Pre-Requisites

Note

For a hands-on approach to this tutorial, these technical pre-requisites need to be installed or downloaded.

See pre-requisites.adoc

Or: the alternative

Note

Only Protege needs to be installed.

The generated ontologies are available to download; see link.

Task 1: Ontological Hello World

  • Build a Hello World Ontology

Note

The first task of this tutorial is to build a simple hello world ontology.

For those who have successfully downloaded the git repository, everything that I discuss in the section can be found in…

The namespace

  • Clojure has a namespace mechanism

  • The namespace is the same as the file name

  • But _ in file name is - in namespace

  • :use makes tawny.owl namespace available

(ns tawny.tutorial.onto-hello
  (:use [tawny.owl]))
  • The full code in these slides can be found in src/tawny/tutorial/onto_hello.clj

Note

Before you can create a new ontology, you must first declare the namespace.

The Clojure namespace declaration is the only "programmatic" part of Tawny that you have to see. Rather like Java, Clojure namespaces are consistent with the file name (although dashes are replaced with underscores for strange reasons). Most development environments will put this in for you.

We also add a statement to say that we wish to "use" tawny.owl. This is a file local import, and it will occur a lot!
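
The naming rule can be written down as a small function. This is a plain Clojure sketch; `ns->path` is a hypothetical helper, not something provided by Clojure or Tawny-OWL.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: derive the source file path (relative to
;; the source root) that corresponds to a namespace name.
(defn ns->path [ns-str]
  (-> ns-str
      (str/replace "-" "_")   ; hyphens become underscores
      (str/replace "." "/")   ; each dot is a directory separator
      (str ".clj")))

(ns->path "tawny.tutorial.onto-hello")
;; => "tawny/tutorial/onto_hello.clj"
```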

Define an Ontology

  • Using the defontology statement

  • hello is a new variable

  • Can be used to refer to the ontology

  • Becomes "default" for this namespace

  • All frames are optional

  • We see more later

(defontology hello
  :iri "http://www.w3id.org/ontolink/tutorial/hello")
Note

Next we define a new ontology. It has a name that we can use to refer to it, which is a useful property as we shall see, although most of the time we do not have to. For the defontology statement everything except the name is optional, although there are quite a few frames, more of which we will use as we move through the tutorial.

The name is also used as a prefix when saving the ontology (although this can be overridden), so I tend not to use hyphens, as they mess with the syntax highlighting in my Manchester syntax viewer.

Introduce a Class

  • Use a defclass statement

  • Statements are delineated with ()

  • Hello is also a variable

  • Hello and hello are different

  • Cannot use the same name twice

  • We use the "default" ontology

(defclass Hello)
Note

Once an ontology has been created we can define a new class. This is accomplished using the defclass statement.

As we can see by comparing against the defontology, statements are parenthesised all the way through.

This form of statement is also known in lisp speak as a "form", an "s-expression" or rather more obscurely "sexp".

One key point from a programming point-of-view, Clojure is a lisp-1. All variables are in the same namespace, and that includes all the ontology entities that we might define. It’s easy to clobber one with another so be careful!

Properties

  • Properties use defoproperty

  • o distinguishes it from annotation and datatype properties

  • Both defclass and defoproperty take a number of frames

  • All are optional

(defoproperty hasObject)
(defclass World)
Note

Properties use defoproperty. There are also annotation and datatype properties, and I have opted for somewhat opaque function names to avoid a large amount of typing. It is hard to know whether this was a great decision or not, but the alternative really seemed rather unreadable to me. Moreover, there are only a few of these names, and OWL 3 is unlikely to come along any time soon.

A complex class

  • Now we use a frame

  • :super says "everything that follows is a super class"

  • owl-some is the existential operator

  • The default (or only) operator in many ontologies

(defclass HelloWorld
  :super Hello
  (owl-some hasObject World))
Note

Finally, we bring in a frame. Frames are introduced using "keywords" — that is something beginning with a ":" — this is a fundamental part of lisp syntax (it is one of two ways of defining self-evaluating forms if you are interested!), which by good fortune is also very similar to the way that Manchester syntax does it.

Hopefully, most people are familiar with the meaning of "some" in this context.
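
How keywords can delimit frames is sketched below in plain Clojure. This is a hypothetical illustration and not Tawny-OWL's actual argument handling; `group-frames` is an invented name, and the sketch assumes every keyword is followed by at least one value.

```clojure
;; Keywords evaluate to themselves, so they can be passed as markers.
(= :super (eval :super))
;; => true

;; Hypothetical sketch: collect the values following each keyword.
(defn group-frames [args]
  (->> args
       (partition-by keyword?)            ; alternate keyword / value runs
       (partition 2)                      ; pair each keyword with its values
       (map (fn [[[k] vs]] [k (vec vs)]))
       (into {})))

(group-frames [:super 'Hello 'Greeting :label "Hello World"])
;; => {:super [Hello Greeting], :label ["Hello World"]}
```

A frame can therefore carry several values, as :super does in the HelloWorld definition above.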

On the use of "owl" (a quick diversion)

  • It is owl-some rather than some

  • But it is only and not owl-only

  • This avoids a name clash with clojure.core

  • Have not gone the OWL API route

(owl-some hasObject World)
(only hasObject World)
Note

As we move through the material, you will notice the "owl" prefix popping up. Although Clojure includes full namespace support, clashes with clojure.core are still somewhat of a pain. Therefore, I have avoided them in tawny.owl by adding a prefix, as the OWL API does. However, I have not done so consistently, whereas the OWL API puts "OWL" at the beginning of everything.
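
The clashes can be checked directly in plain Clojure. The sketch below strips the "owl-" prefix from some of the function names and asks which of the bare names clojure.core already defines; the var name `candidates` is hypothetical.

```clojure
;; Bare names that would result from dropping the "owl-" prefix,
;; plus `only`, which Tawny-OWL uses unprefixed.
(def candidates '[and or not some class import comment only])

;; Keep the names that resolve to something in clojure.core.
(filterv #(ns-resolve 'clojure.core %) candidates)
;; => [and or not some class import comment]
```

`only` is absent from the result, which is why it can be used without a prefix.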

On the use of "owl"

  • Full list of "owl" prefixed functions is

    • owl-and

    • owl-or

    • owl-not

    • owl-some

    • owl-class

    • owl-import

    • owl-comment

  • And the variables

    • owl-nothing

    • owl-thing

On the use of "owl"

  • There are also some "short-cuts"

    • &&, || and !

  • A "long-cut" (owl-only)

  • And consistent but clashing names in tawny.english namespace

Note

There are also options. We have symbol short-cuts, which should be obvious to programmers. We have an owl-only function, which I used to use because I used to forget which was the clojure.core function and which was not. And I have a consistent namespace called tawny.english, but this requires using the namespace mechanism of Clojure in a way that is slightly more complex than I wish to describe.

Saving the Ontology

  • To save an ontology we use save-ontology

  • path/name of file is required

  • format is optional (default is :owl)

(save-ontology "o.omn" :omn)
Note

To check my axiomatisation I often save my ontology in Manchester syntax. I nearly always use the same file name (this makes using git and .gitignore easier), and I have my editor auto-reload. We will be looking at how to build an auto-save function later.

More Frames

  • Can add new frames to existing definition with refine

  • Could also just change defclass form and re-evaluate

  • :label and :comment add annotations using OWL built-in

  • :annotation is general purpose frame

(refine HelloWorld
        :label "Hello World"
        :comment "Hello World is a kind of greeting directed generally at everything."
        :annotation (label "Aalreet world" "gb_ncl"))
Note

Finally, let’s extend our definition somewhat. We could, of course, just change the defclass statement, but I wanted to introduce the refine function which allows the addition of new frames to an existing definition. This is quite useful. You can use different frames for different types of entity, but only the correct frames for each type of entity (although many frames overlap).

Tawny has a number of convenience frames that have no direct equivalent in OMN. :label adds labels in English, and :comment does the equivalent. Obviously you can specify any type of annotation you wish, and tawny can build internationalised ontologies perfectly well.

Task 1: Conclusions

  • Tawny-OWL uses frames

  • It looks like Manchester syntax

  • But in a Lispy way

  • Easy to type, including short-cuts

Task 2: Defining a Tree

  • For the next section, we build an amino acid ontology

  • We do this in several ways

  • First, we start with a simple hierarchy

Amino Acids

  • Chemical Molecules

  • Central Carbon, with an amino and acid group

  • And a "side chain" or "R" group which defines the differences

  • Have a number of chemical properties

  • There are 20

Note

Amino acids are chemical molecules containing a carboxyl and an amino group. They have a number of properties, e.g. size and polarity, and there are 20 of them.

A namespace

  • The name space declaration

(ns tawny.tutorial.amino-acid-tree
  (:use [tawny.owl]))
  • The full code in these slides can be found in src/tawny/tutorial/amino_acid_tree.clj

Tree

  • So, we define our (flat) tree

(defontology aa)

(defclass AminoAcid)

(defclass Alanine
          :super AminoAcid)

(defclass Arginine
          :super AminoAcid)

(defclass Asparagine
          :super AminoAcid)

;; and the rest...
Note

Defining the basic tree is not too hard — we just use the :super keyword. If you think this involves too much typing, yes, it does.

Disjoint

  • Let’s make everything disjoint

  • Asparagine is not the same as Alanine or Arginine

  • With three amino-acids, this is painful and error-prone

  • With twenty it would be almost impossible

(defclass Asparagine
    :super AminoAcid
    :disjoint Alanine Arginine)
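
To see why hand-written disjoints scale so badly, a quick calculation helps. The sketch below is plain Clojure and does not use Tawny at all; the function names are made up for this illustration.

```clojure
;; Plain Clojure, no Tawny required. Function names are illustrative only.

(defn disjoint-mentions
  "Number of class names typed if each of n classes lists every other
  class in its own :disjoint frame."
  [n]
  (* n (dec n)))

(defn disjoint-pairs
  "Number of distinct unordered pairs among n classes."
  [n]
  (quot (* n (dec n)) 2))

(println (disjoint-mentions 3) (disjoint-pairs 3))
(println (disjoint-mentions 20) (disjoint-pairs 20))
```

For three amino acids that is six names to type. For twenty it is 380, any one of which can be misspelt or forgotten.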

Disjoint

  • But there is a more serious problem

  • Change Arginine to this

  • This will now crash (probably)

(defclass Arginine
    :super AminoAcid
    :disjoint Alanine Asparagine)
Note

In theory, this should crash, but it may not if you have previously evaluated the Asparagine form above. This reflects one problem with REPL based languages — sometimes your source can get out of sync with your REPL. If it does not crash now, it will crash when you restart and eval again.

Most REPL programmers restart defensively every hour or two to guard against this.

Explicit definition

  • Tawny uses explicit definition

  • Variables must be defined before use

  • This is deliberate

  • Can be avoided by using Strings

  • But that is error-prone

Note

Explicit definition is a good thing, but it can be a real pain. We think that it is not so in Tawny, because we have features for working around it. It is possible to avoid it entirely anyway, but I personally do not, because it makes spelling errors all too likely. I will show this once later on.
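
Define-before-use is a property of Clojure itself, not something Tawny adds. The sketch below shows it in plain Clojure without Tawny; the helper evaluates? and the undefined symbol name are invented for this illustration.

```clojure
;; Plain Clojure, no Tawny required. The symbol name below is deliberately
;; undefined; `evaluates?` is a helper invented for this illustration.

(defn evaluates?
  "Returns true if the form compiles and evaluates, false if it throws."
  [form]
  (try
    (eval form)
    true
    (catch Exception _
      false)))

;; `str` is defined in clojure.core, so this form is fine.
(println (evaluates? '(str "Alanine")))

;; This symbol has never been defined, so the compiler rejects the form.
(println (evaluates? '(str not-yet-defined-amino-acid)))
```

The second form fails for the same reason that the Arginine definition fails in a fresh REPL: it mentions a symbol before anything has defined it.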

Simplifying the definition

  • We can use the as-subclasses form

  • Adds AminoAcid as super to all arguments

  • This syntax also protects against future additions.

(defclass AminoAcid)

(as-subclasses
  AminoAcid

  (defclass Alanine)
  (defclass Arginine)
  (defclass Asparagine)
  ;; and the rest...
  )
Note

Before we show the solution to the disjoint problem, we simplify our definitions. Typing :super AminoAcid a lot is also a pain, so let’s avoid that. We introduce a new function, as-subclasses.

The advantage of lexically grouping all of the subclasses in this way is also that it makes the intent of the developer obvious. If we need to add a new subclass later (unlikely in this case once finished, but during development yes), then adding it next in the list also makes it a subclass as it should be.

This ability to make the developer intent and conformance to a pattern explicit is a good thing!

And disjoint!

  • This simplifies the disjoint behaviour also

  • We add :disjoint keyword

(defclass AminoAcid)

(as-subclasses
 AminoAcid
 :disjoint

 (defclass Alanine)
 (defclass Arginine)
 (defclass Asparagine)
 ;; and the rest...
 )
Note

Now that we have achieved this, we can also solve the disjoint problem, by adding a keyword in. All the classes will now be made disjoint. If you want to be sure, evaluate the source, save it and look at the OMN.

And, finally, covering

  • We might also want to say

  • These are the only amino-acids there are

  • To do this we use a "covering" axiom

  • Interesting ontology! Biologically true, chemically not.

Note

We can add a covering axiom in the same way. Unlike disjoint this is not a formal part of OWL, but is a design pattern. The interesting thing about this is that biologically it is true — there are only 20 amino acids and we will name them all. But to a chemist, it is demonstrably false as there are many, many amino acids. How we scope the ontology and frame our competency questions can very much affect the model that we build. We will show later that this axiom has a very significant effect on the end model, much more significant than you might presuppose at the moment.

And, finally, covering

  • This is also easy to implement

  • Here with all the amino-acids in full

  • Again, the source code grouping is useful!

(defclass AminoAcid)

(as-subclasses
 AminoAcid
 :disjoint :cover

 (defclass Alanine)
 (defclass Arginine)
 (defclass Asparagine)
 (defclass Aspartate)
 (defclass Cysteine)
 (defclass Glutamate)
 (defclass Glutamine)
 (defclass Glycine)
 (defclass Histidine)
 (defclass Isoleucine)
 (defclass Leucine)
 (defclass Lysine)
 (defclass Methionine)
 (defclass Phenylalanine)
 (defclass Proline)
 (defclass Serine)
 (defclass Threonine)
 (defclass Tryptophan)
 (defclass Tyrosine)
 (defclass Valine))
Note

Putting in a covering axiom, then adding a new sibling and forgetting to modify the covering axiom is an easy mistake to make, and can be very difficult to pick up, either by eye or by testing. Tawny makes this mistake harder to make.

And, finally, covering

  • And, here is a subset of the equivalent OMN

Class: aa:AminoAcid
    EquivalentTo:
        aa:Alanine or aa:Arginine or aa:Asparagine
        or aa:Aspartate or aa:Cysteine or aa:Glutamate
        or aa:Glutamine or aa:Glycine or aa:Histidine
        or aa:Isoleucine or aa:Leucine or aa:Lysine
        or aa:Methionine or aa:Phenylalanine or aa:Proline
        or aa:Serine or aa:Threonine or aa:Tryptophan
        or aa:Tyrosine or aa:Valine

DisjointClasses:
    aa:Alanine, aa:Arginine, aa:Asparagine, aa:Aspartate,
    aa:Cysteine, aa:Glutamate, aa:Glutamine, aa:Glycine,
    aa:Histidine, aa:Isoleucine, aa:Leucine, aa:Lysine,
    aa:Methionine, aa:Phenylalanine, aa:Proline, aa:Serine,
    aa:Threonine, aa:Tryptophan, aa:Tyrosine, aa:Valine
Note

I have not shown all the OMN because it is long and tedious, but these are the key points added by the as-subclasses function.

If you evaluated the :disjoint Alanine definitions early (and they did not crash!), then you will find that there are some Disjoint: frames on individual amino acids also. These make no semantic difference and are an artefact of the tutorial.
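
The value of grouping the subclasses in one place is that the covering axiom and the disjointness axiom are both derived from the same list. The sketch below illustrates that idea in plain Clojure by building text that resembles the OMN above. It does not use Tawny, it is not how Tawny or the OWL API render OMN, and the helper names are invented.

```clojure
;; Plain Clojure, no Tawny required. These helpers only build strings that
;; resemble the OMN above; they are not how Tawny or the OWL API render it.
(require '[clojure.string :as s])

(defn prefixed
  "Prefixes each class name, e.g. \"Alanine\" becomes \"aa:Alanine\"."
  [prefix names]
  (map #(str prefix ":" %) names))

(defn covering-text
  "The parent is equivalent to the union of all listed children."
  [prefix parent children]
  (str "Class: " prefix ":" parent "\n"
       "    EquivalentTo:\n"
       "        " (s/join " or " (prefixed prefix children))))

(defn disjoint-text
  "All listed children are mutually disjoint."
  [prefix children]
  (str "DisjointClasses:\n"
       "    " (s/join ", " (prefixed prefix children))))

(def amino-acids ["Alanine" "Arginine" "Asparagine"])

(println (covering-text "aa" "AminoAcid" amino-acids))
(println (disjoint-text "aa" amino-acids))
```

Adding a name to amino-acids changes both pieces of output at once, which is the same protection that the as-subclasses grouping gives in the real ontology.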

Task 2: Conclusions

  • It is easy to build simple hierarchies

  • We can group parts of the tree

  • There is support for disjoints

  • There is support for covering axioms

Task 3: Defining some properties

  • A list of amino acids only gets us so far

  • Now we define some properties

  • First, we do this the long-hand way

Namespace

  • By now you should be familiar with the namespace definition

  • It is different

  • This :use clause means "use both tawny.owl and tawny.pattern"

(ns tawny.tutorial.amino-acid-props
  (:use [tawny owl pattern]))
  • The full code in these slides can be found in src/tawny/tutorial/amino_acid_props.clj

Note

As usual we start with the namespace declaration, which you should be familiar with by now. However, this one is slightly different. In addition to tawny.owl, we want to "use" the tawny.pattern namespace.

Starting our ontology

  • As before, we define (another) amino acid ontology

(defontology aa)

(defclass AminoAcid)

Amino Acid properties

  • Amino Acids have many properties

  • Most of these are continuous

  • Hard to model ontologically

Note

Now let’s discuss the properties themselves.

Amino acids have many properties e.g. size and polarity.

Most of these properties are continuous, e.g. hydrophobicity values range from -7.5 to 3.1.

The ontological modelling of continuous values is hard.

It is, of course, possible to model continuous values as data type properties and just put the numbers in. This works to a certain extent and is an option for modelling.

The Value Partition

  • We introduce the "value partition"

  • We split a continuous range up into discrete chunks

  • Like the colours of the rainbow

  • First we define the partition itself

  • Amino acids have a size, and only one size!

(defclass Size)
(defoproperty hasSize
  :domain AminoAcid
  :range Size
  :characteristic :functional)
Note

However the most common approach ontologists use instead is the value partition.

The value partition is the process of splitting up a continuous range into discrete chunks. For example, splitting the spectrum of colours found in a rainbow into seven "bins" or "values".

For full details and a discussion of the advantages and disadvantages of this approach, the value partition is described in this recommendation edited by Alan Rector.

First we define the partition for size by:

  1. creating a Size class

  2. defining a hasSize object property, which ensures that an AminoAcid has a size, and only one size
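
To make the idea of splitting a continuous range into bins concrete, here is a minimal sketch in plain Clojure without Tawny. The cut-off values and the function names are invented purely for illustration and carry no biochemical meaning.

```clojure
;; Plain Clojure, no Tawny required. The cut-off values are invented for
;; illustration and have no biochemical meaning.

(def size-bins
  "Upper bounds (exclusive) for each bin, in ascending order."
  [[10.0 :Tiny]
   [20.0 :Small]])

(defn size-bin
  "Maps a continuous measurement to exactly one discrete value.
  Anything at or above the last bound falls into :Large."
  [x]
  (or (some (fn [[upper label]] (when (< x upper) label)) size-bins)
      :Large))

(println (map size-bin [3.2 10.0 15.5 42.0]))
```

Every input lands in exactly one bin. In the ontology, the same two guarantees are stated with axioms: the values do not overlap, and together they cover the whole range.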

The Value Partition

  • Now we define the values

  • Three values, and only three values

  • All of which are different

(as-subclasses
 Size
 :disjoint :cover

 (defclass Tiny)
 (defclass Small)
 (defclass Large))
Note

Next we define the values of Size. There are only three values, i.e. Tiny, Small and Large. All of the values are different.

So we use the as-subclasses function with the :disjoint and :cover keywords discussed in Task 2.

In a "real" ontology it would be good to add annotation properties and comments describing exactly what these partitions mean.

Using the values

  • We can now create our amino acids using these three sizes

  • We only create three amino acids here

  • More would be needed.


(as-subclasses
 AminoAcid
 :disjoint :cover

 (defclass Alanine
   :super (owl-some hasSize Tiny))

 (defclass Arginine
   :super (owl-some hasSize Large))

 (defclass Asparagine
   :super (owl-some hasSize Small)))

 ;;and the rest
Note

Once defined, we can use these values in our class definitions.

e.g. Alanine hasSize some Tiny

Interim Summary

  • Defined the Size value partition

  • Used these values in our class definitions

  • Issues with these restriction declarations?

    • Long-winded

    • Duplication

    • Error-prone

  • What can we do to overcome these?

Note

What have we done?

  • defined the Size value partition

  • used these values in our class definitions

This way of declaring restrictions is OK but:

  • it’s long-winded and involves a lot of typing

  • contains a lot of duplication: 20 amino acids means 20 existential restrictions of similar structure

  • what if we add another property?

  • for each new property that’s another 20 existential restrictions of similar structure

  • with a bigger ontology with many classes and properties it is possible to accidentally create an existential with the correct property but incorrect value or vice versa

What can we do to overcome these issues?

Using Facets

  • Relatively new feature of Tawny-OWL

  • Call these "facets" after "faceted classification"

  • Useful when many values are associated with a property

  • For example value partition

  • First define Charge using the same pattern as Size

(defclass Charge)
(defoproperty hasCharge
  :domain AminoAcid
  :range Charge
  :characteristic :functional)

(as-subclasses
 Charge
 :disjoint :cover

 (defclass Positive)
 (defclass Neutral)
 (defclass Negative))
Note

We can use facets, a relatively new feature of Tawny.

Facets are named after faceted classification, a classification scheme for organising knowledge in a systematic order. You will probably have seen it on Amazon, eBay, etc. for filtering search results, e.g. by location or "buy it now".

See https://en.wikipedia.org/wiki/Faceted_classification for a description of faceted classification.

Facets are extremely useful when many values are associated with a property, as is the case in a value partition.

But first, let’s define the Charge value partition using the same pattern as Size.

Facet

  • Now define the values as facet of hasCharge

  • facets are extra-logical

  • They do not change the semantics of ontology statements

  • They are visible in the ontology

(as-facet
 hasCharge

 Positive Neutral Negative)
Note

Now define the facet; we ensure that the values Positive, Neutral and Negative are associated with the hasCharge property.

These facets are extra-logical; they do not change the semantics of ontology statements but are visible in the ontology as annotation axioms.

I had a number of options for the implementation of facets. It would be possible to do this without having them visible in the ontology, but this seemed to be the best way forward.

Using the facet

  • Can now give just the class

  • Again, using refine although could just alter the code

  • (facet Neutral) rather than (owl-some hasCharge Neutral)

  • Saves us some typing


(refine Alanine
        :super (facet Neutral))

(refine Arginine
        :super (facet Positive))

(refine Asparagine
        :super (facet Neutral))
Note

Why is this useful?

Now we can just provide the value and Tawny will convert the facet into an existential restriction.

Saves on some typing

And others

  • But we added the as-facet declaration

  • Now let’s add Hydrophobicity

(defclass Hydrophobicity)
(defoproperty hasHydrophobicity
  :domain AminoAcid
  :range Hydrophobicity
  :characteristic :functional)

(as-subclasses
 Hydrophobicity
 :disjoint :cover

 (defclass Hydrophobic)
 (defclass Hydrophilic))

(as-facet
 hasHydrophobicity

 Hydrophilic Hydrophobic)
Note

Not entirely true: we have to add the as-facet call, so overall the reduction in typing is not enormous. However, the as-facet call is a fixed cost, while the saving is linear in the number of amino acids, so the real saving would be a bit greater.

Using Facets

  • And Polarity.

(defclass Polarity)
(defoproperty hasPolarity)

(as-subclasses
 Polarity
 :disjoint :cover

 (defclass Polar)
 (defclass NonPolar))

(as-facet
 hasPolarity

 Polar NonPolar)

Using Facets

  • facet broadcasts

  • We can apply two at once

  • A different property can be used for each

  • We could do all four value partitions at once

(refine
 Alanine
 :super (facet Hydrophobic NonPolar))

(refine
 Arginine
 :super (facet Hydrophilic Polar))

(refine
 Asparagine
 :super (facet Hydrophilic Polar))
Note

Now we can see the full saving advantage. Instead of four separate statements, we have a single one.

The advantage goes beyond typing, of course; it also helps us build our ontology consistently. We cannot associate the wrong class with the wrong property. In this case, we have set the logical axioms up such that if we did use the wrong property, the reasoner would be likely to pick the problem up anyway. But this check is immediate and at the point of use.
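
The idea behind facets can be sketched as a simple lookup table from value to property. The code below is a toy model in plain Clojure, not Tawny's implementation; the function names and the use of keywords in place of OWL entities are assumptions made for this illustration.

```clojure
;; Plain Clojure, no Tawny required. This is a toy model of the idea, not
;; Tawny's implementation; all names here are invented for illustration.

(defn register-facet
  "Returns a new table in which each value is associated with property."
  [table property values]
  (reduce (fn [t v] (assoc t v property)) table values))

(defn expand-facet
  "Expands each value into a [:some property value] triple, failing
  immediately if a value was never registered as a facet."
  [table & values]
  (mapv (fn [v]
          (if-let [property (get table v)]
            [:some property v]
            (throw (ex-info "Not a facet value" {:value v}))))
        values))

(def table
  (-> {}
      (register-facet :hasHydrophobicity [:Hydrophobic :Hydrophilic])
      (register-facet :hasPolarity [:Polar :NonPolar])))

(println (expand-facet table :Hydrophobic :NonPolar))
```

Because the property is looked up from the value, there is no opportunity to pair a value with the wrong property, and an unregistered value fails at the point of use rather than later in the reasoner.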

Using Facets

  • And the output

Class: aa:Alanine

    SubClassOf:
        aa:hasCharge some aa:Neutral,
        aa:AminoAcid,
        aa:hasSize some aa:Tiny,
        aa:hasHydrophobicity some aa:Hydrophobic,
        aa:hasPolarity some aa:NonPolar
Note

Here is the complete class definition of Alanine in OMN format

Task 3: Conclusions

  • Can use value partitions to split up numerical ranges

  • Can define facets to ease the use of object properties

  • Can apply several facets at once

Task 4: Patternising

  • Make full use of an existing pattern from Tawny-OWL

The problem

  • Used value partition to define our properties

  • Issues with defining the value partition:

    • We still have a lot of typing

    • The value partition has lots of parts

    • Easy to get wrong (e.g. Polarity)

  • How can we simplify and ensure consistency?

  • Tawny-OWL supports this pattern directly

Note

In the previous task we used the value partition to define our amino acid properties.

This is useful, but we still have a lot of typing to do: it takes at least 10 lines of code to define one value partition, which is made up of many parts.

This means it is easy to get wrong. For example, did anyone notice that the Polarity value partition was incomplete? We are missing the domain and range restrictions (and the functional characteristic) for hasPolarity in the last set of examples. Originally this was a genuine mistake rather than a deliberate one, but once it was noticed we decided to leave it in deliberately.

How can we simplify and ensure consistency of the value partition? We can make use of patterns. It just so happens that Tawny supports this particular pattern directly.

Namespace

  • The value partition pattern is found in tawny.pattern

  • We use it here

(ns tawny.tutorial.amino-acid-pattern
  (:use [tawny owl pattern]))
  • The full code in these slides can be found in src/tawny/tutorial/amino_acid_pattern.clj

And the preamble

  • This is the same as before

(defontology aa)

(defclass AminoAcid)

The Size value partition

  • This is a new form

  • Syntactically similar to what we have seen before

  • defpartition defines that we will have a partition

  • [Tiny Small Large] are the values

  • hasSize is implicit — it will be created

(defpartition Size
  [Tiny Small Large]
  :domain AminoAcid)
Note

This is syntactically similar because it’s just a new function defined in the language. We are adding nothing clever here, just using a language as it is intended to be used.
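
As a reminder of what one short form has to stand in for, the sketch below lists the parts that the long-hand Size partition declared by hand in the previous task. It is plain Clojure producing a descriptive map. It is not Tawny's implementation of defpartition, and the function name partition-parts is invented for this illustration.

```clojure
;; Plain Clojure, no Tawny required. This only lists the parts that the
;; long-hand version declared by hand; it is not Tawny's implementation.

(defn partition-parts
  "Describes the entities and axioms behind one value partition."
  [partition-name values domain]
  {:class          partition-name
   :property       (symbol (str "has" (name partition-name)))
   :domain         domain
   :range          partition-name
   :characteristic :functional
   :values         values
   :disjoint       true
   :cover          true})

(println (partition-parts 'Size '[Tiny Small Large] 'AminoAcid))
```

The property name is derived from the partition name, which is why hasSize never needs to be typed in the new form.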

The size value partition

  • And (some of) the OMN.

ObjectProperty: aa:hasSize
    Domain:
        aa:AminoAcid

    Range:
        aa:Size

    Characteristics:
        Functional

Class: aa:Size
    EquivalentTo:
        aa:Large or aa:Small or aa:Tiny

More value partitions

  • Adding partitions for all the properties is easy

(defpartition Charge
  [Positive Neutral Negative]
  :domain AminoAcid)

(defpartition Hydrophobicity
  [Hydrophobic Hydrophilic]
  :domain AminoAcid)

(defpartition Polarity
  [Polar NonPolar]
  :domain AminoAcid)

(defpartition SideChainStructure
  [Aromatic Aliphatic]
  :domain AminoAcid)

Using these partitions

  • defpartition also applies the as-facet function

  • So, we can use facet also

  • Syntactically, the ontology has simplified

  • Same semantics underneath

(as-subclasses
 AminoAcid

 (defclass Alanine
   :super (facet Neutral Hydrophobic NonPolar Aliphatic Tiny))

 (defclass Arginine
   :super (facet Positive Hydrophilic Polar Aliphatic Large))

 (defclass Asparagine
   :super (facet Neutral Hydrophilic Polar Aliphatic Small))

 ;; and the rest
 )

Task 4: Conclusions

  • Tawny-OWL directly supports the value partition

  • This integrates with facets

  • Together, can simplify this (very common) form of ontology

Task 5: Understanding Names

  • Understand how Tawny-OWL uses IRIs and symbols to identify entities

  • See how to use them independently

  • Understand how they allow OBO ID support

Note

Let’s leave our amino acid ontology for now to discuss how Tawny-OWL uses IRIs and Clojure symbols to refer to ontology entities.

These IRIs and symbols can be independent which allows Tawny to support OBO IDs

Namespace

  • tawny.obo to support (OBO) numeric IDs

  • clojure.string is for string manipulation

(ns tawny.tutorial.whats-in-a-name
  (:use [tawny.owl])
  (:require [tawny.obo])
  (:require [clojure.string :as s]))
Note

Usual Clojure namespace declaration for this task

Background

  • Ontology entities need names

  • Tawny-OWL has different requirements for names

  • We need to support alternative workflows

  • So, Tawny-OWL is flexible

  • With "sensible" defaults

Note

First, a bit of background:

Ontology entities need names so that we can refer to them.

As a community we typically use IRIs, but Tawny-OWL has different requirements for names (symbols); something that is valid as an IRI is not necessarily valid as a (Clojure) symbol.

In addition, we need to support alternative workflows, e.g. OBO numeric IDs.

So Tawny is flexible: IRIs and Tawny names can be independent, and it has been built with "sensible" defaults.

IRIs

  • Tawny-OWL is built on the OWL API

  • Underneath, therefore, it is part of the web

  • OWL uses IRIs (i.e. URIs or URLs) to identify entities

  • IRIs provide a single, shared global namespace

  • With a (social) mechanism for uniqueness

Note

Tawny is built on the OWL API, which means that underneath it is a part of the web.

We have to inherit directly from the web, because all the software that we are building on depends on it and/or requires it.

OWL uses IRIs and they have this unusual characteristic of being global.

Symbols

  • Tawny-OWL uses symbols to identify entities

  • Core feature of Clojure

  • Easy to type (e.g. A rather than "A")

  • Allows you to handle them directly

  • Provides define-before-use semantics

  • Supported in IDEs (e.g. syntax highlight and auto-complete)

Note

In contrast, Tawny uses symbols to identify entities.

Symbols are a core feature of Clojure and any Lisp. They are roughly equivalent to variable names in other programming languages, but not exactly the same.

Using symbols provides a set of advantages:

  • They are easy to type, e.g. A rather than "A"

  • Lisp gives you the flexibility to handle them directly

  • They encourage define-before-use semantics

  • They are normally syntax highlighted specially by editors and, rather usefully, they will normally auto-complete ("code-complete" or "intellisense") in an IDE, which is very useful for big ontologies

Symbols and IRIs

  • What is the relationship between symbols and IRIs?

  • In Tawny-OWL, this is a per-ontology setting

  • By default the symbol forms the fragment of the IRI

(defontology o)

;; => #<OWLClassImpl <8d9d3120-d374-4ffb-99d8-ffd93a7d5fdd#o#A>>
(defclass A
  :ontology o)
  • Generates a random UUID

    • Not good practice but supported by OWL API

  • We use the :ontology frame

Note

If we do nothing else, Tawny identifies an ontology using a random Universally Unique IDentifier (UUID). This is not entirely best practice and, indeed, is illegal for some serialisation formats. However, the OWL API supports it, and it’s useful where we are playing or giving a demo.

If we define a class, we use the ontology IRI to form the entity IRI, just adding the symbol name as the fragment. Note in this case I have used the :ontology frame as we have multiple ontologies in one file. This is the safe option, as otherwise the results depend on the order of evaluation.

To see the IRI, type "A" into the REPL after evaluating the two forms above. The UUID is random, so it will be different each time!

The "[OWLClassImpl]" stuff comes from the OWL API, and is just the "toString" method. It’s not great and at some point I will fix this.

Symbols and IRIs

  • We should identify our ontologies correctly

  • We use the :iri frame for our second ontology

  • Again, the class uses the symbol name as the fragment

(defontology i
   :iri "http://www.w3id.org/ontolink/example/i")

;; => #<OWLClassImpl <http://www.w3id.org/ontolink/example/i#B>>
(defclass B
  :ontology i)

Symbols and IRIs

  • The relationship is programmatically defined

  • We can change it to whatever we want

  • Using the :iri-gen frame to supply a function

  • Here we reverse the symbol name

  • We call the symbol name: "the tawny name"

(defontology r
  :iri "http://www.w3id.org/ontolink/example/r"
  :iri-gen (fn [ont name]
             (iri (str (as-iri ont)
                       "#"
                       (s/reverse name)))))

;; => #<OWLClassImpl <http://www.w3id.org/ontolink/example/r#EDC>>
(defclass CDE
  :ontology r)
Note

This is a pretty pointless transformation!
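
The shape of such a function is easy to try out on plain strings. The sketch below is plain Clojure and does not use Tawny, so it takes the ontology IRI as a string rather than an ontology object; both function names are invented for this illustration.

```clojure
;; Plain Clojure, no Tawny required. These functions work on plain strings
;; and only mimic the shape of an :iri-gen function.
(require '[clojure.string :as s])

(defn default-style-iri
  "Uses the name unchanged as the fragment."
  [ontology-iri name]
  (str ontology-iri "#" name))

(defn reversed-iri
  "Uses the reversed name as the fragment."
  [ontology-iri name]
  (str ontology-iri "#" (s/reverse name)))

(println (default-style-iri "http://www.w3id.org/ontolink/example/i" "B"))
(println (reversed-iri "http://www.w3id.org/ontolink/example/r" "CDE"))
```

Any function from a name to an IRI will do, which is what makes the OBO-style identifiers in the next slides possible.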

OBO Identifiers

  • OBO identifiers present a challenge

  • Source code is the ultimate in WYSIWYG

  • Use the underlying identifiers and display something different to the user

(defclass GO:00004324
  :super (owl-some RO:0000013 GO:00003143)
  :annotation (annotation IAO:0504303 "Transporters are..."))
Note

Where would this programmatic transformation be useful?

With Protege or another GUI, we can use the underlying identifiers and display something different to the user. With source code, we cannot. Clearly something like the example above is just not acceptable (although it is actually legal Tawny code, or Lisp).

OBO Identifiers

  • Tawny-OWL provides an OBO style ID iri-gen function.

  • We set that here

(defontology obo
  :iri "http://www.w3id.org/ontolink/example/obo"
  :iri-gen tawny.obo/obo-iri-generate)

(tawny.obo/obo-restore-iri obo "./src/tawny/tutorial/whats_in_a_name.edn")

OBO Identifiers

  • Now we can evaluate these forms

  • Each gets a numeric identifier, OBO style

  • The identifier is stable

;; => #<OWLClassImpl <http://purl.obolibrary.org/obo/EXAM_000003>>
(defclass F
  :ontology obo)

;; => #<OWLClassImpl <http://purl.obolibrary.org/obo/EXAM_000002>>
(defclass G
  :ontology obo)

;; => #<OWLObjectPropertyImpl <http://purl.obolibrary.org/obo/EXAM_000001>>
(defoproperty ro
  :ontology obo)
Note

In this case, we have totally dissociated the IRI from the symbol. The IRI is not auto-generated here — it is stable, and will come out the same every time and on every machine. You should get the same IRIs exactly.
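
The reason the identifiers are stable is that the mapping from name to number is stored rather than recomputed. The sketch below is a toy model of that idea in plain Clojure. It is not how tawny.obo works internally, and the function names and the in-memory map standing in for the stored file are assumptions made for this illustration.

```clojure
;; Plain Clojure, no Tawny required. A toy model of stable numeric IDs; it
;; is not how tawny.obo works internally, and all names are invented.

(defn assign-id
  "Keeps an existing ID for a known name; gives a new name the next number."
  [ids name]
  (if (contains? ids name)
    ids
    (assoc ids name (inc (apply max 0 (vals ids))))))

(defn obo-iri
  "Formats a number as an OBO-style IRI with a six-digit local part."
  [prefix n]
  (format "http://purl.obolibrary.org/obo/%s_%06d" prefix n))

;; In practice this map would be stored in a file alongside the source.
(def stored-ids {"ro" 1, "G" 2, "F" 3})

(def ids (-> stored-ids (assign-id "F") (assign-id "H")))

(println (obo-iri "EXAM" (get ids "F")))
(println (obo-iri "EXAM" (get ids "H")))
```

A name that is already in the stored map keeps its number however often the source is re-evaluated; only a genuinely new name receives a fresh one.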

Task 5: Conclusions

  • Can generate random UUID IRIs for ease

  • Mostly, we use the symbol (variable) name as fragment

  • Relationship is programmatic

  • OBO IDs are a pain in source

  • Tawny-OWL supports them

Note

We can generate random UUID IRIs for ease

Mostly, we use the symbol name as fragment

The relationship between IRIs and symbols is programmatic and can be independent

OBO IDs are (not meaningful and) a pain in source

But Tawny supports them by mapping the numeric IDs to more meaningful names to aid the user.

Task 6: Importing other Ontologies

  • Understand how to import and use another Tawny-OWL ontology

Importing

  • Many ontologies import from other ontologies

  • Allows cross-linking, and reuse of work

  • Tawny-OWL supports this

  • Involves two steps, use and import

Note

Many ontologies import from other ontologies, e.g. GO and the Relations Ontology (RO).

Why import? It’s good practice, and it allows cross-linking and reuse of work.

Tawny supports this.

The ontology to import

  • The ontology we wish to import

(ns tawny.tutorial.abc
  (:use [tawny.owl]))

(defontology abc
  :iri "http://www.w3id.org/ontolink/example/abc.owl")

(defclass A)
(defclass B)
(defclass C)
Note

Aptly named abc.owl

Namespace

  • We have seen use many times before

  • A namespace with an ontology can be used like any other

  • Here we use require

  • Helps to avoid name collisions

(ns tawny.tutorial.use-abc
  (:use [tawny.owl])
  (:require [tawny.tutorial.abc]))

Using

  • Normally, using is not enough

  • We also need to explicitly import the ontology

  • Only after owl-import will its axioms become available

  • Warning: Clojure has an import function and it does not do the same thing

(defontology myABC)

(owl-import tawny.tutorial.abc/abc)
  • Here, we define our new ontology and import the ontology axioms from the other abc ontology

Note

I did strongly consider combining the use or require step into the import; this would have been (and still is for anyone who wants to code it!) entirely possible. However, there are perfectly valid reasons not to.

Without the import statement, we gain access to the identifiers from the required ontology. And re-using the identifiers without using all of the axioms from that ontology allows us to do some useful things. For instance, MIREOT does exactly this. Or for a different take, consider my idea for Ontology connection points.

Using

  • The require statement also allows us to use symbols

  • Here, we use an explicit name space tawny.tutorial.abc

  • And a symbol A

  • The symbolic approach protects against spelling mistakes

(defclass MyA
  :super tawny.tutorial.abc/A)

(defclass MyB
  :super tawny.tutorial.abc/B)
  • The resulting OMN

Class: myABC:MyA
    SubClassOf:
        abc:A

Class: myABC:MyB
    SubClassOf:
        abc:B
Note

If you get bored of typing, then you can also alias tawny.tutorial.abc to a shorter form, as we saw earlier with clojure.string.

Here, we are reusing basic Clojure functionality and its namespacing mechanism. That the symbols refer to ontology terms really makes no difference.

Task 6: Conclusions

  • require or use is a part of Clojure

  • Gives us access to symbols from another namespace

  • Ontologies still need to use owl-import

Task 7: Reading an Ontology

  • Understand what reading an ontology achieves

  • Use a simple example

The problem

  • We showed that we can import existing Tawny-OWL ontologies

  • By using and importing the relevant namespace

  • In order for this to work, we need the Tawny-OWL source code

  • What if we do not have it?

  • Or worse, what if it does not exist?

  • Tawny-OWL supports this

Note

In the previous task, we showed that we can import existing Tawny-OWL ontologies by using and importing the namespace.

But in order for this to work, we need the Tawny source code; this is not always possible

What happens if we do not have the original source code? Or worse, what if it does not exist in the first place i.e. it was built through a different means such as Protege?

Namespace

  • As usual, we declare the namespace

  • It is different

  • Require the tawny.owl and tawny.read

  • Have access to the symbols but do not import them into the local namespace

  • Ensures that the namespace has nothing else in it

  • Avoids namespace collisions

(ns tawny.tutorial.read-abc
  (:require [tawny owl read]))
Note

As usual, let’s start with declaring our namespace

Note that this namespace declaration is different; both tawny.owl and tawny.read are required i.e. we have access to the symbols but do not import them locally.

This ensures that nothing else is in it, if for no other reason than to avoid namespace collisions.

Reading

  • Tawny-OWL provides a solution called reading

  • Reading makes all entities available as symbols

(tawny.read/defread abc
  :iri "http://www.w3id.org/ontolink/example/abcother.owl"
  :location (tawny.owl/iri (clojure.java.io/resource "abcother.owl")))
  • In this case, a file abcother.owl has been saved locally

  • Can read from any URL

  • Highly configurable (e.g. filter and transform names)

Note

Tawny provides a solution to this problem called reading.

Reading makes all the entities available as symbols.

Also, note that we are using a local copy of the OWL file, which gives us a degree of flexibility: you do not want to download GO every time you restart the REPL.

Although not covered here, defread is highly configurable. You can filter just the terms you want, and change the names as you choose.

Reading

  • Here, we define our new ontology and import the ontology axioms from the other abc ontology

(tawny.owl/defontology myABC)

(tawny.owl/owl-import abc)

Reading

  • And access its values by symbol

  • Symbols must be defined

(tawny.owl/defclass MyA
  :super A)

(tawny.owl/defclass MyB
  :super B)
Note

Having read our ontology this now gives us the ability to refer directly, with symbols. So, we can type A or B. This is safe, and has all the advantages of symbol based definition.

Task 7: Conclusions

  • Tawny-OWL supports a read mechanism

  • Ontologies only available as OWL files can be used transparently

Task 8: Create New Syntax

  • Create new syntax describing the amino acids

  • Create all the defined classes

  • Use the reasoner

The Finale

  • This is rather more advanced

  • Generate a new syntax

  • But pulls together most of the tasks

  • Demonstrates the value of a programmatic environment

  • Possible to build ontologies without this

  • The biology is interesting also

Note

This is going to be highly advanced, showing a highly programmatic use of Tawny. In the course of this we will generate some brand new syntax. This demonstrates one of the uses of Tawny: while it is harder to generate this new syntax than to just use the existing syntax, once you have done so it is easier to use.

There is also some interesting biology and an interesting ontological question that comes out of the end of it.

The namespace

  • Lots of namespaces involved here

  • Tawny-OWL does have more namespaces

  • But not many

(ns tawny.tutorial.amino-acid-build
  (:use [tawny owl pattern reasoner util]))
Note

This is probably an extreme example of :use. If I were doing this normally, I would almost certainly require pattern, reasoner and util through an alias.

The Upper Ontology

  • Nothing new here

(defontology aabuild)

(defclass AminoAcid)

(defclass PhysicoChemicalProperty)

(defpartition Size
  [Tiny Small Large]
  :domain AminoAcid
  :super PhysicoChemicalProperty)

(defpartition Charge
  [Positive Neutral Negative]
  :domain AminoAcid
  :super PhysicoChemicalProperty)

(defpartition Hydrophobicity
  [Hydrophobic Hydrophilic]
  :domain AminoAcid
  :super PhysicoChemicalProperty)

(defpartition Polarity
  [Polar NonPolar]
  :domain AminoAcid
  :super PhysicoChemicalProperty)

(defpartition SideChainStructure
  [Aromatic Aliphatic]
  :domain AminoAcid
  :super PhysicoChemicalProperty)
Note

Nothing new here except for the parent class for all amino acid properties.

The rest we have defined before: an ontology, classes, and value partitions (which include the as-facet extra-logical restrictions).

Defining our amino-acid

  • Done this before

  • It involves too much typing

  • Want new syntax

  • Ensure consistency across all class definitions

Note

Building amino acids by defining a class for each is no good at all, as it takes too long. Also, it is still too risky. So, we are going to extend the syntax, so that it works better and ensures consistency across all class definitions.

Defining our amino-acid

  • The function is relatively easy

  • defdontfn gives default ontology handling

  • Name of the function amino-acid

  • & properties is variadic or "zero or more args".

  • owl-class function does not define a new symbol

(defdontfn amino-acid [o entity & properties]
  (owl-class o entity
     :super
     (facet properties)))
Note

The function for making a new amino acid is relatively simple, as these things go. It just passes off most of its work to owl-class. We could also add "AminoAcid" as a superclass here, but I chose to do this later for reasons that should become apparent.

This function does not intern — we can define a new amino-acid like this, but the entity would be a string and we cannot refer to it afterwards as anything other than a string.

We cannot just replace owl-class with defclass to achieve this. The explanation requires knowledge of Lisp, but it is this: because amino-acid is a function, its arguments are evaluated, so we cannot pass a bare symbol (Clojure will throw an error because the symbol cannot be resolved). Moreover, defclass is a macro, so it would be expanded when the definition of amino-acid is evaluated, NOT when amino-acid is called. We have to make a macro to do this; for more information see the Q&As.
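
The difference between a function and a macro can be sketched in plain Clojure, without Tawny. The names describe-fn and describe-macro are hypothetical and exist only for this illustration; they are not part of Tawny-OWL.

```clojure
;; Hypothetical names, for illustration only.

;; A function: its arguments are evaluated before the call,
;; so the caller has to pass a value such as a string.
(defn describe-fn [entity]
  (str "entity: " entity))

;; A macro: it receives the unevaluated form, so a bare,
;; undefined symbol is acceptable. It turns the symbol into
;; a string at expansion time and then calls the function.
(defmacro describe-macro [entity]
  `(describe-fn ~(name entity)))

(describe-fn "Alanine")     ;; => "entity: Alanine"
(describe-macro Alanine)    ;; => "entity: Alanine"

;; (describe-fn Alanine) would not compile, because the
;; symbol Alanine cannot be resolved.
```

This is the same division of labour that the tutorial uses later: a function does the work on strings, and a thin macro handles the bare symbols.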

Defining lots of amino-acids

  • Let’s define all twenty amino-acids at once

  • Pass definitions as a list (of lists)

  • 1. Call using map

  • 2. The anonymous function destructures

  • 3. ->Named packages name and entity together

(defdontfn amino-acids [o & definitions]
  (map
   (fn [[entity & properties]]
     ;; need the "Named" constructor here
     (->Named entity
              (amino-acid o entity properties)))
   definitions))
Note

Why stop there? Better than creating one amino-acid, let’s create all twenty of them at once.

This function does three things. First, map calls the amino-acid function over several definitions. Second, the anonymous function destructures (Lisp specific): it splits each definition into entity (the first value) and properties (the rest, i.e. all remaining values). Third, it bundles the return value with the name of the entity (which is the first element of the definition).
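
The destructuring step can be tried on its own in plain Clojure. This is a minimal sketch; the function name split-definition and the sample data are hypothetical, and strings stand in for the OWL entities.

```clojure
;; Hypothetical helper showing the [[entity & properties]] pattern.
;; The first element of each definition binds to entity,
;; everything after it binds to properties.
(defn split-definition [[entity & properties]]
  {:entity entity
   :properties properties})

(def definitions
  [["Alanine" "Neutral" "Hydrophobic"]
   ["Arginine" "Positive" "Hydrophilic"]])

(map split-definition definitions)
;; => ({:entity "Alanine", :properties ("Neutral" "Hydrophobic")}
;;     {:entity "Arginine", :properties ("Positive" "Hydrophilic")})
```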

And make variables

  • We want to use symbols and define a new variable

  • Tawny-OWL has some support for this

  • Not going to explain in detail

(defmacro defaminoacids
  [& definitions]
  `(tawny.pattern/intern-owl-entities
    (apply amino-acids
     (tawny.util/name-tree ~definitions))))
Note

We want to transform a bunch of symbols into strings because we are going to use symbols we have not defined yet. We are missing the fact that the symbol name and the string are equivalent here, so we could do this better, but the name-tree function is too handy here. intern-owl-entities achieves the same thing as defentity.
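
To give a feel for the symbol-to-string step, here is a plain Clojure sketch of a tree walk that replaces every symbol with its name. It is not the Tawny implementation of name-tree; the function symbols->names is a hypothetical stand-in written with clojure.walk.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical stand-in: walk a nested structure and replace
;; each symbol with its name as a string, leaving the shape intact.
(defn symbols->names [tree]
  (walk/postwalk #(if (symbol? %) (name %) %) tree))

;; The quote stops Clojure from trying to resolve the symbols.
(symbols->names '[[Alanine Neutral Tiny]
                  [Arginine Positive Large]])
;; => [["Alanine" "Neutral" "Tiny"] ["Arginine" "Positive" "Large"]]
```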

Define the amino acids

  • Now we can define all twenty amino acids in one go

  • The syntactic regularity means we are unlikely to miss something

  • This makes the effort worth while

  • We also define subclasses, disjoints and covering

  • Pay attention to the :cover

(as-subclasses
 AminoAcid
 :disjoint :cover

 (defaminoacids
   [Alanine        Neutral  Hydrophobic NonPolar Aliphatic Tiny]
   [Arginine       Positive Hydrophilic Polar    Aliphatic Large]
   [Asparagine     Neutral  Hydrophilic Polar    Aliphatic Small]
   [Aspartate      Negative Hydrophilic Polar    Aliphatic Small]
   [Cysteine       Neutral  Hydrophobic Polar    Aliphatic Small]
   [Glutamate      Negative Hydrophilic Polar    Aliphatic Small]
   [Glutamine      Neutral  Hydrophilic Polar    Aliphatic Large]
   [Glycine        Neutral  Hydrophobic NonPolar Aliphatic Tiny]
   [Histidine      Positive Hydrophilic Polar    Aromatic  Large]
   [Isoleucine     Neutral  Hydrophobic NonPolar Aliphatic Large]
   [Leucine        Neutral  Hydrophobic NonPolar Aliphatic Large]
   [Lysine         Positive Hydrophilic Polar    Aliphatic Large]
   [Methionine     Neutral  Hydrophobic NonPolar Aliphatic Large]
   [Phenylalanine  Neutral  Hydrophobic NonPolar Aromatic  Large]
   [Proline        Neutral  Hydrophobic NonPolar Aliphatic Small]
   [Serine         Neutral  Hydrophilic Polar    Aliphatic Tiny]
   [Threonine      Neutral  Hydrophilic Polar    Aliphatic Tiny]
   [Tryptophan     Neutral  Hydrophobic NonPolar Aromatic  Large]
   [Tyrosine       Neutral  Hydrophobic Polar    Aromatic  Large]
   [Valine         Neutral  Hydrophobic NonPolar Aliphatic Small]))

Defined Classes

  • Next we define defined classes

  • Defined Classes can be reasoned over

  • Anything with a Small facet is a SmallAminoAcid

(defclass SmallAminoAcid
  :equivalent (facet Small))
Note

Defined subclasses are the heart of reasoning in OWL. Effectively, they form queries and they tell us useful things.

And some more

  • There are lots of these

  • We can combine them in many ways

(defclass
  SmallPolarAminoAcid
  :equivalent (owl-and (facet Small Polar)))

(defclass LargeNonPolarAminoAcid
  :equivalent (owl-and (facet Large NonPolar)))
Note

So, having done one, surely we should do more. So here are the next two. Here we are combining two.

Where do we stop?

  • 3? 10?

  • Why not do them all?

  • "Doing them all" actually means the Cartesian product

  • We are using a programmatic tool

  • How would we do this?

Note
  • "Doing them all" actually means the Cartesian product

  • Surprisingly there is not a function for this

  • This is pure Clojure, not going to describe it

(defn cart [colls]
  (if (empty? colls)
    '(())
    (for [x (first colls)
          more (cart (rest colls))]
      (cons x more))))
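
A small usage example may help. The sketch repeats the cart definition so that it runs on its own; the keyword inputs are arbitrary sample values.

```clojure
(defn cart [colls]
  (if (empty? colls)
    '(())
    (for [x (first colls)
          more (cart (rest colls))]
      (cons x more))))

;; Every element of the first collection is paired with
;; every element of the second.
(cart [[:tiny :small] [:polar :non-polar]])
;; => ((:tiny :polar) (:tiny :non-polar) (:small :polar) (:small :non-polar))

;; The size of the result is the product of the input sizes.
(count (cart [[1 2 3] [1 2 3] [1 2]]))
;; => 18
```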

A defined class function

  • Similar to before

  • This does not create symbols

(defn amino-acid-def [partition-values]
  (owl-class
   (str
    (clojure.string/join
     (map
      #(.getFragment
        (.getIRI %))
      partition-values))
    "AminoAcid")
   :equivalent (owl-and (facet partition-values))))
Note

We can build a function to replicate this. We use some string manipulation for this to generate the name. I have not done the interning here — I leave this as an exercise!
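
The name generation is ordinary string manipulation. The following plain Clojure sketch uses strings in place of the IRI fragments that amino-acid-def extracts from the OWL objects; the function name defined-class-name is hypothetical.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: join the fragment names and append the
;; common suffix, mirroring the str/join step in amino-acid-def.
(defn defined-class-name [fragments]
  (str (string/join fragments) "AminoAcid"))

(defined-class-name ["Small" "Polar"])
;; => "SmallPolarAminoAcid"
```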

Doing them all

  • Call the amino-acid-def function on the Cartesian product

  • This creates 431 defined classes

(doall
 (map
  amino-acid-def
  ;; kill the empty list
  (rest
   (map
    #(filter identity %)
    ;; combination of all of them
    (cart
     ;; list of values for each partitions plus nil
     ;; (so we get shorter versions also!)
     (map
      #(cons nil (seq (direct-subclasses %)))
      ;; all our partitions
      (seq (direct-subclasses PhysicoChemicalProperty))))))))
Note

We now actually run the Cartesian product. We add nil to each partition so that we get single, double, and triple as well as full-length products, and then filter the nils out again.
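
The nil trick, and the figure of 431, can be checked in plain Clojure. In this sketch keywords stand in for the partition values of the upper ontology, and the cart definition is repeated so that the block runs on its own.

```clojure
(defn cart [colls]
  (if (empty? colls)
    '(())
    (for [x (first colls)
          more (cart (rest colls))]
      (cons x more))))

;; Keywords standing in for the five value partitions.
(def partitions
  [[:Tiny :Small :Large]
   [:Positive :Neutral :Negative]
   [:Hydrophobic :Hydrophilic]
   [:Polar :NonPolar]
   [:Aromatic :Aliphatic]])

(def combinations
  ;; rest drops the first product, which is all nils and
  ;; therefore empty after filtering
  (rest
   (map #(filter identity %)
        ;; nil goes first in every partition, so "no value"
        ;; is one of the choices
        (cart (map #(cons nil %) partitions)))))

;; 4 * 4 * 3 * 3 * 3 = 432 products, minus the empty one
(count combinations)
;; => 431

(first combinations)
;; => (:Aromatic)
```

Dropping the empty combination with rest relies on nil being the first choice in every partition, which cons guarantees.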

Reasoning

  • Finally, we reason over this

  • Here, we choose to use HermiT

  • And check consistency

  • This takes a second or two

(reasoner-factory :hermit)

;; => true
(consistent?)
Note

Tawny supports a couple of reasoners out of the box, including Hermit and ELK. Here we are instantiating using a :keyword, but this is just a short-cut — any OWL API OWLReasonerFactory can be used directly.

The reasoner is invoked to check consistency automatically. Tawny uses a GUI (a progress bar) by default to show this process, but falls back to text if that is not possible (so you can check consistency in a CI environment without hassle).

Reasoning

  • When reasoning, working out what happened can be tough

  • Especially when using a "Textual User Interface"

  • But, we can count numbers

  • We have inferred many subclasses of AminoAcid

;; => 20
(count (subclasses AminoAcid))

;; => 451
(count (isubclasses AminoAcid))
Note

Working out what has happened can be quite hard (this is something that we wish to fix in future versions of tawny), but counting subclasses works as well as anything. We now have a lot more inferred subclasses than asserted.

Reasoning

  • While we are consistent, we are not coherent

  • In fact, we have many unsatisfiable classes

  • What is happening?

;; => false
(coherent?)

;; => 242
(count (unsatisfiable))

Visualising

  • Many ways to visualise our ontology

  • Saving it and opening in Protege is easiest

(save-ontology "aabuild.owl" :owl)
protege-aabuild.png
Note

Protege is really nice for visualising ontologies. So, we use that here. I always save to the same file name, but better to save as OWL rather than OMN because it parses better.

Visualising

  • Many defined classes are equivalent

  • Many are unsatisfiable

  • Happens because there are 20 amino-acids

  • But 431 defined classes

  • Many defined classes have necessarily the same extent

  • Many can have no individuals (Negative and Hydrophobic)

  • Only happens with the covering axiom

protege-unsatisfiable.png
Note

The reason that this happens is obscure, perhaps, but the base reason is that we have many more defined classes than we have primitive ones. So, we must have equivalences or unsatisfiable classes.

This only happens because of the magic :cover axiom. We know all of the amino acids that exist. As there are no hydrophobic charged amino-acids (the two facets are not independent, for obvious reasons of chemistry!), that class becomes unsatisfiable. Without the :cover axiom, the reasoner would assume that this class could exist but we have just not mentioned it.

So, the reasoning is telling us something about the biology. Whether we want this form of conclusion depends on whether we are talking about biology or chemistry: after all, a chemist could create many amino acids that would separate out equivalent classes, and may make some unsatisfiable classes satisfiable.

Life is complex but, in this case, simpler than chemistry.

As a query

  • Using defined classes as a query

  • Not that useful

  • Most of the inferred subclasses are defined

;; => 242
(count
 (isubclasses SmallAminoAcid))

As a query

  • We have a full programming language

  • So, we filter for only undefined classes

;; => 0
(count
 (filter
  #(not (.isDefined % aabuild))
  (isubclasses SmallAminoAcid)))

Task 8: Conclusions

  • We can use the highly programmatic nature of Tawny

  • We can generate many defined classes

  • To do so is useful

  • In this case one axiom can have a large effect

  • Results depend on the choices we make in the modelling

Summary

  • Task 1 - Built a hello world ontology

  • Task 2 - Built the amino acid tree (classes only)

  • Task 3 - Introduced facets to define amino acid existential restrictions

  • Task 4 - Introduced patterns e.g. the value partition pattern to define our properties

  • Task 5 - How Tawny-OWL supports OBO Identifiers

  • Task 6 - Using other existing Tawny-OWL ontologies

  • Task 7 - Reading other existing ontologies into Tawny-OWL

  • Task 8 - Extending Tawny-OWL syntax to programmatically generate many amino acid classes and defined classes

Hiatus

  • Hope that I have shown basic and advanced functionality

  • Key feature of Tawny-OWL is extensibility

  • Reuse of existing tooling

  • Would welcome feedback

Questions and Answers

Questions

  • Can I add annotations on axioms?

  • How does this affect ontology deployment?

  • How do you version your ontology?

  • How do you test your ontology?

  • How do you continuously integrate your ontology?

  • What about advanced documentation for ontologies?

  • How do I collaboratively develop my ontology?

  • Can I internationalise my ontology?

  • Can I scaffold my ontology from existing sources?

  • What happens if the labels of read ontologies change?

  • How do you convert an existing ontology to Tawny-OWL?

  • How fast is Tawny-OWL?

  • Can I integrate more tightly with Protege?

  • How does Tawny-OWL affect dependency management with ontologies?

  • Can I link ontologies into software?

  • What’s this :super? why not :subclass?

  • How do we extend Tawny-OWL, so that it saves the ontology on every change?

  • How do OBO identifiers work?

  • Do I have to conform to the no-use-before-define rule?

  • How do we create a new symbol with the amino-acid function?

Can I add annotations on axioms?

  • OWL allows annotation of axioms, for provenance for example

  • Tawny provides a syntax for this

  • Annotates SubClassOf axiom between Man and Person with a comment.

  • http://www.russet.org.uk/blog/3028

(defclass Man
 :super
 (annotate Person
           (owl-comment "States that every man is a person")))
Note

One of the reasons for the complexity of the OWL API is that it allows annotations to be passed in lots of places, including on the axioms that assert the relationship between, for example, two classes. One simplification I made with Tawny is to hide this complexity. Moreover, Tawny is frame-centric so the axioms are not normally seen explicitly.

Unfortunately, it appears that in hiding the complexity, I had also hidden a capability that people actually use: GO uses axiom annotations for provenance for instance. So I have added this capability into tawny with a slightly expanded syntax.

The annotate function returns an object encapsulating an OWL API entity and an annotation that will be added to the axiom of the frame. We can see this in the example given. Note that we are annotating neither Person nor Man in this case, but the relationship between them.

How does this affect ontology deployment?

  • Potentially none — Tawny generates an OWL file

  • Potentially automatable

    • In project source, or through leiningen plugin

  • Or can publish as a maven artefact

    • Ontology can be downloaded as a software artefact

    • Separates out ontology identifier and download location

Note

We can deploy ontologies to bioportal or to anywhere else exactly as we do now. Save the OWL file, and do stuff to it manually.

Of course, in a programmable environment, it is also very easy to add additional deployment technology — so generate one or more files, copy them to another location, via ssh or http, check them in somewhere. This can be achieved either in the project source or as a leiningen plugin.

Or, finally, we can deploy our ontology like any other piece of Clojure code, as a maven artefact, to maven central, to clojars (like maven central but for Clojure), or to our own private repo. This also separates out the ontology identifier from the download location. It’s possible to argue whether this is a good thing or a bad thing.

How do you version your ontology?

  • Tawny-OWL uses a line-orientated syntax

  • You edit source code not a visualisation of an XML file

  • Like various OBO flat file syntaxes, it works well in git or any VCS

  • Leiningen supports release versioning, using Semantic Versioning

Note

Tawny-OWL uses a line-orientated syntax and what you edit is source code, not some XML that has been generated. So, it works very well with version control. The serialisation order is entirely predictable because there is no serialisation — it’s source code, it only changes if you edit it.

Working with source code also means that diff tools show you changes in the same form as the code that you edit, so it’s very easy to compare two versions, two branches, whatever. The only exception to this is the EDN files used for numeric IDs of OBO identifiers. These have been designed to version well (they are generated but not regeneratable, so should be regarded as source code), but only time will tell. "They do not version well" will be considered to be a bug though, and will be something I would want to fix.

We’ve used git for all the various ontologies we have developed, and it works nicely. Actually "nicely" is an understatement. The move to modern version management, rather than anything bespoke built just for ontologies, makes an enormous difference. For my money, it’s reason to move to tawny all by itself.

It is also possible to version in the sense of release tagged versions of an ontology and to use a dependency mechanism.

How do you test your ontology?

Note

We test our ontologies explicitly and sometimes very heavily. Tawny-OWL provides some fixtures to make this easy and we use clojure.test, but there are other test frameworks and we might move at some point. This has been written up as a paper, and I would suggest reading it for further information.

How do you continuously integrate your ontology?

  • We can test ontologies with standard frameworks

  • These can run directly from leiningen

  • This workflow allows the use of standard CI environment with no changes

  • We use github/travis-CI

  • Note, if you import ontologies via URI, you may not get a repeatable build.

Note

Once you can test, then you can continuously integrate. We do this with Travis-CI which is nice and it supports Clojure and Leiningen out of the box (er, cloud). It also continuously integrates with the software environment (including tawny). You can reason on there as well (tawny works headless just fine).

If you import ontologies via URI you are totally dependent on them being stable or you will not get a repeatable build. It’s probably better to import ontologies via their version IRI anyway to ensure this, although it’s not widely done.

What about advanced documentation for Ontologies?

  • Tawny-OWL ontologies are readable text

  • It is possible to embed rich readable comments

  • Also can use literate programming tools

  • noweb, or org-mode use traditional approach

  • I have also developed "lentic" which integrates with editor

Note

OWL raw does allow documentation but it’s poor. With annotation properties you have no structure at all unless you use microsyntax. Moreover most annotations are sets — so no order.

We have been experimenting heavily with literate programming tools. I started this quite a few years back, but they now work well, and we have built a specialised tool called "lentic".

How do I collaboratively develop my ontology?

  • The same way as all software

  • Version control for asynchronous, fork and merge with git

  • Collaborative chat use gerrit, or skype

  • Synchronous editing, try floobits, a web editor

Note

Collaborative development is not a new requirement and is, in fact, the default for some environments. Just use existing tools. Git if you want asynchronous development, or floobits, or even a virtual machine, tmux and Emacs in the console. Whatever.

The point is, it’s not a problem for tawny. It’s a problem for many software engineers world-wide and they have provided some very, very slick solutions.

Can I internationalise my ontology?

  • Can add internationalised labels

(label "Ciao" "it")
  • Can define internationalised function calls

(defn etichetta [l]
  (label l "it"))
  • Can use tawny.polyglot to use property bundles

Note

It’s very easy. Tawny’s programmability means that you can also support a default language if you choose to, and have your ontologists use their own native language for all parts of the system. Our example of etichetta above is one of the complicated examples, because tawny has to know the default language also. In most cases, we could just define aliases (see tawny.english for examples). tawny.polyglot uses property bundles which I believe integrate well with most machine-supported translation environments.

Can I scaffold my ontology from existing sources?

What happens if the labels of read ontologies change?

  • OBO ontologies use numeric IDs

  • These are unreadable, so we syntactically transform labels

  • If the label changes (but the ID remains the same), this is a problem

  • Can use tawny.memorize to remember mappings

  • Which adds aliases to those now missing (with optional "deprecated" warnings)

Note

The problem here is that we have to do something to get readable names for OBO style ontologies. But we are now using a part of the OBO style ontology that is open to change with, perhaps, fewer guarantees than for identifiers.

Tawny has support for this. It solves the problem by saving the mapping that it creates between a label and a URI. If the URI remains, but the label disappears, then tawny adds an alias and a deprecation warning.
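
The idea can be sketched with plain Clojure maps. This is not how tawny.memorize is implemented; the function missing-aliases, the labels and the example.org IRIs are all hypothetical and only illustrate the comparison between a remembered mapping and the current one.

```clojure
;; Hypothetical sketch: find remembered labels that have disappeared
;; although their IRI is still present in the current ontology.
(defn missing-aliases [remembered current]
  (let [current-iris (set (vals current))]
    (into {}
          (filter (fn [[label iri]]
                    (and (not (contains? current label))
                         (contains? current-iris iri)))
                  remembered))))

;; Mapping saved from an earlier read of the ontology.
(def remembered
  {"old_name" "http://example.org/EX_1"
   "stable"   "http://example.org/EX_2"})

;; Mapping from the latest read: EX_1 has been relabelled.
(def current
  {"new_name" "http://example.org/EX_1"
   "stable"   "http://example.org/EX_2"})

(missing-aliases remembered current)
;; => {"old_name" "http://example.org/EX_1"}
```

Each entry in the result is a candidate for an alias, so that code written against the old label keeps working while a deprecation warning is shown.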

How do you convert an existing ontology to Tawny-OWL?

  • tawny.render can perform a syntactic transformation

  • Given OWL provides equivalent Clojure code

  • Used interactively to provide documentation

  • Can be used to port an ontology

  • Currently "patternising" ontology is manual

  • See Jennifer Warrender’s PhD thesis where she did this with SIO

Note

We do have a methodology for doing this. We can render most of the ontology automatically, which provides the basis for this kind of port. But actually making use of the advanced features of tawny (like patterns or the as-subclasses functionality) is manual. In general, at least some of this will be necessary although it can be done manually.

How fast is Tawny?

  • For raw, un-patternised ontology tawny takes about 2x as long as reading OWL/XML

  • Tested by rendering and loading GO

  • About 56Mb of lisp

  • Loads in about 1min

  • Most of excess time is in parsing, (Clojure also compiles)

  • Patternised ontology would involve less parsing

Note

In short, it’s fast enough that you are probably never going to notice it. We cannot currently test how much difference the patternisation would make, but it might be substantial.

It is also worth noting that an ontology the size of GO, if it were developed in Tawny, would be unlikely to be a single file. Interactively (i.e. in the editor) you probably would not be loading the whole ontology most of the time anyway.

Can I integrate more tightly with Protege?

  • We have built a GUI shell into Protege

  • Can also use Protege to open a Clojure REPL via a socket

  • Protege then displays directly the state of Tawny

  • Good for demonstration

  • But a little flaky for normal use

  • Having Protege reload an OWL file easier

Note

This should work nicely and it does, but the truth is that at the moment a REPL opened inside Protege hangs periodically and I do not know exactly why; I suspect it is that Protege is not entirely happy with having its data structures changed underneath it, but I have not had the leisure to debug this yet.

In our hands, the auto-reload function works well. I tend to render first to OMN and look at that. Tawny also has a documentation capability which shows you the "unwound" definition of terms. And then finally I use Protege after that.

How does Tawny affect dependency management with ontologies?

  • Clojure uses maven dependency management

  • We can now publish ontologies as maven artefacts

  • And specify dependencies, with versions, and tooling

  • Can publish on Maven central or Clojars (no infrastructure to maintain!)

  • Separates ID and download location — disobeys LOD principles

  • But fulfils SLOD principles.

Note

Clojure uses maven dependency management. As a tawny ontology is just a piece of Clojure, we can use the same mechanism with tawny ontologies also. Which means that we can specify ontological dependencies also. This means we can specify version ranges (OWL doesn’t allow this to my knowledge). And we can reuse tooling. We can use Leiningen to show us a dependency graph, we can look for version conflicts, and we can exclude duplicates from the transitive closure.

Interestingly, we can also publish our ontology independently from our IDs. So, we can get someone else to maintain all the infrastructure for deployment (including of multiple versions) without having to adopt their identifiers (like bioportal).

This rather breaks the Linked Open Data (LOD) principles, of course, which say that IDs should resolve. Using maven dependencies we don’t need this at all. But it fulfils the SLOD (significant load of dependencies) principle, which says that if your software has lots of dependencies and lots of different people maintaining the infrastructure for their availability, it is going to break all the time.

Thanks to Helen Parkinson for inspiring (a slightly different version) of the SLOD acronym.

Can I link ontologies into software?

  • OWL API objects become first class entities in Clojure

  • Can refer to them directly

  • We integrated Overtone — a music generation system

  • Added in Tawny-OWL and the Music Ontology

  • We now have software that plays a tune

  • And provides OWL metadata about that tune

  • More to investigate here.

Note

One of the great unexplored areas of Tawny at the moment is how much value we can get embedding an ontology into software. We did have a very short project integrating a semantic system (tawny) with a music generation system. This works and was fun. I think there is a lot of scope for research in this area yet.

What’s this :super? why not :subclass?

  • Manchester syntax uses SubClassOf:

  • Tawny uses :super for the same purpose!

  • Confusing!

  • Manchester syntax is actually backward

  • In tawny, all frames are A has :frame B

  • In Manchester A is a SubClassOf: B

  • http://www.russet.org.uk/blog/2985

Note

I made this change very carefully and was very reticent about it: not least because it made my main user of Tawny at the time (Jennifer Warrender) change all of their existing code. But I had really confusing code inside Tawny where my add-subclass functions had backward semantics to all the others.

I changed from ":subclass" to a more plain ":super" at the same time. This opens up a slight risk because object-property has the same frame but for a different purpose. I do not worry about this too much because other tools will pick up, for example, the use of a class as a super-property.

How do OBO identifiers work?

  • The mapping is stored in a file

  • The obo-restore-iri line above reads this file

  • If a symbol has no mapping, we use the "pre-iri" form.

How OBO Identifiers work

  • The mapping file is generated

  • Human readable, and line-orientated

  • Deterministically ordered

  • Will version!

  • Uses EDN format.

("ro"
 "http://purl.obolibrary.org/obo/EXAM_000001"
 "G"
 "http://purl.obolibrary.org/obo/EXAM_000002"
 "F"
 "http://purl.obolibrary.org/obo/EXAM_000003")
Note

The mapping file has been designed to work with version control, because it needs to be shared between all developers. Although it is a generated file, it is source code, since it cannot be recreated from fresh (not in the same order anyway).

EDN format is a Clojure thing. It’s basically a Clojure read syntax.
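
Because EDN is Clojure read syntax, the mapping can be loaded with the standard clojure.edn reader. The sketch below reads the listing shown above from a string; treating the flat list as alternating name and IRI pairs is an assumption based on that listing, and how Tawny itself consumes the file is not shown here.

```clojure
(require '[clojure.edn :as edn])

;; The mapping from the slide, held in a string for the example.
(def mapping-text
  "(\"ro\"
    \"http://purl.obolibrary.org/obo/EXAM_000001\"
    \"G\"
    \"http://purl.obolibrary.org/obo/EXAM_000002\"
    \"F\"
    \"http://purl.obolibrary.org/obo/EXAM_000003\")")

;; Assumption: the list alternates name, IRI, name, IRI, ...
(def mapping
  (apply hash-map (edn/read-string mapping-text)))

(get mapping "G")
;; => "http://purl.obolibrary.org/obo/EXAM_000002"
```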

How OBO Identifiers work

  • Stable pre-iri’s

  • No need for a server such as URIgen

;; this stores any new IDs we have created
(comment
  (tawny.obo/obo-store-iri obo "./src/tawny/tutorial/whats_in_a_name.edn"))
Note

While the pre-iris are automatically created, if we choose they can be made stable by simply saving them into the file with the form above. This can be safely done every time the file is evaluated, because the order is deterministic, so it will cause no false diffs in versioning.

This is potentially useful if you are collaborating with others and want to co-ordinate at pre-release time. It’s not essential if others are using tawny — there is no need, since classes can be referred to by symbol.

There is a potential disadvantage. This creates an IRI (and entry in the file) for every new entity created. Not a problem with Protege, but tawny is fully programmatic. I can create 10^6 new classes in one line of code.

pre-iris all appear at the end of the EDN file!

How OBO Identifiers work

  • How to create permanent IDs

  • Needs to be co-ordinated, since IDs are incremental

  • Use version-control to co-ordinate

  • One person, or as part of a release process

;; this coins permanent IDs, in a controlled process!
(comment
  (tawny.obo/obo-generate-permanent-iri
   "./src/tawny/tutorial/whats_in_a_name.edn"
   "http://purl.obolibrary.org/obo/EXAM_"))
Note

At some point, you need to coin new IDs that will become permanent. This has to happen in a co-ordinated fashion. It could be done as part of release. Or by bot during continuous integration.

How OBO Identifiers work

  • I think having no server is nice

  • Reusing version control makes sense

  • It’s programmatic! You are free to disagree.

Note

How well would this workflow work in practice? Not sure. It would work for a small number of developers. There are many tweaks that could be made for different scales — saving to multiple files, pre-iris in one place, perms in another. No pre-iris at all. Use URIGen. Manually coin permanent IRIs as part of quality control.

It’s programmatic and easy to change.

Do I have to conform to the no-use-before-define rule?

  • It is possible not to use symbols

  • The iri-gen function takes a string not a symbol!

  • This string is the tawny name

  • Consider the following

;; String building!
(defontology s)

(owl-class "J" :ontology s)
(object-property "r" :ontology s)
(owl-class "K"
           :ontology s
           :super (owl-some "r" "J"))
Note

We can do without symbols and instead use just strings. Note that we also have to switch functions owl-class instead of defclass.

What are Tawny Names?

  • These forms do NOT define symbols

  • This WILL NOT work

  • Neither r nor J have been defined

(comment
  (owl-class "L"
             :ontology s
             :super (owl-some r J)))

Tawny Names

  • Danger!

  • Consider this statement.

  • But we did not define "L"

  • But we have used it.

(owl-class "M"
           :ontology s
           :super "L")
  • And so, it becomes defined

Class: s:M
    SubClassOf:
        s:L

Class: s:L
Note

The use of strings means that we can define things without the tawny-name being in "primary position". It just happens. You need to be careful.

Why use strings?

  • Partly, there for implementation

  • But made public as string manipulation is easier

  • Most useful for development

Note

The main reason that I have left this in place is for use as an API. Clojure allows full manipulation of symbols like most lisps, but it’s a bit of a pain. It’s not possible, for example, to concatenate two symbols to make a longer one (or rather, they need to be converted to strings, then concat’d then converted back again). And the creation and interning of new symbols as variables requires the use of macros rather than normal functions.

Having said that, tawny does offer some facilities to help with this process

  • tawny.owl/intern-owl-entity

  • tawny.owl/intern-owl-string

  • tawny.owl/defentity

  • tawny.owl/intern-owl-entities

  • tawny.util/quote-word

  • tawny.util/name-tree

We will see a few of these later.

How do we create a new symbol with the amino-acid syntax?

  • If we want to create a new symbol Tawny-OWL provides defentity

  • It does a few other things as well

(defentity defaminoacid
  "Defines a new amino acid."
  'amino-acid)
Note

In fact, defclass, defoproperty and the rest are all defined in this way. This is a common enough thing to want to do, that I made this macro public. It is an easy way, for instance, to add new frames or set default values for existing frames, or to define patterns.

defentity does one or two other things as well, chiefly adding metadata to the var created. If you don’t know what this means (unless you know Clojure you probably won’t) then it really isn’t important.
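For readers curious about what "a def-style macro that adds metadata to the var" looks like, here is a minimal sketch in plain Clojure. It is not the defentity implementation: defresidue is a hypothetical macro, and a map stands in for the OWL entity.

```clojure
(defmacro defresidue
  "Hypothetical def-style macro: binds sym to a map describing the entity
  and marks the resulting var with :residue metadata."
  [sym & {:as frames}]
  `(def ~(vary-meta sym assoc :residue true)
     (merge {:name ~(name sym)} ~frames)))

;; Creates the var Alanine; the keyword arguments play the role of frames.
(defresidue Alanine :charge :neutral)

Alanine                       ; the value bound to the new symbol
(:residue (meta #'Alanine))   ; the metadata attached to the var
```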

Programming an Autosave

  • Extend Tawny, so that it saves the ontology on every change

Extending Tawny

  • Tawny is implemented directly in Clojure

  • We can extend in the same syntax

  • Can extend in general and ontology specific ways

Note

I’ve picked an autosave because it is a nice general function that we might want to use, and it demonstrates some of the possibilities. But ontology-specific extension is perhaps the most useful of all, because it allows ontology development groups to tailor Tawny to their own development practices without having to generate bespoke extensions for Protege or equivalent.

Namespace

  • The namespace definition here is a bit different

  • Require tawny.owl through an alias

  • Also import an OWL API interface

(ns tawny.tutorial.autosave
  (:require [tawny.owl :as o])
  (:import [org.semanticweb.owlapi.model OWLOntologyChangeListener]))
Note

In general, the use of :use clauses is falling out of favour in Clojure circles; it may disappear in future versions (although it will be replaced with something else somewhat more verbose). The reason for this is that :use is a hostage to the future. If I use a namespace and it gains a new function, we could end up with a name collision. So, if clojure.core adds an only function, it would require changes to any namespace which use’d tawny.owl.

For my own use, I think that with tawny this risk is worth it (and it is easy to fix if it happens). For ontology namespaces (i.e. those in which I define an ontology and not much else), I tend to use tawny.owl. For namespaces in which I want to extend tawny.owl, I tend to require tawny.owl, normally aliased to o.

The name collision between import and owl-import is sort of an example of this problem.
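The trade-off can be seen with a namespace from the standard library rather than tawny.owl. In this sketch the alias s is an arbitrary choice; clojure.string/replace and clojure.core/replace are different functions that share a name, which is exactly the kind of collision that an alias avoids.

```clojure
;; Require through an alias instead of pulling every name in with :use.
(require '[clojure.string :as s])

;; Aliased call: the prefix makes it clear this is clojure.string/replace.
(s/replace "tawny-owl" "-" " ")

;; Unqualified call: this is clojure.core/replace, which substitutes
;; collection elements using a map.
(replace {:a 1} [:a :b])
```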

Saving the Listener

  • We need a variable in which to save our listener

  • Clojure variables are immutable

  • Store an atom and change that

(def auto-save-listener
  "The current listener for handling auto-saves or nil."
  (atom nil))
Note

Clojure was designed for concurrency and generally does not allow changing variables (although it does allow re-evaluation). Instead it has a set of objects which can store any value and which can be changed but which require the developer to make an explicit choice about how to deal with concurrent changes.

It’s very nice and very sensible, but largely we just ignore it here. The reality with tawny is that it is built on the OWL API, which is mutable, is not thread-safe and is not built for concurrency.
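For those who have not met atoms before, a minimal sketch using only clojure.core follows. The name save-count is made up for this example and has nothing to do with Tawny itself.

```clojure
;; An atom holds a value that can be replaced after the var is defined.
(def save-count (atom 0))

;; reset! overwrites the current value outright.
(reset! save-count 0)

;; swap! applies a function to the current value and stores the result.
(swap! save-count inc)
(swap! save-count + 10)

;; @ (or deref) reads the current value.
@save-count
```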

The auto-save function

  • OWL API has a listener for ontology changes

  • We address it directly here.

(defn auto-save
  "Autosave the current ontology every time any change happens."
  ([o filename format]
   (let [listener (proxy [org.semanticweb.owlapi.model.OWLOntologyChangeListener]
               []
             (ontologiesChanged[l]
               (o/save-ontology o filename format)))]
     (reset! auto-save-listener listener)
     (.addOntologyChangeListener
      (o/owl-ontology-manager) listener)
     listener)))

auto-save in detail

  • Instantiate an object which implements OWLOntologyChangeListener

  • Implement a single method of this

  • Just save the ontology

  • proxy is a closure

  • o, filename, format are closed over

(proxy [org.semanticweb.owlapi.model.OWLOntologyChangeListener]
       []
   (ontologiesChanged[l]
      (o/save-ontology o filename format)))
Note

This is our actual listener. Proxy objects are available in the Java core, and the Clojure ones do much the same thing — implement an interface on the fly. They are rather more common in Clojure because a) it makes more sense in Clojure and b) implementing a new class is a bit of a pain (although entirely possible).

Proxy objects work through reflection and have the performance characteristics that you would expect — but this is meant to be an end user function, so we really don’t care. Saving the ontology is likely to take far more effort than a reflective call.
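The same pattern can be tried without the OWL API on the classpath. The sketch below proxies java.util.function.Consumer (a standard Java 8+ interface chosen only because it has a single one-argument method, like the listener); make-recorder and events are hypothetical names.

```clojure
(defn make-recorder
  "Return an object implementing java.util.function.Consumer.
  The proxy body closes over `log` and `label`, just as the listener
  above closes over o, filename and format."
  [log label]
  (proxy [java.util.function.Consumer] []
    (accept [event]
      (swap! log conj [label event]))))

(def events (atom []))
(def recorder (make-recorder events "saved"))

;; Java code (or Clojure interop) can now call the interface method.
(.accept recorder :change-1)
(.accept recorder :change-2)
@events
```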

auto-save in detail

  • save the listener

  • discarding any existing one!

   (reset! auto-save-listener listener)
Note

Clojure follows the Scheme convention of marking mutating functions with a !. Tawny doesn’t, because we’d add ! to everything. Generally speaking, reset! is considered rather bad form; Clojure programmers normally change an atom by applying a function to the existing value.

Dropping the existing value is, of course, wrong and a memory leak, since the existing listener will be held by the manager, and may even result in strange behaviour.

It’s a demo!
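One way the leak could be avoided is to retrieve the old value while storing the new one, and unregister it. The sketch below assumes Clojure 1.9 or later (for reset-vals!) and uses a plain set in an atom as a stand-in for the manager's listener list; registered, current and install-listener! are hypothetical names, not Tawny functions.

```clojure
(def registered (atom #{}))   ; stand-in for the manager's listeners
(def current (atom nil))      ; plays the role of auto-save-listener

(defn install-listener!
  "Store `listener` as the current one, unregistering any previous
  listener so that it is not left behind."
  [listener]
  ;; reset-vals! returns [old-value new-value] (Clojure 1.9+).
  (let [[old _] (reset-vals! current listener)]
    (when old
      (swap! registered disj old))
    (swap! registered conj listener)
    listener))

(install-listener! :first)
(install-listener! :second)
@registered
```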

auto-save in detail

  • The o/owl-ontology-manager function returns an OWLOntologyManager

  • Clojure uses the . syntax to call methods

  • So, call addOntologyChangeListener on the manager

  • With listener as an argument

(.addOntologyChangeListener
     (o/owl-ontology-manager) listener)
Note

I’ve used Clojure’s Java interaction extensively during the development of Tawny and I have to say I found it to be very nice. It fits very comfortably with normal Clojure development.

In general, I hide the existence of the OWLOntologyManager and the OWLDataFactory. Generally, there is only one of each. There are more flexible approaches, of course, but I have had to make decisions in the development of tawny. Otherwise, pretty much every function would require a manager or factory and an ontology as arguments, and tawny as it stands would have become unusable.

You might be wondering why owl-ontology-manager is a function call and not just a variable. It is because there is not always only one manager: I have so far found one occasion where I want multiple managers, which is for Protege integration. Turning this into a function allowed me to monkey-patch tawny for this purpose.

It would be possible to deal with this much more formally, and pass an environment, probably integrated with the "default-ontology" functionality of Tawny. But I have not found a strong use case for this yet.
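The . method-call syntax used for addOntologyChangeListener works on any Java object. A small sketch with standard library classes only (the var name names is made up for this example):

```clojure
;; ClassName. (trailing dot) calls a constructor.
(def names (java.util.ArrayList.))

;; (.method object args...) calls an instance method.
(.add names "J")
(.add names "K")

(.size names)
(.contains names "K")

;; Clojure strings are java.lang.String, so String methods work directly.
(.toUpperCase "tawny")
```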

Remove the auto-save

  • And a function to reverse the process

  • @ dereferences the atom

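(defn auto-save-off
  "Stop autosaving ontologies."
  []
  (when @auto-save-listener
    (.removeOntologyChangeListener
     (o/owl-ontology-manager)
     @auto-save-listener)
    ;; clear the atom so that it is nil again when no listener is active
    (reset! auto-save-listener nil)))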
Note

And finish off. If you don’t know what @ and dereferencing mean, it’s really not interesting.

auto-save

  • There is already an auto-save function in tawny.repl

  • And an on-change function

Conclusions

  • Clojure provides easy interop with Java

  • We can use this to extend Tawny-OWL capabilities

Note

So, this has been a very rapid run through of how to integrate directly with the OWL API and add new functionality to Tawny that cannot be achieved through tawny itself.

This is very commonly done in Clojure (where the mantra is do not wrap if you do not need to) and so it is well supported.

Conclusions

  • Hope the tutorial was worthwhile

  • Tawny-OWL can change the way you build ontologies

  • Will actively support use

  • Always interested in future collaborations

Lentic talk

  • Highly Literate Approach to Ontology Building

  • New lenticular environment for building and documenting ontologies

  • Literate programming principles

  • Tuesday 7th December, 2015 @ 15:50

UKON 2016 (#ukon2016)

  • The 5th UK Ontology Network Meeting

  • Newcastle University

  • Thursday 16th April, 2016

Acknowledgements

  • Phillip Lord, Tawny-OWL

  • Jennifer Warrender, Karyotype Ontology, Tawny-OWL

  • Anthony Moorman, Driving use case

  • Ignazio Palmisano, OWL API Support

  • Matt Horridge, Protege, OWL API Support

  • Robert Stevens, Amino Acid Ontology