 Project Background

Design for a Scalable Ontology Server

Supervisor: Assoc. Prof. Bob Colomb

The Semantic Web, RDF, and OWL

The World Wide Web contains many useful documents, but finding specific information can be very difficult. The current state of the art is based on keyword searches, an approach which often finds many documents unrelated to the desired topic. In response to this, Sir Tim Berners-Lee (the original designer of the web) has proposed a new web, called the Semantic Web. The purpose of the Semantic Web is to provide semantics, or meaning, for information. This will make it possible to find data based on what the user is searching for, rather than on the specific words used.

An important feature of the Semantic Web is the interoperability of data. By putting all data in a standard format it will be possible to infer the meaning of that data. With all data structured in this way, programs will be able to interact automatically, with little human intervention. A good example of what is lacking today is a user having to enter a time and date into a calendar program by hand after being invited to a meeting by email. The standard format for data on the Semantic Web is RDF.

The Web Ontology Language, or OWL¹, provides a framework for describing how RDF data relates to other RDF data. OWL itself is described using RDF. Using OWL to describe data in this way, it is possible to infer new relationships within that data which were not explicitly stated. A trivial example might be to infer that J. Smith and John Smith are the same person, if they both have the same email address.
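The email example can be sketched in a few lines of Python. This is only an illustration, not part of the project: the triples, the `mbox` predicate and the `infer_same_as` helper are all hypothetical names, and the data lives in an in-memory set rather than an RDF store. In OWL terms, the sketch treats `mbox` as an inverse functional property (`owl:InverseFunctionalProperty`) and emits `owl:sameAs`-style statements.

```python
from collections import defaultdict

# Hypothetical statements as (subject, predicate, object) triples.
triples = {
    ("person:1", "name", "J. Smith"),
    ("person:1", "mbox", "mailto:jsmith@example.org"),
    ("person:2", "name", "John Smith"),
    ("person:2", "mbox", "mailto:jsmith@example.org"),
    ("person:3", "name", "Jane Doe"),
    ("person:3", "mbox", "mailto:jdoe@example.org"),
}


def infer_same_as(triples, inverse_functional):
    """Infer (a, "sameAs", b) for subjects that share a value of a
    predicate declared inverse functional (assumed helper, not a real API)."""
    # Group subjects by the (predicate, value) pair they share.
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_value[(p, o)].add(s)

    # Every pair of distinct subjects in a group denotes the same individual.
    inferred = set()
    for subjects in subjects_by_value.values():
        for a in subjects:
            for b in subjects:
                if a != b:
                    inferred.add((a, "sameAs", b))
    return inferred


same = infer_same_as(triples, {"mbox"})
```

Here `same` holds the two symmetric statements linking `person:1` and `person:2`; neither was present in the original data, and `person:3` is left alone because no other subject shares its address.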

The purpose of this project is to create a scalable OWL ontology server.

RDF and Scalability

At the moment, the most scalable RDF database available is the Kowari database. It has been used to store over 250 million statements, while competing systems have yet to exceed 20 million statements. Scalability is very important for handling the quantity of data available on the Semantic Web, so Kowari has an important role here.

One of the reasons for Kowari's scalability is that its interfaces are based on a "set at a time" approach to managing data. This contrasts with many other systems, which manage their data one tuple at a time.
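The difference between the two styles can be illustrated with a small Python sketch. It does not use or imitate Kowari's actual interfaces; the `parent` relation and all function names are assumptions made up for the example. Both functions compute the same grandparent pairs, but the first handles one tuple at a time and rescans the data for each, while the second expresses the whole computation as a single join over sets.

```python
from collections import defaultdict

# Hypothetical relation: (parent, child) pairs.
parent = {
    ("alice", "bob"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("erin", "frank"),
}


def grandparents_tuple_at_a_time(parent):
    """Take one tuple, then scan the whole relation again for matches."""
    result = set()
    for a, b in parent:
        for c, d in parent:
            if b == c:
                result.add((a, d))
    return result


def join(left, right):
    """Join two sets of pairs where left's second column equals right's first."""
    # Index the right-hand set once, then resolve every match through the index.
    index = defaultdict(set)
    for c, d in right:
        index[c].add(d)
    return {(a, d) for a, b in left for d in index.get(b, ())}


def grandparents_set_at_a_time(parent):
    """Express the query as one operation over whole sets."""
    return join(parent, parent)


tuple_result = grandparents_tuple_at_a_time(parent)
set_result = grandparents_set_at_a_time(parent)
```

The point of the set-oriented form is that the store receives the entire operation at once, so it is free to choose indexes and access order for the whole set rather than answering one lookup at a time.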

A significant feature missing from Kowari is the ability to perform inferences based on an OWL framework. Any such system would need to work closely with Kowari, maintaining the scalability inherent in the interfaces.

OWL and Logic

OWL describes a system which is equivalent to a type of predicate logic. It is a superset of first order logic (FOL), while falling short of being a complete second order logic. This type of system is typically referred to as a 1.5 order logic. A subset of OWL known as OWL DL falls within the confines of FOL, and has been designed to be calculable by existing logic systems.

There are many logic inferencing systems available today which can work with some subset of OWL (such as OWL DL). These include various Prolog implementations, FaCT++, Vampire, and RACE, to name a few. While it would be possible to attach one of these systems to Kowari, they all use a tuple-at-a-time approach. They also use main memory or their own disk structures for handling data, both of which cause scalability problems. This makes these systems unsuitable for use with Kowari.

It has been shown that set-at-a-time operations are just as efficient as Prolog, which works one tuple at a time. This indicates that there are no inherent disadvantages to set-at-a-time calculations. Since Kowari requires a set-at-a-time implementation for efficient operations, this indicates that it is possible to efficiently calculate OWL inferences for Kowari.

OWL Full and Rules

While predicate logic encompasses most OWL operations, there are still a few operations which are missing. The most notable among these are the full cardinality constraints possible in the complete OWL specification (known as OWL Full). Also, the numerous FOL and Prolog systems available cannot handle many of the predicate statements possible in OWL Full, as these are second order logic operations. As a result, the only way to implement the complete set of calculations required for OWL Full would be with a system which performs some or all of its calculations without the use of predicate logic. The obvious candidate for this is a Rules system.

Rules systems are very flexible, and certainly allow for all possible OWL Full operations. There have been proofs indicating that certain systems (such as Rete) are maximally efficient from a theoretical perspective (although numerous further efficiencies can be applied in practical implementations), but these systems are all designed around tuple-at-a-time operations. It is therefore necessary to consider a Rules implementation which is based on set-at-a-time operations.
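As a rough illustration of what a set-at-a-time rule might look like, the Python sketch below applies a single transitivity rule, (a, b) and (b, c) implies (a, c), until no new statements appear. It is not the design proposed for this project: it borrows semi-naive evaluation, a standard technique from deductive databases, and the `transitive_closure` function and the class names are assumptions chosen for the example. Each round is one join over whole sets, with no tuple-by-tuple pattern matching.

```python
from collections import defaultdict


def transitive_closure(pairs):
    """Apply the rule (a, b), (b, c) -> (a, c) until a fixpoint is reached.

    Each round joins only the statements derived in the previous round
    (delta) against the base relation, so earlier work is not repeated.
    """
    # Index the base relation by its first column for the join.
    by_first = defaultdict(set)
    for a, b in pairs:
        by_first[a].add(b)

    known = set(pairs)
    delta = set(pairs)
    while delta:
        # One set-wide join per round.
        derived = {(a, c) for a, b in delta for c in by_first.get(b, ())}
        delta = derived - known  # keep only genuinely new statements
        known |= delta
    return known


# Hypothetical subclass statements: (subclass, superclass).
sub_class_of = {
    ("Dog", "Mammal"),
    ("Mammal", "Animal"),
    ("Animal", "Thing"),
}
closure = transitive_closure(sub_class_of)
```

After the loop, `closure` contains the three original statements plus the inferred ones, such as `("Dog", "Thing")`. A full rules engine would run many rules of this kind together, but the shape of each step (join sets, subtract what is already known, repeat) stays the same.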

The goal of this project is to perform OWL inferencing in a scalable manner. Since OWL is defined in, and describes, RDF, the project must work with a scalable RDF server, leading to the choice of Kowari. The implementation of this design will support OWL Full using a Rules engine which can manage data in a set-at-a-time manner.

 

1. See A. A. Milne's Winnie the Pooh for an explanation of this acronym.