
 Seminar: Building an Enterprise Scale Database for RDF Data
Seminar Information

Speaker: Andrae Muys, ITEE

When: 2007-02-20 14:00:00

Venue: 78-420

Host: David Carrington

Abstract:

The large scale management of semistructured data is a problem of
increasing relevance. A number of otherwise unrelated fields are
facing an explosion in the amount of information being generated and
requiring management. These include such diverse areas as genomics
and biotech, knowledge representation, citation management, network
traffic analysis, as well as traditional heterogeneous database and
enterprise information integration.

The Resource Description Framework (RDF) is a suite of technology
standards produced by the W3C that was originally designed to
support internet metadata, primarily in the guise of the Semantic
Web. However, at the core of RDF is a data model that promises to
provide an approach to solving the general problem of managing
semistructured data.
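
As a rough illustration of that data model: an RDF statement is a
(subject, predicate, object) triple, and a dataset is a set of such
statements with no fixed schema. The sketch below is a minimal
in-memory model only; the `match` helper and the `ex:` identifiers are
hypothetical names made up for this example, not part of any RDF
library or of Mulgara.

```python
from typing import Iterator, Optional, Set, Tuple

# One RDF statement: (subject, predicate, object).
Triple = Tuple[str, str, str]


def match(store: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern.

    A position left as None acts as a wildcard.
    """
    for triple in store:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


# Illustrative data: resources of different kinds, each with its own
# set of properties, share one store without a common schema.
store: Set[Triple] = {
    ("ex:paper1", "ex:title", "Storing RDF"),
    ("ex:paper1", "ex:cites", "ex:paper2"),
    ("ex:paper2", "ex:title", "Graph Indexing"),
    ("ex:gene7", "ex:locatedOn", "ex:chromosome3"),
}

titles = sorted(match(store, p="ex:title"))
```

Here `titles` holds the two `ex:title` statements, while the gene
record sits in the same store with entirely different properties.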

The Mulgara Project (http://www.mulgara.org) is an open source
database implementing this data model. Its primary focus has been on
the application of RDF to large scale semistructured data
management. With the ability to scale to 1 billion statements, its
current version is among the most scalable implementations
available. We propose to investigate modifications to Mulgara's
storage layer that would support a two-orders-of-magnitude increase
in its scalability, to approximately 100 billion statements. If
successful, this would help alleviate many of the data management
problems mentioned above.

The paper first provides a formal definition of semistructured data
in terms of vocabulary and semantics, identifying specifically the
underlying assumptions of the relational model that complicate the
management of semistructured data. We then
examine how these assumptions interact with relational
normalisation, and how this provides a rationale for RDF as a model
for semistructured data. After introducing the current design and
functionality of Mulgara, we present the design of a new storage
layer based on a combination of functional programming techniques
and traditional approaches to efficient external memory data
structures. This combination is shown to provide substantially
simplified implementations of critical features required for large
scale data management, including lock-free multiversion concurrency,
live backup and restore, federation, and replication.

Biography:

(biography unavailable)

Type: MPhil confirmation

Contact:

David Carrington, seminar host (davec@itee.uq.edu.au)
or Guido Governatori (ITEE seminar co-ordinator)
(guido@itee.uq.edu.au)