Semantic inferencing and querying across large-scale
RDF triple stores is notoriously slow. Our objective
is to expedite this process by employing Google’s
MapReduce framework to implement scale-out distributed
querying and reasoning. This approach requires
RDF graphs to be decomposed into smaller units that
are distributed across computational nodes. RDF molecules
appear to offer an ideal approach – providing
an intermediate level of granularity between RDF
graphs and triples. However, the original RDF molecule
definition has inherent limitations that will adversely
affect performance. In this paper, we propose a
number of extensions to RDF molecules (hierarchy and
ordering) to overcome these limitations. We then
present some implementation details for our MapReduce-based
RDF molecule store. Finally, we evaluate
the benefits of our approach in the context of the BioMANTA
project – an application that requires integration
and querying across large-scale protein-protein
interaction datasets.
