Paper 2

Incremental Schema Generation for Large and Evolving RDF Sources

Authors: Redouane Bouhamoum, Zoubida Kedad, Stéphane Lopes

Volume 51 (2022) Special Edition

Abstract

The lack of a descriptive schema for an RDF dataset has motivated several research efforts addressing the problem of automatic schema discovery. The goal of these approaches is to provide the underlying structural schema of a given RDF dataset, either from the existing instances or from schema-related declarations, when available. However, as the instances in the RDF dataset evolve, the generated schema may become inconsistent with the dataset. It is therefore necessary to incrementally update the existing schema according to the changes occurring in the dataset over time. In this paper, we propose a schema discovery approach for massive RDF datasets that incrementally handles both insertions and deletions of entities. It is based on a scalable, incremental density-based clustering algorithm that propagates the changes occurring in the dataset into the clusters corresponding to the classes of the schema. Our approach is implemented using big data technologies to scale up to massive data while providing high-quality clustering results. Experiments on both synthetic and real datasets demonstrate the efficiency of our proposal.
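To make the idea of incremental density-based clustering concrete, the following is a minimal single-machine sketch, not the authors' implementation. It assumes that each entity is described by the set of properties it uses, that similarity is the Jaccard index over these sets, and that DBSCAN-style parameters (`eps` as a minimum similarity, `min_pts` as a minimum neighbourhood size) govern density. The class and method names are hypothetical. Insertions and deletions update only the neighbourhood graph around the affected entity. For brevity, cluster labels are then recomputed from that graph, whereas the approach described above propagates changes into the affected clusters and runs on big data technologies.

```python
from collections import deque


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between the property sets of two entities."""
    union = a | b
    if not union:
        return 1.0  # two entities with no properties are treated as identical
    return len(a & b) / len(union)


class IncrementalDensityClustering:
    """Hypothetical sketch: DBSCAN-style clustering of RDF entities by property set.

    Each resulting cluster stands for a candidate class of the schema.
    """

    def __init__(self, eps: float, min_pts: int):
        self.eps = eps          # assumed: minimum Jaccard similarity for two entities to be neighbours
        self.min_pts = min_pts  # assumed: neighbourhood size (entity included) needed for a core entity
        self.props = {}         # entity id -> frozenset of property IRIs
        self.neighbors = {}     # entity id -> set of neighbouring entity ids (itself excluded)

    def insert(self, eid, properties):
        """Add an entity; only the new entity's neighbourhood is computed."""
        if eid in self.props:   # treat re-insertion as an update
            self.delete(eid)
        props = frozenset(properties)
        self.neighbors[eid] = set()
        for other, other_props in self.props.items():
            if jaccard(props, other_props) >= self.eps:
                self.neighbors[eid].add(other)
                self.neighbors[other].add(eid)  # existing neighbourhoods only gain eid
        self.props[eid] = props

    def delete(self, eid):
        """Remove an entity; it simply disappears from its neighbours' neighbourhoods."""
        for other in self.neighbors.pop(eid):
            self.neighbors[other].discard(eid)
        del self.props[eid]

    def is_core(self, eid) -> bool:
        return len(self.neighbors[eid]) + 1 >= self.min_pts

    def labels(self):
        """Return entity id -> cluster label, with -1 marking noise.

        Labels are recomputed from the maintained neighbourhood graph;
        no similarity is recomputed here.
        """
        label = {}
        next_label = 0
        for start in self.props:
            if start in label or not self.is_core(start):
                continue
            label[start] = next_label
            queue = deque([start])
            while queue:  # expand through density-connected core entities
                current = queue.popleft()
                for n in self.neighbors[current]:
                    if n not in label:
                        label[n] = next_label
                        if self.is_core(n):
                            queue.append(n)
            next_label += 1
        return {eid: label.get(eid, -1) for eid in self.props}
```

For example, with `eps=0.5` and `min_pts=2`, inserting two person-like entities (`{"name", "email", "knows"}`, `{"name", "email"}`) and two city-like entities (`{"label", "population"}`, `{"label", "population", "mayor"}`) yields two clusters. Deleting one of the person entities leaves the other without enough neighbours, so it becomes noise while the city cluster is unchanged. Keeping the neighbourhood graph up to date is the expensive part, since it requires similarity computations. The label pass only traverses existing edges, which is why this sketch maintains the graph incrementally.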

Keywords: Incremental schema discovery, Schema evolution, RDF data, Big data, Clustering.