Big Data architectures make it possible to flexibly store and process heterogeneous data from multiple sources in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving, so data analysts need to adapt their analytical processes after each API release. This becomes more challenging when performing an integrated or historical analysis. To cope with such complexity, in this paper we present the Big Data Integration ontology, the core construct to govern the data integration process under schema evolution by systematically annotating it with information regarding the schema of the sources. We present a query rewriting algorithm that, using the annotated ontology, converts queries posed over the ontology into queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the ontology upon new releases. This guarantees that ontology-mediated queries correctly retrieve data from the most recent schema version and that historical queries remain correct. A functional and performance evaluation on real-world APIs validates our approach.
Preprint submitted to Information Systems. 35 pages
FOS: Computer and information sciences, Stream data, General computer science, Evolution, Semi-structured data, Modeling, Applied computer science (software), Databases (cs.DB), Big data, Ontologies (Information retrieval), Computer Science - Databases, Ontologies (Computer science), Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació, Data integration, Semantic web
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 51 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 1% |
| views | | 110 |
| downloads | | 109 |

Views provided by UsageCounts
Downloads provided by UsageCounts