LocationSpark: In-memory Distributed Spatial Query Processing and Optimization

Publication: Article, Other literature type, Preprint

16 Oct 2020; Embargo end date: 01 Jan 2019; Qatar

Publisher: Frontiers Media SA

Journal: Frontiers in Big Data, volume 3 (eissn: 2624-909X)

Funded by: NSF | III: Small: Native Compil..., NSF | III: Small: In-memory, Di...

Authors: Mingjie Tang; Yongyang Yu; Ahmed R. Mahmood; Qutaibah M. Malluhi; Mourad Ouzzani; Walid G. Aref

doi: 10.3389/fdata.2020.00030 , 10.48550/arxiv.1907.03736

pmid: 33693403

pmc: PMC7931877

arXiv: 1907.03736

handle: 10576/56741

Abstract

Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques for handling query skew, which is common in practice, and optimize communication costs accordingly. We propose a distributed query scheduler that uses a new cost model to optimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. The query scheduler employs new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. The experimental study is based on real datasets and demonstrates that distributed spatial query processing can be enhanced by up to an order of magnitude over existing in-memory and distributed spatial systems.
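
The idea of using a bitmap filter to decide which local nodes should receive a query can be illustrated with a small sketch. This is not the paper's index or the LocationSpark API: the names `Rect`, `GridBitmapFilter`, and `route`, the uniform grid, and the 8x8 resolution are all assumptions made for illustration. Each partition keeps one bit per grid cell recording whether the cell holds any data, and a range query is forwarded only to partitions where it touches an occupied cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float


class GridBitmapFilter:
    """Hypothetical occupancy bitmap over a uniform grid covering one partition."""

    def __init__(self, bounds: Rect, cells_per_side: int = 8):
        self.bounds = bounds
        self.n = cells_per_side
        self.bits = 0  # a Python int used as a bitset, one bit per grid cell

    def _cell(self, x: float, y: float):
        # Map a coordinate to a grid cell, clamping to the partition bounds.
        b, n = self.bounds, self.n
        cx = int((x - b.x_min) / (b.x_max - b.x_min) * n)
        cy = int((y - b.y_min) / (b.y_max - b.y_min) * n)
        return min(max(cx, 0), n - 1), min(max(cy, 0), n - 1)

    def add_point(self, x: float, y: float) -> None:
        # Mark the cell containing the point as occupied.
        cx, cy = self._cell(x, y)
        self.bits |= 1 << (cy * self.n + cx)

    def may_contain(self, q: Rect) -> bool:
        # False means the partition certainly has no data inside q.
        # True means it might (the cell can be occupied outside q).
        b = self.bounds
        if (q.x_max < b.x_min or q.x_min > b.x_max
                or q.y_max < b.y_min or q.y_min > b.y_max):
            return False
        cx0, cy0 = self._cell(q.x_min, q.y_min)
        cx1, cy1 = self._cell(q.x_max, q.y_max)
        for cy in range(cy0, cy1 + 1):
            for cx in range(cx0, cx1 + 1):
                if (self.bits >> (cy * self.n + cx)) & 1:
                    return True
        return False


def route(query: Rect, filters: dict) -> list:
    """Return the ids of partitions that the query should be forwarded to."""
    return sorted(pid for pid, f in filters.items() if f.may_contain(query))
```

A filter of this kind can report false positives (an occupied cell whose points lie outside the query) but not false negatives, so skipping a partition never loses results. The saving is in communication: a query that overlaps a partition's bounds but only touches empty cells is never shipped to that node.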

Country

Qatar

Related Organizations

Chinese Academy of Sciences
China (People's Republic of)
Purdue University System
United States
Purdue University Northwest
United States
George Mason University
United States
Chinese Academy of Science (中国科学院)
China (People's Republic of)

Keywords

Big Data, FOS: Computer and information sciences, parallel computing, query processing, in-memory computation, Databases (cs.DB), Information technology, T58.5-58.64, Computer Science - Databases, spatial data, query optimization

3 Research products

LocationSpark software on GitHub (IsRelatedTo)
Breeze software on GitHub (IsRelatedTo)
MAGELLAN software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 18. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.