Publication: Article, Preprint
01 Sep 2016
Embargo end date: 01 Jan 2016
English
Publisher: Association for Computing Machinery (ACM)
Journal: Proceedings of the VLDB Endowment, volume 10, pages 1-12, ISSN 2150-8097

Authors: Beng Chin Ooi; Gang Chen; H. V. Jagadish; Kian-Lee Tan; Dawei Jiang; Qingchao Cai; Anthony K. H. Tung

doi: 10.14778/3015270.3015271, 10.48550/arxiv.1601.00182

arXiv: http://arxiv.org/abs/1601.00182

Cohort query processing

Abstract

Modern Internet applications often produce a large volume of user activity records. Data analysts are interested in cohort analysis, or finding unusual user behavioral trends, in these large tables of activity records. In a traditional database system, cohort analysis queries are both painful to specify and expensive to evaluate. We propose to extend database systems to support cohort analysis. We do so by extending SQL with three new operators. We devise three different evaluation schemes for cohort query processing. Two of them adopt a non-intrusive approach. The third approach employs a columnar-based evaluation scheme with optimizations specifically designed for cohort query processing. Our experimental results confirm the performance benefits of our proposed columnar database system, compared against the two non-intrusive approaches that implement cohort queries on top of regular relational databases.
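To make the notion of cohort analysis concrete, the following is a minimal Python sketch of the general idea only. The abstract does not name the three proposed SQL operators or describe the evaluation schemes, so nothing here should be read as the authors' method. The record layout, the sample data, the choice of "launch" as the action that defines a user's start date, and the function name `cohort_totals` are all assumptions made for illustration. Users are grouped into weekly cohorts by the week of their first "launch", and spending is summed by cohort and by age (whole weeks since that first launch).

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical activity records: (user, day, action, amount spent).
records = [
    ("alice", date(2016, 1, 4), "launch", 0),
    ("alice", date(2016, 1, 12), "shop", 30),
    ("bob", date(2016, 1, 5), "launch", 0),
    ("bob", date(2016, 1, 6), "shop", 10),
    ("carol", date(2016, 1, 11), "launch", 0),
    ("carol", date(2016, 1, 19), "shop", 20),
]


def cohort_totals(records, birth_action="launch"):
    """Sum the amount spent per (cohort, age in weeks).

    A cohort is labeled by the Monday of the week in which a user
    first performed `birth_action` (an assumed grouping rule).
    """
    # First day on which each user performed the birth action.
    birth = {}
    for user, day, action, _ in sorted(records, key=lambda r: r[1]):
        if action == birth_action and user not in birth:
            birth[user] = day

    totals = defaultdict(int)
    for user, day, _, amount in records:
        # Skip users with no birth action and activity before it.
        if user not in birth or day < birth[user]:
            continue
        start = birth[user]
        cohort = start - timedelta(days=start.weekday())  # Monday of birth week
        age = (day - start).days // 7  # whole weeks since birth
        totals[(cohort, age)] += amount
    return dict(totals)


for (cohort, age), total in sorted(cohort_totals(records).items()):
    print(cohort, age, total)
```

Even this small example needs a per-user first-occurrence lookup followed by a second pass that joins every record back to it, which hints at why such queries are awkward to express and costly to evaluate in plain SQL.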

Related Organizations

University of Michigan–Flint, United States
Zhejiang Ocean University, China (People's Republic of)
National University of Singapore, Singapore
Zhejiang University

Keywords

FOS: Computer and information sciences, Computer Science - Databases, Databases (cs.DB)

Impact by BIP!

citations: 18. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering