
Modern Internet applications often produce a large volume of user activity records. Data analysts are interested in cohort analysis, or finding unusual user behavioral trends, in these large tables of activity records. In a traditional database system, cohort analysis queries are both painful to specify and expensive to evaluate. We propose to extend database systems to support cohort analysis. We do so by extending SQL with three new operators. We devise three different evaluation schemes for cohort query processing. Two of them adopt a non-intrusive approach. The third approach employs a columnar based evaluation scheme with optimizations specifically designed for cohort query processing. Our experimental results confirm the performance benefits of our proposed columnar database system, compared against the two non-intrusive approaches that implement cohort queries on top of regular relational databases.
FOS: Computer and information sciences, Computer Science - Databases, Databases (cs.DB)
FOS: Computer and information sciences, Computer Science - Databases, Databases (cs.DB)
| citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 18 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
