
We discuss the novel problem of supporting analytical business intelligence (BI) queries over web-based textual content, e.g., BI-style reports based on hundreds of thousands of documents from an ad-hoc web search result. Neither conventional search engines nor conventional business intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. This application is an exciting challenge that should appeal to and benefit from several research communities, most notably the database, text analytics, and distributed systems communities. For example, to provide fast answers for such queries, cloud computing techniques need to be combined with text analytics, data cleansing, query processing, and query refinement methods. However, the envisioned path for OLAP-style query processing over textual web data may take a long time to mature. Two recent developments have the potential to become key components of such an ad-hoc analysis platform: significant improvements in cloud computing query languages and advances in self-supervised information extraction techniques. In this talk, I will give an informative and practical look at the underlying research challenges in supporting "Web-Scale Business Analytics" applications, with a focus on their key components, and will highlight recent projects.
