
doi: 10.1007/11687238_50
Database Management Systems (DBMS) select a query plan by mathematically modeling the execution cost of candidate plans and choosing the cheapest query execution plan (QEP) according to that cost model. The cost model requires accurate estimates of the sizes of the intermediate results of all steps in the QEP. Outdated or incomplete statistics, parameter markers, and complex skewed data frequently cause the selection of a suboptimal query plan, which in turn results in poor query performance. Federated queries are regular relational queries that access data on one or more remote relational or non-relational data sources, possibly combining them with tables stored in the federated DBMS server. Their execution is typically divided between the federated server and the remote data sources. Outdated and incomplete statistics have a bigger impact on a federated DBMS than on a regular DBMS, because maintaining federated statistics is considerably more complicated and expensive than maintaining local statistics; consequently, poor performance caused by the selection of a suboptimal query plan is common for federated queries. We present an extension of the mid-query reoptimization technique "Progressive Query Optimization" (POP), which adds robustness to query processing by dynamically detecting whether an access plan is suboptimal and triggering a reoptimization in that case. Our extensions enable efficient reoptimization of federated queries. Our contributions are (a) an opportunistic but risk-controlled reoptimization technique for federated DBMS, (b) a technique for multiple reoptimizations during federated query processing, with a strategy to discover and eliminate redundant partial results, and (c) a mechanism to eagerly procure statistics in a federated environment. We have implemented these techniques in a prototype version of WebSphere Information Integrator for DB2.
Our enhancements enable robust and acceptable performance for federated queries, even if the remote data sources provide almost no statistical information about the data. An extensive case study on real-world data shows that POP has negligible runtime overhead and improves the performance of complex federated queries by up to a full order of magnitude.
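The core mechanism described above, cost-based plan choice followed by a mid-query check that triggers reoptimization when a cardinality estimate proves wrong, can be illustrated with a minimal Python sketch. It assumes a POP-style checkpoint that compares the observed cardinality of a materialized intermediate result against the validity range of the chosen plan, i.e. the cardinality interval in which that plan remains the cheapest. The plan names (`bind_join`, `ship_and_hash_join`), the linear cost formulas and all constants are hypothetical; they are not the cost model of WebSphere Information Integrator, and the sketch omits the risk control, multiple reoptimizations and statistics procurement that the paper contributes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical candidate plan whose cost grows linearly with the intermediate-result size."""
    name: str
    fixed_cost: float    # assumed startup cost, e.g. shipping a whole remote table
    per_row_cost: float  # assumed cost per row of the intermediate result

    def cost(self, rows: float) -> float:
        return self.fixed_cost + self.per_row_cost * rows


def choose_plan(rows: float, candidates: list[Plan]) -> Plan:
    """Cost-based plan selection: pick the cheapest plan for the given cardinality."""
    return min(candidates, key=lambda p: p.cost(rows))


def validity_range(plan: Plan, candidates: list[Plan]) -> tuple[float, float]:
    """Cardinality interval [lo, hi] in which `plan` costs no more than every alternative."""
    lo, hi = 0.0, math.inf
    for other in candidates:
        if other is plan:
            continue
        slope = plan.per_row_cost - other.per_row_cost
        if slope == 0:
            if plan.fixed_cost > other.fixed_cost:
                return (math.inf, -math.inf)  # never cheaper: empty range
            continue
        crossover = (other.fixed_cost - plan.fixed_cost) / slope
        if slope > 0:   # plan's cost grows faster: valid only up to the crossover
            hi = min(hi, crossover)
        else:           # plan's cost grows slower: valid only from the crossover on
            lo = max(lo, crossover)
    return (lo, hi)


def run_with_checkpoint(estimated_rows: float, actual_rows: float,
                        candidates: list[Plan]) -> list[str]:
    """Optimize with the estimate; at the checkpoint, reoptimize with the observed
    cardinality if it falls outside the chosen plan's validity range.
    Returns the names of the plans chosen, in order."""
    plan = choose_plan(estimated_rows, candidates)
    lo, hi = validity_range(plan, candidates)
    trace = [plan.name]
    # Checkpoint: the intermediate result is materialized here, so its exact
    # cardinality is known (and its rows could be reused by a new plan).
    if not lo <= actual_rows <= hi:
        plan = choose_plan(actual_rows, candidates)
        trace.append(plan.name)
    return trace


candidates = [
    Plan("bind_join", fixed_cost=0.0, per_row_cost=10.0),
    Plan("ship_and_hash_join", fixed_cost=5000.0, per_row_cost=1.0),
]
print(run_with_checkpoint(100, 200, candidates))     # ['bind_join']
print(run_with_checkpoint(100, 50_000, candidates))  # ['bind_join', 'ship_and_hash_join']
```

With these made-up costs the two plans cross at 5000/9 (about 556) rows, so an underestimate that stays below that point is harmless, while one that does not triggers a switch. Placing the check where the intermediate result is materialized means a reoptimized plan could start from the stored rows instead of recomputing them; deciding which such partial results remain useful across several reoptimizations is the problem the paper's contribution (b) addresses.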
