On Aggregation Bias in Sponsored Search Data: Existence and Implications

Publication: Article, Conference object | 01 Jan 2011 | English | Publisher: Elsevier BV | Journal: SSRN Electronic Journal (eissn: 1556-5068)

Authors: Vibhanshu Abhishek; Kartik Hosanagar; Peter S. Fader

doi: 10.2139/ssrn.1490169, 10.1145/2229012.2229014

Abstract

There has been significant recent interest in studying consumer behavior in sponsored search advertising (SSA). Researchers have typically used daily data from search engines containing measures such as average bid, average ad position, total impressions, clicks and cost for each keyword in the advertiser's campaign. A variety of random utility models have been estimated using such data, and the results have helped researchers explore the factors that drive consumer click and conversion propensities. However, virtually every analysis of this kind has ignored the intra-day variation in ad position. We show that estimating random utility models on aggregated (daily) data without accounting for this variation will lead to systematically biased estimates: specifically, the impact of ad position on click-through rate (CTR) is attenuated and the predicted CTR is higher than the actual CTR. First, we prove that the average daily position of an ad is less in convex order than the actual position of the ad for an impression. Using this result, we analytically demonstrate the existence of the aggregation bias. Second, using a large disaggregate dataset from a major search engine containing 8 million impressions, we empirically validate our findings for both the traditional logit model and the hierarchical Bayesian models that are commonly used in the SSA literature. Third, we build a game-theoretic model to analyze the effect of the bias on the equilibrium of the SSA auction. We find that advertisers bid lower in SSA auctions as a result of the bias, which always leads to lower search-engine revenue. We also find that an advertiser can always increase his payoff when he unilaterally switches from aggregate data to complete data. Finally, we empirically quantify the losses experienced by the search engine and the advertisers and find that the search engine loses over 17% of its revenue on average. We also observe that an advertiser loses around 6% of his payoffs due to data aggregation. Our findings raise serious concerns for SSA practitioners and also question the adequacy of the data standards that have become common in SSA. Finally, we provide recommendations for aggregate datasets that do not suffer from the bias.
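
The attenuation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or data: it assumes a plain logit click model, a hypothetical Poisson process for the intra-day position of each impression, and made-up parameter values (`a_true`, `b_true`, the number of days and impressions). It fits the same logit once on impression-level data and once on daily aggregates (average position, total clicks, total impressions) and compares the estimated position coefficients.

```python
import numpy as np


def fit_logit(x, clicks, trials, n_iter=25):
    """Binomial logit MLE via Newton-Raphson. Returns [intercept, slope]."""
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (clicks - trials * p)            # score vector
        w = trials * p * (1.0 - p)                    # binomial variance weights
        hess = (X * w[:, None]).T @ X                 # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def simulate_and_fit(seed=0, n_days=200, n_imp=2000, a_true=-1.0, b_true=-0.5):
    """Simulate impressions, then fit on disaggregate and on daily data.

    All settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Assumed design: each day has its own position intensity, and each
    # impression's position is 1 + Poisson(intensity), so position varies
    # within the day and the variation grows with the daily average.
    lam = rng.uniform(0.5, 4.0, size=n_days)
    pos = 1 + rng.poisson(lam[:, None], size=(n_days, n_imp))
    p_click = 1.0 / (1.0 + np.exp(-(a_true + b_true * pos)))
    click = (rng.random((n_days, n_imp)) < p_click).astype(float)

    # Impression-level fit: one Bernoulli trial per impression.
    beta_dis = fit_logit(pos.ravel(), click.ravel(), np.ones(pos.size))
    # Daily fit: average position, total clicks, total impressions.
    beta_agg = fit_logit(pos.mean(axis=1), click.sum(axis=1),
                         np.full(n_days, float(n_imp)))
    return beta_dis[1], beta_agg[1]


b_dis, b_agg = simulate_and_fit()
print(f"position coefficient, impression-level fit: {b_dis:.3f}")
print(f"position coefficient, daily-aggregate fit:  {b_agg:.3f}")
```

Under these assumptions the impression-level fit recovers a slope close to `b_true`, while the daily-aggregate fit yields a slope smaller in magnitude, which matches the direction of the bias the abstract reports. The size of the gap depends entirely on the assumed position process and says nothing about the magnitudes found in the paper.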

Related Organizations

Carnegie Mellon University
United States
University of Pennsylvania
United States

Impact by BIP!

	selected citations: 7 (citations derived from selected sources; an alternative to the "influence" indicator)
	popularity: Average (the "current" impact or attention, the "hype", of the article in the research community at large, based on the underlying citation network)
	influence: Average (the overall impact of the article in the research community at large, based on the underlying citation network, diachronically)
	impulse: Top 10% (the initial momentum of the article directly after its publication, based on the underlying citation network)