Powered by OpenAIRE graph
SSRN Electronic Journal
Article . 2011 . Peer-reviewed
Data sources: Crossref
DBLP
Conference object . 2018
Data sources: DBLP

On Aggregation Bias in Sponsored Search Data: Existence and Implications

Authors: Vibhanshu Abhishek; Kartik Hosanagar; Peter S. Fader


Abstract

There has been significant recent interest in studying consumer behavior in sponsored search advertising (SSA). Researchers have typically used daily data from search engines containing measures such as average bid, average ad position, total impressions, clicks, and cost for each keyword in the advertiser's campaign. A variety of random utility models have been estimated using such data, and the results have helped researchers explore the factors that drive consumer click and conversion propensities. However, virtually every analysis of this kind has ignored the intra-day variation in ad position. We show that estimating random utility models on aggregated (daily) data without accounting for this variation leads to systematically biased estimates: specifically, the impact of ad position on click-through rate (CTR) is attenuated and the predicted CTR is higher than the actual CTR. First, we prove that the average daily position of an ad is less in convex order than the actual position of the ad for an impression. Using this result, we analytically demonstrate the existence of the aggregation bias. Second, using a large disaggregate dataset from a major search engine containing 8 million impressions, we empirically validate our findings for both the traditional logit model and the hierarchical Bayesian models that are commonly used in the SSA literature. Third, we build a game-theoretic model to analyze the effect of the bias on the equilibrium of the SSA auction. We find that advertisers bid lower in SSA auctions as a result of the bias, which always leads to lower search-engine revenue. We also find that an advertiser can always increase his payoff when he unilaterally switches from aggregate data to complete data. We then empirically quantify the losses experienced by the search engine and the advertisers: the search engine loses over 17% of its revenue on average, and an advertiser loses around 6% of his payoffs due to data aggregation. Our findings raise serious concerns for SSA practitioners and also question the adequacy of the data standards that have become common in SSA. Finally, we provide recommendations for aggregate datasets that do not suffer from the bias.
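
The convex-order step in the abstract can be illustrated with a small numerical sketch. It is not the authors' model or data: the logit coefficients and the list of positions below are made-up values, chosen so that the CTR curve is convex in position (CTR below 0.5 everywhere). Under that assumption, Jensen's inequality implies that evaluating the curve at the daily average position understates the impression-level average CTR, which is the kind of mismatch a model fitted on daily averages has to absorb.

```python
import math


def click_prob(position, intercept=-1.0, slope=-0.5):
    """Logit CTR curve. The intercept and slope are hypothetical values."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * position)))


# Hypothetical ad positions for the impressions served during one day.
positions = [1, 1, 2, 4, 7]

# What a daily (aggregated) report would expose: the average position.
avg_position = sum(positions) / len(positions)

# CTR evaluated at the average position (what aggregate data suggests).
ctr_at_avg_position = click_prob(avg_position)

# CTR averaged over the actual impression-level positions.
true_expected_ctr = sum(click_prob(p) for p in positions) / len(positions)

# Positive whenever the curve is convex over the observed positions
# and the positions are not all identical (Jensen's inequality).
jensen_gap = true_expected_ctr - ctr_at_avg_position

print(f"average position:        {avg_position:.2f}")
print(f"CTR at average position: {ctr_at_avg_position:.4f}")
print(f"impression-level CTR:    {true_expected_ctr:.4f}")
print(f"gap:                     {jensen_gap:.4f}")
```

The gap vanishes when every impression is shown in the same position, which is why the intra-day variation in position matters. The sketch says nothing about the size of the bias in real data; the 17% and 6% figures come from the authors' own empirical analysis.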

Impact indicators (provided by BIP!, based on the underlying citation network)
  • Selected citations: 7 (derived from selected sources; an alternative to the "Influence" indicator)
  • Popularity (the "current" impact or attention of the article in the research community at large): Average
  • Influence (the overall impact of the article in the research community at large, diachronically): Average
  • Impulse (the initial momentum of the article directly after its publication): Top 10%