Name: Error Bounds for the Network Scale-Up Method
Keywords: Social and Information Networks (cs.SI), Discrete Mathematics (cs.DM), Distributed, Parallel, and Cluster Computing (cs.DC), FOS: Computer and information sciences

Publication: Article, Preprint; 03 Aug 2025; Embargo end date: 01 Jan 2024; Publisher: ACM; Journal: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2

Authors: Sergio Díaz-Aranda; Juan Marcos Ramirez; Mohit Daga; Jaya Prakash Champati; Jose Aguilar; Rosa Lillo; Antonio Fernández Anta;

doi: 10.1145/3711896.3736940, 10.48550/arxiv.2407.10640

arXiv: 2407.10640

Abstract

Epidemiologists and social scientists have used the Network Scale-Up Method (NSUM) for over thirty years to estimate the size of a hidden sub-population within a social network. The method queries a subset of network nodes about how many of their neighbors belong to the hidden sub-population. In general, NSUM assumes that the social network topology and the hidden sub-population distribution are well-behaved, so that the NSUM estimate is close to the actual value. However, bounds on NSUM estimation errors have not been analytically proven. This paper provides analytical bounds on the error incurred by the two most popular NSUM estimators. These bounds assume that the queried nodes accurately report their degree and the number of their neighbors belonging to the hidden sub-population. Our key findings are twofold. First, we show that when an adversary designs the network and places the hidden sub-population, the estimate can be a factor of $\Omega(\sqrt{n})$ off from the real value (in a network with $n$ nodes). Second, we prove error bounds when the underlying network is randomly generated, showing that a small constant factor can be achieved with high probability using samples of logarithmic size $O(\log{n})$. We present improved analytical bounds for Erdős-Rényi and scale-free networks. Our theoretical analysis is supported by an extensive set of numerical experiments designed to determine the effect of the sample size on the accuracy of the estimates in both synthetic and real networks.

Full version of the KDD 2025 paper
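
The querying procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the two estimators it analyses; this sketch assumes they are the ratio of sums and the mean of ratios commonly used in the NSUM literature. The function names `erdos_renyi` and `nsum_estimates`, the graph size, and the sample size are hypothetical choices made for illustration, and every sampled node is assumed to report its degree and hidden-neighbor count accurately, as the abstract states.

```python
import random


def erdos_renyi(n, p, rng):
    """Return adjacency sets of a G(n, p) random graph (illustrative helper)."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def nsum_estimates(adj, hidden, sample):
    """Estimate the hidden fraction from a sample of queried nodes.

    Returns (ratio_of_sums, mean_of_ratios). Both estimators are assumed
    here; the abstract only says "the two most popular NSUM estimators".
    """
    # Each queried node reports its degree and its number of hidden neighbors.
    degrees = [len(adj[v]) for v in sample]
    hidden_nbrs = [len(adj[v] & hidden) for v in sample]

    # Ratio of sums: total hidden neighbors over total degree.
    total_deg = sum(degrees)
    ros = sum(hidden_nbrs) / total_deg if total_deg else 0.0

    # Mean of ratios: average per-node fraction, skipping isolated nodes.
    ratios = [h / d for h, d in zip(hidden_nbrs, degrees) if d > 0]
    mor = sum(ratios) / len(ratios) if ratios else 0.0
    return ros, mor


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    n = 400                         # hypothetical network size
    adj = erdos_renyi(n, 0.05, rng)
    hidden = set(rng.sample(range(n), 40))   # hidden sub-population
    sample = rng.sample(range(n), 30)        # queried nodes
    ros, mor = nsum_estimates(adj, hidden, sample)
    print("true fraction:", len(hidden) / n)
    print("ratio of sums:", ros, "-> size estimate:", ros * n)
    print("mean of ratios:", mor, "-> size estimate:", mor * n)
```

Multiplying either fraction by $n$ gives the estimated size of the hidden sub-population. On a star graph whose centre is the only hidden node, querying leaves alone makes both estimators return 1, which gives a small-scale sense of how an unfavourable topology can distort the estimate.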

Impact by BIP!

- Selected citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average