Divide-and-Conquer Information-Based Optimal Subdata Selection Algorithm

Name: Divide-and-Conquer Information-Based Optimal Subdata Selection Algorithm
Creator: HaiYing Wang
Keywords: D-optimality, FOS: Computer and information sciences, Linear regression; mixed models, Computational problems in statistics, Mathematics - Statistics Theory, information-based optimal subdata selection (IBOSS), Statistics Theory (math.ST), Statistical aspects of information-theoretic topics, Statistics - Computation, 01 natural sciences


Journal of Statistical Theory and Practice

Article

Data sources: UnpayWall

arXiv.org e-Print Archive

Preprint . 2019

Data sources: arXiv.org e-Print Archive

Journal of Statistical Theory and Practice

Article . 2019 . Peer-reviewed

License: Springer TDM

Data sources: Crossref

zbMATH Open

Article . 2019

Data sources: zbMATH Open

https://dx.doi.org/10.48550/ar...

Article . 2019

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

https://dx.doi.org/10.1007/s42...

Other literature type

Data sources: Microsoft Academic Graph

https://dx.doi.org/10.1007/s42...

Article

Data sources: Microsoft Academic Graph


Publication type: Article, Preprint, Other literature type
Date: 01 Jul 2019
Embargo end date: 01 Jan 2019
Language: English
Publisher: Springer Science and Business Media LLC
Journal: Journal of Statistical Theory and Practice, volume 13 (issn: 1559-8608, eissn: 1559-8616)
Funded by: NSF | Collaborative Research: I...

Authors: HaiYing Wang

doi: 10.1007/s42519-019-0048-5 , 10.48550/arxiv.1905.09948

arXiv: 1905.09948


Abstract

Information-based optimal subdata selection (IBOSS) is a computationally efficient method for selecting informative data points from large data sets by processing the full data column by column. However, when a data set is too large to be processed in the available memory of a machine, it is infeasible to implement the IBOSS procedure. This paper develops a divide-and-conquer IBOSS approach to solve this problem: the full data set is divided into smaller partitions that can be loaded into memory, and a subset of data is then selected from each partition using the IBOSS algorithm. We derive both finite-sample and asymptotic properties of the resulting estimator. The asymptotic results show that if the full data set is partitioned randomly and the number of partitions is not very large, then the resulting estimator has the same estimation efficiency as the original IBOSS estimator. We also carry out numerical experiments to evaluate the empirical performance of the proposed method.

21 pages, 3 figures, 1 table
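
The partition-then-select idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not spell out the selection rule, so the sketch assumes the IBOSS rule as it is commonly described for linear regression: for each covariate in turn, keep the r rows with the smallest and the r rows with the largest values among the rows not yet selected. The function names `iboss_select` and `dc_iboss_fit`, the number of partitions, the per-partition subdata size, and the synthetic data are all hypothetical. Combining partitions by summing the cross-product matrices and solving one least-squares system is likewise an assumption made for illustration.

```python
import numpy as np


def iboss_select(X, k):
    """Return indices of about k rows chosen by an IBOSS-style rule (assumed).

    For each column, take the r smallest and r largest values among the rows
    that have not been selected yet, with r = k // (2 * p).
    """
    n, p = X.shape
    r = max(1, k // (2 * p))
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        if idx.size <= 2 * r:
            # Too few rows left to split into two tails: take them all.
            chosen.append(idx)
            break
        col = X[idx, j]
        # Partial sort: the first r positions hold the r smallest values,
        # the last r positions hold the r largest values.
        order = np.argpartition(col, [r - 1, idx.size - r])
        picked = np.concatenate([idx[order[:r]], idx[order[-r:]]])
        chosen.append(picked)
        available[picked] = False
    return np.concatenate(chosen)


def dc_iboss_fit(blocks, k_per_block):
    """Fit a linear model with intercept from subdata selected block by block.

    blocks: iterable of (X_b, y_b) pairs, each small enough to hold in memory.
    Only the running sums Z'Z and Z'y are kept between blocks.
    """
    ztz = None
    zty = None
    for X_b, y_b in blocks:
        sel = iboss_select(X_b, k_per_block)
        Z = np.column_stack([np.ones(sel.size), X_b[sel]])
        if ztz is None:
            ztz = np.zeros((Z.shape[1], Z.shape[1]))
            zty = np.zeros(Z.shape[1])
        ztz += Z.T @ Z
        zty += Z.T @ y_b[sel]
    return np.linalg.solve(ztz, zty)


# Synthetic illustration (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, p = 20_000, 3
beta_true = np.array([1.0, 2.0, -1.0, 0.5])  # intercept first
X = rng.standard_normal((n, p))
y = beta_true[0] + X @ beta_true[1:] + rng.standard_normal(n)

# Random partition into 10 blocks; in a real out-of-memory setting each
# block would be read from disk instead of sliced from an in-memory array.
parts = np.array_split(rng.permutation(n), 10)
blocks = [(X[i], y[i]) for i in parts]

beta_hat = dc_iboss_fit(blocks, k_per_block=60)
print(beta_hat)
```

Here each of the 10 blocks contributes 60 rows, so the final estimate uses 600 of the 20,000 rows. The random permutation before splitting mirrors the abstract's condition that the full data set be partitioned randomly.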

Related Organizations

University of Connecticut
United States

Keywords

D-optimality, FOS: Computer and information sciences, Linear regression; mixed models, Computational problems in statistics, Mathematics - Statistics Theory, information-based optimal subdata selection (IBOSS), Statistics Theory (math.ST), Statistical aspects of information-theoretic topics, Statistics - Computation, Methodology (stat.ME), information matrix, big data, linear regression, FOS: Mathematics, subdata, Statistics - Methodology, Computation (stat.CO)

Related research (10 research products)

- INPPS Flagship with iBOSS Building Blocks (2019)
- Information-Based Optimal Subdata Selection for Big Data Linear Regression (2018)
- Feasibility Study for the Use of Compliant Structures in Insert Elements to Allow for an Isostatic Mounting of Components (2017)
- On Data Reduction of Big Data (2018)
- On Data Reduction of Big Data (2018)
- iFishIENCi Biology online and integration in feeding monitoring systems (2020)
- Gateway to securing the cloud (2019)
- Information-Based Optimal Subdata Selection for Big Data Linear Regression (2018)
- Information-based Optimal Subdata Selection for Clusterwise Linear Regression Model (2022)
- Entwurf und experimentelle Untersuchunngen einer Treibstofftransferschnittstelle für Raumfahrzeuge und ihre Anwendungsmöglichkeiten für das On-Orbit-Servicing (2022)


Impact by BIP!

- Selected citations: 20. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Fields of Science


Funded by

NSF | Collaborative Research: Information-Based Subdata Selection Inspired by Optimal Design of Experiments