Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Journal of Biomedica...arrow_drop_down
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
Journal of Biomedical Informatics
Article . 2022 . Peer-reviewed
License: Elsevier Non-Commercial
Data sources: Crossref
versions View all 2 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Distributed Quasi-Poisson regression algorithm for modeling multi-site count outcomes in distributed data networks

Authors: Mackenzie J, Edmondson; Chongliang, Luo; Md, Nazmul Islam; Natalie E, Sheils; John, Buresh; Zhaoyi, Chen; Jiang, Bian; +1 Authors

Distributed Quasi-Poisson regression algorithm for modeling multi-site count outcomes in distributed data networks

Abstract

Observational studies incorporating real-world data from multiple institutions facilitate study of rare outcomes or exposures and improve generalizability of results. Due to privacy concerns surrounding patient-level data sharing across institutions, methods for performing regression analyses distributively are desirable. Meta-analysis of institution-specific estimates is commonly used, but has been shown to produce biased estimates in certain settings. While distributed regression methods are increasingly available, methods for analyzing count outcomes are currently limited. Count data in practice are commonly subject to overdispersion, exhibiting greater variability than expected under a given statistical model.We propose a novel computational method, a one-shot distributed algorithm for quasi-Poisson regression (ODAP), to distributively model count outcomes while accounting for overdispersion.ODAP incorporates a surrogate likelihood approach to perform distributed quasi-Poisson regression without requiring patient-level data sharing, only requiring sharing of aggregate data from each participating institution. ODAP requires at most three rounds of non-iterative communication among institutions to generate coefficient estimates and corresponding standard errors. In simulations, we evaluate ODAP under several data scenarios possible in multi-site analyses, comparing ODAP and meta-analysis estimates in terms of error relative to pooled regression estimates, considered the gold standard. In a proof-of-concept real-world data analysis, we similarly compare ODAP and meta-analysis in terms of relative error to pooled estimatation using data from the OneFlorida Clinical Research Consortium, modeling length of stay in COVID-19 patients as a function of various patient characteristics. In a second proof-of-concept analysis, using the same outcome and covariates, we incorporate data from the UnitedHealth Group Clinical Discovery Database together with the OneFlorida data in a distributed analysis to compare estimates produced by ODAP and meta-analysis.In simulations, ODAP exhibited negligible error relative to pooled regression estimates across all settings explored. Meta-analysis estimates, while largely unbiased, were increasingly variable as heterogeneity in the outcome increased across institutions. When baseline expected count was 0.2, relative error for meta-analysis was above 5% in 25% of iterations (250/1000), while the largest relative error for ODAP in any iteration was 3.59%. In our proof-of-concept analysis using only OneFlorida data, ODAP estimates were closer to pooled regression estimates than those produced by meta-analysis for all 15 covariates. In our distributed analysis incorporating data from both OneFlorida and the UnitedHealth Group Clinical Discovery Database, ODAP and meta-analysis estimates were largely similar, while some differences in estimates (as large as 13.8%) could be indicative of bias in meta-analytic estimates.ODAP performs privacy-preserving, communication-efficient distributed quasi-Poisson regression to analyze count outcomes using data stored within multiple institutions. Our method produces estimates nearly matching pooled regression estimates and sometimes more accurate than meta-analysis estimates, most notably in settings with relatively low counts and high outcome heterogeneity across institutions.

Related Organizations
Keywords

Likelihood Functions, Models, Statistical, COVID-19, Humans, Regression Analysis, Algorithms

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    19
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Top 10%
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
19
Top 10%
Top 10%
Top 10%
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!