Powered by OpenAIRE graph
Columbia University
https://dx.doi.org/10.7916/d8-...
Other literature type, 2019
Data sources: Datacite

Quantile regression for zero-inflated outcomes

Author: Ling, Wodan

Abstract

Zero-inflated outcomes are common in biomedical studies, where the excess zeros indicate some special but undetectable events. Quantile regression is potentially advantageous for analyzing zero-inflated outcomes for two reasons. First, compared with parametric models such as the zero-inflated Poisson and two-part models, quantile regression gives robust and accurate estimation by avoiding likelihood specification, and it can capture tail events and heterogeneity across the outcome distribution. Second, while mean-based regression may be misinterpreted for a zero-inflated outcome, the interpretation of quantiles is naturally compatible with the underlying process that such an outcome is intended to measure. Unfortunately, unadjusted linear quantile regression is not directly applicable, also for two reasons. First, the feasibility of estimation and the validity of inference in quantile regression require the conditional distribution of the outcome to be absolutely continuous, which zero-inflation violates. Second, direct quantile regression implicitly assumes a constant probability of observing a positive outcome, whereas in most cases the degree of zero-inflation varies with the covariates; consequently, the conditional quantile function of the outcome depends on the covariates nonlinearly. To analyze zero-inflated outcomes while retaining the merits of quantile regression, we propose a novel quantile regression framework that addresses all of the issues above.

In the first part of this dissertation, we propose a two-part model that comprises a logistic regression for the probability of a positive outcome and a linear quantile regression for the positive part, with subject-specific adjustment for zero-inflation. Inference on the estimated conditional quantiles and covariate effects is not trivial under such a two-part model.
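As a minimal sketch of why the two-part model yields a conditional quantile that is nonlinear in the covariates, write p(x) = P(Y > 0 | x) and let Q+(u | x) be the u-th quantile of Y given Y > 0 and x. Because the conditional distribution of Y mixes a point mass of size 1 - p(x) at zero with the positive part, the tau-th conditional quantile is Q_Y(tau | x) = 0 when tau <= 1 - p(x), and Q+((tau - 1 + p(x)) / p(x) | x) otherwise. The Python below evaluates this formula. The logistic coefficients (GAMMA0, GAMMA1) and the positive-part coefficient functions are hypothetical values chosen for illustration; this is not the estimation algorithm from the dissertation.

```python
import math
from typing import Callable


def expit(z: float) -> float:
    """Logistic function, used here for P(Y > 0 | x)."""
    return 1.0 / (1.0 + math.exp(-z))


def zi_conditional_quantile(
    tau: float, p_pos: float, positive_quantile: Callable[[float], float]
) -> float:
    """tau-th conditional quantile of a zero-inflated outcome Y.

    p_pos: P(Y > 0 | x), assumed to lie in (0, 1].
    positive_quantile(u): u-th quantile of Y given Y > 0 and x.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    if not 0.0 < p_pos <= 1.0:
        raise ValueError("p_pos must lie in (0, 1]")
    p_zero = 1.0 - p_pos
    if tau <= p_zero:
        return 0.0  # the quantile falls on the point mass at zero
    # Subject-specific adjusted quantile level for the positive part
    u = (tau - p_zero) / p_pos
    return positive_quantile(u)


# Hypothetical logistic coefficients: logit P(Y > 0 | x) = -0.5 + 1.0 * x
GAMMA0, GAMMA1 = -0.5, 1.0


def positive_part_quantile(u: float, x: float) -> float:
    """Hypothetical linear quantile model for Y | Y > 0:
    Q+(u | x) = b0(u) + b1(u) * x, with b0(u) = -log(1 - u), b1(u) = 0.5 * b0(u)."""
    b0 = -math.log1p(-u)
    return b0 + 0.5 * b0 * x


def two_part_quantile(tau: float, x: float) -> float:
    """Combine the (hypothetical) logistic and positive-part models at covariate x."""
    p_pos = expit(GAMMA0 + GAMMA1 * x)
    return zi_conditional_quantile(tau, p_pos, lambda u: positive_part_quantile(u, x))


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0, 2.0):
        print(x, round(two_part_quantile(0.75, x), 3))
```

Although the hypothetical positive-part quantile is linear in x at every fixed level, the printed 0.75-quantile is zero at x = -1 and then grows by unequal increments, because the adjusted level (tau - 1 + p(x)) / p(x) itself changes with x. The level tau = 1 - p(x) is where the conditional quantile switches from zero to positive.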
Building on this model, we develop an algorithm that consistently estimates the conditional quantiles while circumventing the unbounded variance at the quantile level where the conditional quantile changes from zero to positive. Furthermore, we develop an inference tool to determine the quantile treatment effect associated with a covariate at a given quantile level. We evaluate the proposed method and compare it with existing approaches through simulation studies and a real data analysis of the risk factors for carotid atherosclerosis.

In the second part, based on the proposed two-part model, we develop ZIQRank, a zero-inflated quantile rank-score based test for detecting differences in distributions. The test extends the local inference of the first part to simultaneous inference, and it can handle zero-inflation and heterogeneity at the same time. It comprises a valid test from the logistic regression for the zero-inflation and rank-score based tests at multiple quantiles for the positive part, with zero-inflation adjusted. The resulting p-values are combined using a procedure selected according to the extent of zero-inflation and heterogeneity in the data. Simulation studies show that, compared with existing tests, the proposed test has higher power to detect differential distributions. Finally, we apply the ZIQRank test to a human scRNA-seq dataset to study differentially expressed genes between neoplastic and regular cells. It discovers a group of crucial genes associated with glioma that the other methods fail to detect.

In the third part, we extend the proposed two-part quantile regression model and the ZIQRank test to longitudinal data. Each part of the two-part model is modified into a marginal longitudinal model (GEE), conditioning on the outcome at the previous time point and its zero/positive status.
We apply the model and the test to study the effect of a recommender system designed to boost user engagement with a suite of smartphone apps for patients with depression. The proposed framework outperforms existing methods in model fitting, prediction, and detection of critical features.
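ZIQRank combines a p-value from the logistic (zero-inflation) part with rank-score p-values at several quantiles of the positive part, and the abstract states only that the combination procedure is selected according to the data's zero-inflation and heterogeneity. As a hedged sketch, the Python below implements two standard combination rules that such a procedure could choose between: Fisher's method, which assumes independent p-values, and the Cauchy combination test, which is designed to be robust to dependence among the tests, particularly at small significance levels. Which rules the dissertation actually uses, and how it selects among them, is not reproduced here; the p-values in the usage lines are made up.

```python
import math
from typing import Optional, Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: -2 * sum(log p) ~ chi-square with 2k df under H0,
    assuming the k p-values are independent."""
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one p-value in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > t) = exp(-t/2) * sum_{j<k} (t/2)^j / j!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return min(1.0, math.exp(-half) * total)


def cauchy_combine(
    pvalues: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Cauchy combination: T = sum_i w_i * tan((0.5 - p_i) * pi), with weights
    assumed non-negative and summing to 1; combined p = 0.5 - arctan(T) / pi."""
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("need at least one p-value in (0, 1)")
    if weights is None:
        weights = [1.0 / k] * k  # equal weights summing to 1
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Survival function of the standard Cauchy distribution evaluated at stat
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values: logistic part first, then rank-score tests at three quantile levels.
    pvals = [0.04, 0.20, 0.01, 0.35]
    print("Fisher:", round(fisher_combine(pvals), 4))
    print("Cauchy:", round(cauchy_combine(pvals), 4))
```

Fisher's method gains power when several tests carry moderate signal, while the Cauchy rule is driven mainly by the smallest p-values and tolerates correlated tests; a data-dependent choice between rules of this kind is one way to reflect how much of the signal sits in the zero part versus the positive part.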

Country
United States
Keywords

Mathematical models, Biometry, 330, Quantile regression, Distribution (Probability theory), 310
