
doi: 10.18130/v3h70806k
With the development of science and modern technology, more and more data are being collected continuously over a time interval in various disciplines, such as public health, biology, medicine and finance. Such data can be viewed as "functional data". Functional data analysis (FDA), which deals with the analysis and theory of functional data, has gained increasing popularity over the past decades. In this dissertation, we propose several functional data analysis methods and apply them to an NIH cohort study in the field of growth modeling. It is well known that early-year catch-down growth is highly prevalent in developing countries because of malnutrition (Black et al. [2008]). Children who suffer from malnutrition in the first 5 years of life are at increased risk of impaired cognitive and physical development. Therefore, characterizing catch-down growth and identifying the associated risk factors is a topic of considerable interest. In our study, we aim to investigate the relationship between the height-for-age Z score (HAZ) at year 3 and a collection of predictors. However, we face two problems. First, all functional predictors are sparsely and irregularly observed; that is, the measurement times vary from individual to individual. The functional predictors must therefore be estimated over the entire time interval before the regression can be performed. In addition, some predictors, such as height, should be monotone over time, and a non-monotone estimate of height would make no sense. Second, the relationship between the response and the functional predictors is usually not linear. Furthermore, there are outliers in the response. To address the first problem, we propose a new method based on a monotone transformation, functional principal component (FPC) analysis and a penalized regression to estimate monotone functions for sparse growth data. We also prove the asymptotic properties of the proposed estimator.
Extensive numerical studies show that our proposed method outperforms the existing methods in terms of model fitting and monotonicity of the estimate. In addition, the proposed method can be used as a data preprocessing procedure for other methods, such as functional clustering and classification, where the functional predictors are required to be completely known. To address the second problem, we build a functional single index model for the non-linear relationship between the response and the functional predictors. The functional single index model is not only flexible but also interpretable. To deal with outliers, we propose an estimation method based on local modal regression (LMR) (Yao et al. [2012]). We show that, with the optimal bandwidth, the LMR estimator is not only robust when there are outliers or the error distribution is heavy-tailed, but also asymptotically as efficient as the ordinary least squares estimator when the error distribution is Gaussian. In addition, we conduct extensive simulation studies to demonstrate the robustness and efficiency of the resulting estimator by comparing it with the least squares estimator and the Huber estimator across different error distributions.
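The robustness of modal regression to outliers can be illustrated with a much simpler setting than the one studied in the dissertation. The sketch below is not the dissertation's LMR estimator for the functional single index model; it assumes an ordinary linear model with a Gaussian kernel and fits it with an EM-type iteration (kernel weights on the residuals, followed by weighted least squares). The function name `modal_linear_regression` and the fixed `bandwidth` argument are hypothetical choices made for illustration only.

```python
import numpy as np


def modal_linear_regression(X, y, bandwidth, n_iter=200, tol=1e-8):
    """Illustrative linear modal regression (assumed simplification).

    Seeks beta that maximizes sum_i phi_h(y_i - x_i' beta), where phi_h is a
    Gaussian kernel with bandwidth h, using an EM-type iteration.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Start from the ordinary least squares fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]

    for _ in range(n_iter):
        resid = y - X @ beta

        # E-step: Gaussian kernel weights; large residuals get weight near 0.
        w = np.exp(-0.5 * (resid / bandwidth) ** 2)
        w /= w.sum()

        # M-step: weighted least squares with the current weights.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)

        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break

    return beta
```

Because observations with large residuals receive weights close to zero, a small fraction of gross outliers has little influence on the fit, whereas the least squares fit is pulled toward them. In this sketch the bandwidth is fixed by hand; choosing it optimally, as discussed in the abstract, is outside the scope of the illustration.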
local modal regression, monotone function estimation, penalized regression
