R package imputeTestbench to compare imputations methods for univariate time series

Preprint English OPEN
Bokde, Neeraj ; Kulat, Kishore ; Beck, Marcus W ; Asencio-Cortés, Gualberto (2016)
  • Subject: Statistics - Methodology | Computer Science - Mathematical Software

This paper describes the R package imputeTestbench, which provides a testbench for comparing imputation methods for missing data in univariate time series. The package simulates a chosen amount and type of missing data in a complete dataset and compares the values filled in by different imputation methods. Missing data can be simulated by removing observations completely at random or in blocks of different sizes. Several default imputation methods are included with the package, including historical means, linear interpolation, and last observation carried forward. The testbench is not limited to these defaults: users can add or remove methods using a simple two-step process. For each method, the testbench compares the imputed values with the actual values that were removed, using error metrics that include RMSE, MAE, and MAPE. Alternative error metrics can also be supplied by the user. The main advantages of the package are its ease of use and the substantial reduction in the time needed to compare imputation methods for missing data in univariate time series. This paper provides an overview of the core functions and demonstrates them with examples.
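
The workflow described in the abstract (remove known values, impute them, score the imputed values against the removed ones) can be illustrated with a small, package-independent sketch. It is written in Python using only the standard library and is not the imputeTestbench API: every function name, parameter, and default below (`remove_mcar`, `remove_block`, `compare`, `fraction`, `repetitions`, `seed`) is a hypothetical choice made for this illustration. It also assumes the first and last observations are never removed, so that every gap can be filled.

```python
import math
import random


def remove_mcar(series, fraction, rng):
    """Blank out a fraction of interior points completely at random."""
    n_missing = int(round(fraction * len(series)))
    # Assumption: endpoints stay observed so every gap has two neighbours.
    idx = sorted(rng.sample(range(1, len(series) - 1), n_missing))
    holed = [None if i in idx else v for i, v in enumerate(series)]
    return holed, idx


def remove_block(series, fraction, rng):
    """Blank out one contiguous block covering a fraction of the series."""
    n_missing = int(round(fraction * len(series)))
    start = rng.randint(1, len(series) - 1 - n_missing)
    idx = list(range(start, start + n_missing))
    holed = [None if start <= i < start + n_missing else v
             for i, v in enumerate(series)]
    return holed, idx


def impute_mean(holed):
    """Replace each gap with the mean of the observed values."""
    observed = [v for v in holed if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in holed]


def impute_locf(holed):
    """Last observation carried forward."""
    filled, last = [], None
    for v in holed:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def impute_linear(holed):
    """Linear interpolation between the nearest observed neighbours."""
    filled = list(holed)
    known = [i for i, v in enumerate(holed) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = holed[left] + w * (holed[right] - holed[left])
    return filled


def rmse(actual, imputed):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))


def mae(actual, imputed):
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)


def mape(actual, imputed):
    # Only defined when no actual value is zero.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, imputed)) / len(actual)


def compare(series, fraction, methods, metric, remover, repetitions=20, seed=1):
    """Average error of each method over repeated missing-data simulations."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in methods}
    for _ in range(repetitions):
        holed, idx = remover(series, fraction, rng)
        actual = [series[i] for i in idx]
        for name, method in methods.items():
            filled = method(holed)
            totals[name] += metric(actual, [filled[i] for i in idx])
    return {name: total / repetitions for name, total in totals.items()}


METHODS = {"mean": impute_mean, "locf": impute_locf, "linear": impute_linear}

if __name__ == "__main__":
    # Synthetic, strictly positive series so that MAPE is defined.
    data = [10.0 + 5.0 * math.sin(2 * math.pi * t / 25) for t in range(200)]
    for remover in (remove_mcar, remove_block):
        print(remover.__name__, compare(data, 0.2, METHODS, rmse, remover))
```

In this sketch, adding a method amounts to adding one entry to the `METHODS` dictionary, and any function that takes the actual and imputed values can be passed as `metric`. This loosely mirrors the extensibility the abstract describes, but it is not how the package itself is implemented.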