Comparison of missing value imputation methods in time series: the case of Turkish meteorological data


YOZGATLIGİL C., Aslan S., İYİGÜN C., BATMAZ İ.

THEORETICAL AND APPLIED CLIMATOLOGY, vol.112, pp.143-167, 2013 (Journal Indexed in SCI)

  • Volume: 112
  • Publication Date: 2013
  • DOI: 10.1007/s00704-012-0723-x
  • Journal Name: THEORETICAL AND APPLIED CLIMATOLOGY
  • Pages: pp.143-167

Abstract

This study compares several imputation methods for completing missing values in spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria, including accuracy, robustness, precision, and efficiency, for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, the simple arithmetic average, the normal ratio (NR), and NR weighted with correlations are the simple ones, whereas the multilayer perceptron type neural network and the multiple imputation strategy implemented via Markov chain Monte Carlo based on expectation-maximization (EM-MCMC) are the computationally intensive ones. In addition, we propose a modification of the EM-MCMC method. Besides a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis, which takes spatio-temporal dependencies into account, for evaluating imputation performance. Based on detailed graphical and quantitative analysis, the computationally intensive methods, particularly EM-MCMC, are computationally inefficient but seem favorable for imputing meteorological time series across different missingness periods, under both measures and for both series studied. To conclude, using the EM-MCMC algorithm to impute missing values before conducting any statistical analysis of meteorological data will decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for evaluating the performance of missing data imputation, particularly with computational methods, since it gives more precise results in meteorological time series.
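
For readers unfamiliar with the simple methods named in the abstract, the following is a minimal Python sketch of their common textbook forms, together with a squared-error accuracy measure. It is an illustration under stated assumptions, not the paper's implementation: the function names and toy numbers are hypothetical, "normals" are taken to be long-term station means, and the correlation weighting shown (weights proportional to squared correlation) is one simple variant that may differ from the weighting used in the study.

```python
import numpy as np

def simple_average(neighbors):
    """Impute the target value as the plain mean of neighboring stations."""
    return float(np.mean(neighbors))

def normal_ratio(neighbors, neighbor_normals, target_normal):
    """Normal ratio (NR): rescale each neighbor by the ratio of the target
    station's long-term mean to that neighbor's long-term mean, then average."""
    neighbors = np.asarray(neighbors, dtype=float)
    ratios = target_normal / np.asarray(neighbor_normals, dtype=float)
    return float(np.mean(ratios * neighbors))

def normal_ratio_corr_weighted(neighbors, neighbor_normals, target_normal,
                               correlations):
    """NR with correlation weights. ASSUMPTION: weights are r**2, an
    illustrative choice; the study's exact weighting may differ."""
    neighbors = np.asarray(neighbors, dtype=float)
    scaled = target_normal / np.asarray(neighbor_normals, dtype=float) * neighbors
    weights = np.asarray(correlations, dtype=float) ** 2
    return float(np.sum(weights * scaled) / np.sum(weights))

def rmse(true_values, imputed_values):
    """Root mean squared error between held-out true values and imputations."""
    diff = np.asarray(true_values, dtype=float) - np.asarray(imputed_values, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example (made-up numbers): three neighboring stations, one missing value.
obs = [80.0, 100.0, 120.0]          # neighbor observations for the missing month
normals = [800.0, 1000.0, 1200.0]   # neighbor long-term means
target_normal = 900.0               # target station long-term mean
corr = [0.9, 0.8, 0.6]              # correlation of each neighbor with the target

saa = simple_average(obs)
nr = normal_ratio(obs, normals, target_normal)
nrwc = normal_ratio_corr_weighted(obs, normals, target_normal, corr)
```

In an evaluation like the one the abstract describes, values would be removed artificially, imputed with each method, and compared with the held-out truth using a measure such as `rmse`.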