The Performance Comparison of Multiple Linear Regression, Random Forest and Artificial Neural Network by using Photovoltaic and Atmospheric Data

Kayri M., Kayri I., Gencoglu M. T.

14th International Conference on Engineering of Modern Electric Systems (EMES), Oradea, Romania, 1-2 June 2017, pp.1-4

  • Publication Type: Conference Paper / Full Text
  • Volume:
  • Doi Number: 10.1109/emes.2017.7980368
  • City: Oradea
  • Country: Romania
  • Page Numbers: pp.1-4
  • Keywords: artificial neural network, data mining, linear regression, photovoltaic module, random forest
  • Van Yüzüncü Yıl University Affiliated: Yes


This study compares the estimation performance of Multiple Linear Regression, Random Forest, and Artificial Neural Network models. The comparison uses power production data from a photovoltaic module. The model comprises seven variables: one dependent variable (power) and six independent variables (global radiation, temperature, wind speed, wind direction, relative humidity, and solar elevation angle). The Mean Absolute Error and the correlation coefficient are used to compare the estimation performance of these data mining techniques. The correlation coefficient is 0.963 for the Multiple Linear Regression model and 0.986 for the Random Forest method, and the highest value is obtained with the Artificial Neural Network architecture (R = 0.997). All three methods identify global radiation as the most important predictor. The least important predictor is wind direction in both the Artificial Neural Network and Random Forest models, whereas it is the solar elevation angle in the Multiple Linear Regression model.
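
The two comparison metrics named in the abstract can be illustrated with a minimal sketch. It assumes the correlation coefficient is the Pearson correlation between measured and estimated power, which the abstract does not state explicitly. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and estimated values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between measured and estimated values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)


# Hypothetical measured power values and two sets of model estimates.
# These numbers are invented for illustration, not data from the study.
measured = [10.0, 20.0, 30.0, 40.0]
estimates_a = [12.0, 18.0, 33.0, 39.0]
estimates_b = [10.5, 19.5, 30.5, 39.5]

for name, est in (("model A", estimates_a), ("model B", estimates_b)):
    mae = mean_absolute_error(measured, est)
    r = pearson_r(measured, est)
    print(f"{name}: MAE = {mae:.3f}, R = {r:.3f}")
```

In this sketch, a lower Mean Absolute Error and a correlation coefficient closer to 1 both indicate estimates that track the measured values more closely, which is the sense in which the abstract ranks the three techniques.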