Analyzing and Forecasting of Coronavirus Time-Series Data: Performance Comparison of Machine Learning and Statistical Models

Document Type : Research Paper

Authors

1 Assistant Professor, Department of Industrial Engineering, Faculty of Engineering, Ardakan University, Ardakan, Iran.

2 Ph.D. Candidate, Industrial Engineering Department, Faculty of Engineering, Yazd University, Yazd, Iran.

3 Associate Professor, Industrial Engineering Department, Faculty of Engineering, Yazd University, Yazd, Iran.

Abstract

Coronavirus disease (COVID-19) is a respiratory disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Forecasting the number of new cases and deaths can be an efficient step towards predicting costs and providing the facilities that will be needed in a timely and sufficient manner. The goal of this study is to model and accurately predict future new cases and deaths. Nine prediction models are tested on COVID-19 data from Yazd province as a case study. The models are compared using four evaluation criteria: root mean square error (RMSE), mean square error (MSE), mean absolute percentage error (MAPE), and mean absolute error (MAE). The results indicate that, according to these criteria, the KNN regression model is the best model for predicting cumulative COVID-19 hospitalizations, and the BATS model is the best for predicting cumulative deaths. Moreover, for both hospitalizations and deaths, the autoregressive neural network model performs worst among the models tested.
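
As an illustration of the four evaluation criteria named above, the following minimal Python sketch computes MSE, RMSE, MAE, and MAPE from a series of observed values and the corresponding forecasts. The function name `forecast_errors` and the sample numbers are hypothetical and chosen only for illustration; they are not the study's code or data, and the definitions used are the standard textbook ones, which the paper is assumed to follow.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length sequences.

    Hypothetical helper for illustration; uses the standard definitions.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE divides by the observed value, so it is undefined at zero.
        raise ValueError("MAPE is undefined when an observed value is zero")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Made-up numbers, not data from the study.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
scores = forecast_errors(actual, predicted)
print(scores)
```

Because RMSE, MSE, and MAE depend on the scale of the series while MAPE is relative, rankings of models can differ between criteria; reporting all four, as the study does, makes such disagreements visible.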

