- Tytuł:
- The prediction of new Covid-19 cases in Poland with machine learning models
- Autorzy:
- Chwila, Adam
- Powiązania:
- https://bibliotekanauki.pl/articles/14764465.pdf
- Data publikacji:
- 2023-03-15
- Wydawca:
- Główny Urząd Statystyczny
- Tematy:
-
machine learning
time series
COVID-19
forecasting
economic activity - Opis:
- The COVID-19 pandemic has had a huge impact both on the global economy and on everyday life in all countries all over the world. In this paper, we propose several possible machine learning approaches to forecasting new confirmed COVID-19 cases, including the LASSO regression, Gradient Boosted (GB) regression trees, Support Vector Regression (SVR), and Long-Short Term Memory (LSTM) neural network. The above methods are applied in two variants: to the data prepared for the whole Poland and to the data prepared separately for each of the 16 voivodeships (NUTS 2 regions). The learning of all the models has been performed in two variants: with the 5-fold time-series cross-validation as well as with the split into the single train and test subsets. The computations in the study used official statistics from government reports from the period of April 2020 to March 2022. We propose a setup of 16 scenarios of the model selection to detect the model characterized by the best ex-post prediction accuracy. The scenarios differ from each other by the following features: the machine learning model, the method for the hyperparameters selection and the data setup. The most accurate scenario for the LASSO and SVR machine learning approaches is the single train/test dataset split with data for the whole Poland, while in case of the LSTM and GB trees it is the cross validation with data for whole Poland. Among the best scenarios for each model, the most accurate ex-post RMSE is obtained for the SVR. For the model performing best in terms of the ex-post RMSE, the interpretation of the outcome is conducted with the Shapley values. The Shapley values make it possible to present the impact of auxiliary variables in the machine learning model on the actual predicted value. The knowledge regarding factors that have the strongest impact on the number of new infections can help companies to plan their economic activity during turbulent times of pandemics. We propose to identify and compare the most important variables that affect both the train and test datasets of the model.
- Źródło:
-
Statistics in Transition new series; 2023, 24, 2; 59-83
1234-7655 - Pojawia się w:
- Statistics in Transition new series
- Dostawca treści:
- Biblioteka Nauki