- Title:
- The number of clusters in hybrid predictive models: does it really matter?
- Authors:
-
Łapczyński, Mariusz
Jefmański, Bartłomiej
- Links:
- https://bibliotekanauki.pl/articles/1046637.pdf
- Publication date:
- 2020
- Publisher:
- Główny Urząd Statystyczny
- Topics:
-
hybrid predictive model
k-means algorithm
decision trees
- Description:
- Researchers have long attempted to combine various analytical tools to build predictive models. It is possible to combine tools of the same type (ensemble models, committees) or tools of different types (hybrid models). Hybrid models are used in areas such as customer relationship management (CRM), web usage mining, medical sciences, petroleum geology, and anomaly detection in computer networks. Our hybrid model was created as a sequential combination of cluster analysis and decision trees. In the first step, objects were grouped into clusters using the k-means algorithm. In the second step, a decision tree model was built with an additional independent variable indicating the cluster to which each object belonged. The analysis was based on 14 data sets collected from publicly accessible repositories. Model performance was assessed using measures derived from the confusion matrix, including accuracy, precision, recall, F-measure, and the lift in the first and second deciles. We looked for a relationship between the number of clusters and the quality of hybrid predictive models. To our knowledge, similar studies have not yet been conducted. Our research demonstrates that in some cases building hybrid models can improve the performance of predictive models. The models with the highest performance measures required a relatively large number of clusters (from 9 to 15).
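The performance measures named in the description can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `confusion_measures` and `decile_lift`, the 0.5 classification threshold, and the toy labels and scores are all hypothetical and invented for illustration. The sketch also assumes a binary target and treats lift as non-cumulative (the positive rate within one decile divided by the overall positive rate); the record does not say which variant the study used.

```python
from typing import Dict, Sequence


def confusion_measures(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def decile_lift(y_true: Sequence[int], scores: Sequence[float], decile: int = 1) -> float:
    """Non-cumulative lift: positive rate in the given decile / overall positive rate.

    Objects are ranked by predicted score, highest first; decile 1 is the top 10%.
    """
    ranked = [t for _, t in sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    segment = ranked[(decile - 1) * n // 10 : decile * n // 10]
    base_rate = sum(ranked) / n
    return (sum(segment) / len(segment)) / base_rate


# Toy data (hypothetical): 20 objects, 6 of them positive.
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [(20 - i) / 20 for i in range(20)]      # already in descending order
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

measures = confusion_measures(y_true, y_pred)
lift_first = decile_lift(y_true, scores, decile=1)
lift_second = decile_lift(y_true, scores, decile=2)
print(measures, lift_first, lift_second)
```

To compare models built with different numbers of clusters, one would compute these measures once per model on the same test set. A cumulative lift (top 20% rather than the second decile alone) would need the slice in `decile_lift` to start at zero.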
- Source:
-
Przegląd Statystyczny; 2019, 66, 3; 228-238
0033-2372
- Appears in:
- Przegląd Statystyczny
- Content provider:
- Biblioteka Nauki