- Title:
- Semi-supervised approach to handle sudden concept drift in Enron data
- Authors:
-
Kmieciak, M. R.
Stefanowski, J.
- Links:
- https://bibliotekanauki.pl/articles/206052.pdf
- Publication date:
- 2011
- Publisher:
- Polska Akademia Nauk. Instytut Badań Systemowych PAN
- Subjects:
-
concept drift
incremental learning of classifiers
email foldering
Enron data
- Description:
- This paper studies the detection of concept changes in incremental learning from data streams, together with classifier adaptation. It is often assumed that all processed learning examples are labeled, i.e. that the class label is available for each example. As this assumption may be difficult to satisfy in practice, particularly for data streams, we introduce an approach that detects concept drift in unlabeled data and retrains the classifier using a limited number of additionally labeled examples. The usefulness of this partly supervised approach is evaluated in an experimental study with the Enron data. This real-life data set concerns the classification of users' emails into multiple folders. First, we show that the Enron data are characterized by frequent sudden changes of concepts. We also demonstrate that our approach can precisely detect these changes. A subsequent comparative study shows that our approach achieves classification accuracy comparable to that of two fully supervised methods: periodic retraining of the classifier based on windowing, and the trigger approach with the supervised DDM drift detector. However, our approach reduces the number of examples that must be labeled. Furthermore, it requires fewer classifier retraining updates than windowing.
- Source:
-
Control and Cybernetics; 2011, 40, 3; 667-695
0324-8569
- Appears in:
- Control and Cybernetics
- Content provider:
- Biblioteka Nauki
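
The description above names DDM as one of the supervised baselines. As a rough illustration only, the following is a minimal sketch of the DDM idea as it is commonly described (Gama et al., 2004): track the classifier's running error rate p and its standard deviation s, remember the lowest p + s seen so far, and signal a warning or a drift when the current p + s rises two or three minimum standard deviations above the minimum error rate. This is not the paper's semi-supervised method. The class name, the parameter names, the helper `first_drift`, and the use of a strict inequality (to avoid a trivial trigger when no error has been seen yet) are assumptions made for this example.

```python
import math


class DDM:
    """Sketch of a DDM-style detector fed with per-example error flags."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        # All three parameters are illustrative defaults (assumptions).
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        # Forget all statistics, e.g. after the classifier is retrained.
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Consume one flag (True = misclassified); return a status string."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "stable"                      # too few examples to judge
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s        # new best operating point
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def first_drift(error_flags):
    """Return the index of the first 'drift' signal, or None if there is none."""
    detector = DDM()
    for i, flag in enumerate(error_flags):
        if detector.update(flag) == "drift":
            return i
    return None
```

Because a detector of this kind consumes a true error flag for every example, it needs every example to be labeled, which is the labeling cost the described approach aims to reduce.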