Semi-supervised approach to handle sudden concept drift in Enron data


Title: Semi-supervised approach to handle sudden concept drift in Enron data
Authors: Kmieciak, M. R.; Stefanowski, J.
Links: https://bibliotekanauki.pl/articles/206052.pdf
Publication date: 2011
Publisher: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects: concept drift; incremental learning of classifiers; email foldering; Enron data
Source: Control and Cybernetics; 2011, 40, 3; 667-695; ISSN 0324-8569
Language: English
Rights: All rights reserved. User freedoms are limited to the statutory scope of permitted use
Content provider: Biblioteka Nauki
Type: Article


This paper studies the detection of concept changes in incremental learning from data streams and the adaptation of classifiers to those changes. It is often assumed that all processed learning examples are labeled, i.e. that the class label is available for each example. Because this assumption may be difficult to satisfy in practice, particularly for data streams, we introduce an approach that detects concept drift in unlabeled data and retrains the classifier using a limited number of additionally labeled examples. The usefulness of this partly supervised approach is evaluated in an experimental study with the Enron data, a real-life data set concerning the classification of a user's emails into multiple folders. First, we show that the Enron data are characterized by frequent sudden changes of concepts. We also demonstrate that our approach can precisely detect these changes. A subsequent comparative study shows that our approach achieves classification accuracy comparable to that of two fully supervised methods: periodic retraining of the classifier based on windowing, and the trigger approach with the supervised DDM drift detector. However, our approach reduces the number of examples that must be labeled. Furthermore, it requires fewer classifier retraining updates than windowing.
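
For readers unfamiliar with the supervised DDM baseline named in the abstract, the following is a minimal sketch of the Drift Detection Method as it is commonly described: it tracks the running error rate p and its standard deviation s = sqrt(p(1 - p)/n), remembers the point where p + s was lowest, and signals a warning or a drift when p + s exceeds that minimum by 2 or 3 minimum standard deviations. This is not the authors' unlabeled-data detector, whose details the abstract does not give. The class name `DDM`, the helper `first_drift`, and the parameter values (30 warm-up examples, levels 2 and 3) are illustrative assumptions, not taken from the paper.

```python
import math


class DDM:
    """Sketch of a DDM-style supervised drift detector (illustrative, not the paper's code)."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        # Assumed defaults: commonly cited values, not confirmed by the abstract.
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0               # examples seen since the last reset
        self.p = 1.0             # running error rate
        self.s = 0.0             # standard deviation of the error rate
        self.p_min = math.inf    # error rate at the best point so far
        self.s_min = math.inf    # standard deviation at the best point so far

    def update(self, error):
        """Feed 1 for a misclassified example, 0 for a correct one.

        Returns "stable", "warning", or "drift".
        """
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the point where p + s was lowest.
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        level = self.p + self.s
        if level > self.p_min + self.drift_level * self.s_min:
            self.reset()         # a new concept starts; statistics begin afresh
            return "drift"
        if level > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def first_drift(errors):
    """Return the 1-based position of the first drift signal, or None."""
    detector = DDM()
    for position, error in enumerate(errors, start=1):
        if detector.update(error) == "drift":
            return position
    return None
```

In a trigger approach of this kind, a "drift" signal would prompt retraining of the classifier on recent examples. Note that this detector needs the true label of every example to compute `error`, which is the labeling cost the approach in the abstract aims to reduce.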
