Text classification using word sequences


Title: Text classification using word sequences
Authors: Chudzian, P.
Links: https://bibliotekanauki.pl/articles/92904.pdf
Publication date: 2008
Publisher: Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Topics: text classification; text representation; generalized suffix tree
Source: Studia Informatica : systems and information technology; 2008, 1(10); 75-85
ISSN: 1731-2264
Language: English
Rights: All rights reserved. User freedom is limited to the statutory scope of permitted use
Content provider: Biblioteka Nauki
Type: Article


The article discusses the use of word sequences in text classification. Unlike n-grams, word sequences do not have a fixed length, which gives the classifier the flexibility it needs to operate on documents collected from various sources. The presented classifier is built on a suffix tree structure, which allows word sequences to take part in the classification process. During classification, both single words and longer sequences are taken into account, and each influences the category assignment according to its frequency and length. The Suffix Tree Classifier is compared with the well-known Naive Bayes Classifier, and the properties of both are discussed. The results show that incorporating word sequences into text classification can increase accuracy, and they reveal some interesting relations between the maximal length of the sequences used and the classifier's error rate.
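To make the idea of variable-length word sequences concrete, here is a minimal Python sketch. It is not the article's Suffix Tree Classifier: a plain dictionary of counts stands in for the generalized suffix tree, and the scoring rule (relative frequency multiplied by sequence length) is only an assumed way of letting frequency and length both affect the category assignment. The names `word_sequences`, `SequenceClassifier`, and `max_len` are hypothetical.

```python
from collections import Counter, defaultdict


def word_sequences(words, max_len):
    """Yield every contiguous word sequence of length 1..max_len as a tuple."""
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            yield tuple(words[start:end])


class SequenceClassifier:
    """Toy classifier; a dict of counts stands in for a generalized suffix tree."""

    def __init__(self, max_len=3):
        self.max_len = max_len
        # category -> Counter mapping each word sequence to its frequency
        self.counts = defaultdict(Counter)

    def train(self, text, category):
        words = text.lower().split()
        self.counts[category].update(word_sequences(words, self.max_len))

    def score(self, text, category):
        words = text.lower().split()
        seqs = self.counts[category]
        total = sum(seqs.values()) or 1
        # Assumed weighting: relative frequency times sequence length,
        # so longer matching sequences contribute more than single words.
        return sum(
            len(seq) * seqs[seq] / total
            for seq in word_sequences(words, self.max_len)
            if seq in seqs
        )

    def classify(self, text):
        # Pick the category with the highest score.
        return max(self.counts, key=lambda category: self.score(text, category))


clf = SequenceClassifier(max_len=3)
clf.train("the stock market fell sharply", "finance")
clf.train("the football team won the match", "sport")
print(clf.classify("stock market news"))
print(clf.classify("the team won"))
```

Changing `max_len` in this sketch is a simple way to experiment with how the maximal sequence length affects the assignments, which is the relation the abstract refers to.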
