Subject: topic - OPAC collection catalog

Title:: Building semantic user profile for polish web news portal
Authors:: Misztal-Radecka, J.
Links:: https://bibliotekanauki.pl/articles/305619.pdf
Publication date:: 2018
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: user profiling
word embeddings
topic modeling
natural language processing
gender prediction
Description:: The aim of this research is to construct meaningful user profiles that best describe user interests in the context of the media content that they browse. We use two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We train our models on a collection of news articles in Polish and compare them with a model built on a general language corpus. We compare the performance of these algorithms on two practical tasks. First, we perform a qualitative analysis of the semantic relationships for similar article retrieval, and then we evaluate the predictive performance of distinct feature combinations for user gender classification. We apply the algorithms to a real-world dataset from the Polish news service Onet. Our results show that the choice of text representation depends on the task: Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the gender classification task, the best performance is obtained with a combination of features: topics from the article text and word embeddings from the title.
Source:: Computer Science; 2018, 19 (3); 307–332
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Impact of n-stage latent Dirichlet allocation on analysis of headline classification
Authors:: Guven, Zekeriya Anil
Diri, Banu
Cakaloglu, Tolgahan
Links:: https://bibliotekanauki.pl/articles/27312901.pdf
Publication date:: 2022
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: topic modeling
headline classification
machine learning
text classification
latent Dirichlet allocation
data analysis
Description:: Data analysis becomes difficult as the amount of data increases. More specifically, extracting meaningful insights from a vast amount of data and grouping it by shared features without human intervention requires advanced methodologies. Topic-modeling methods help overcome this problem in text analyses for downstream tasks (such as sentiment analysis, spam detection, and news classification). In this research, we benchmark several classifiers (namely, random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on publicly available Turkish and English datasets with three and five classes. We have demonstrated that, as a feature extractor, n-stage LDA obtains state-of-the-art performance for any downstream classifier. It should also be noted that random forest was the most successful algorithm for both datasets.
Source:: Computer Science; 2022, 23 (3); 375–394
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article

Information

Searching for the phrase "topic" by criterion: Subject
