- Tytuł:
- Building semantic user profile for polish web news portal
- Autorzy:
- Misztal-Radecka, J.
- Powiązania:
- https://bibliotekanauki.pl/articles/305619.pdf
- Data publikacji:
- 2018
- Wydawca:
- Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
- Tematy:
-
user profiling
word embeddings
topic modeling
natural language processing
gender prediction - Opis:
- The aim of this research is to construct meaningful user profiles that are the most descriptive of user interests in the context of the media content that they browse. We use two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We train our models on the collection of news articles in Polish and compare them with a model built on a general language corpus. We compare the performance of these algorithms on two practical tasks. First, we perform a qualitative analysis of the semantic relationships for similar article retrieval, and then we evaluate the predictive performance of distinct feature combinations for user gender classification. We apply the algorithms to the real-world dataset of Polish news service Onet. Our results show that the choice of text representation depends on the task –Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the gender classification task, the best performance is obtained with a combination of features: topics from the article text and word embeddings from the title.
- Źródło:
-
Computer Science; 2018, 19 (3); 307--332
1508-2806
2300-7036 - Pojawia się w:
- Computer Science
- Dostawca treści:
- Biblioteka Nauki