Subject: Text data - OPAC collections catalog


Title: Cluo: web-scale text mining system for open source intelligence purposes
Authors: Maciołek, P.
Dobrowolski, G.
Links: https://bibliotekanauki.pl/articles/305361.pdf
Publication date: 2013
Publisher: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects: text mining
big data
OSINT
natural language processing
monitoring
Description: The amount of textual information published on the Internet is estimated at billions of web pages, blog posts, comments, social media updates, and other items. Analyzing such quantities of data requires a high level of distribution of both data and computation. This is especially true in the case of the complex algorithms often used in text mining tasks. The paper presents a prototype implementation of CLUO, an Open Source Intelligence (OSINT) system that extracts and analyzes significant quantities of openly available information.
Source: Computer Science; 2013, 14 (1); 45-62
1508-2806
2300-7036
Appears in: Computer Science
Content provider: Biblioteka Nauki

Article



Title: Analysis of data pre-processing methods for sentiment analysis of reviews
Authors: Parlar, Tuba
Ozel, Selma
Song, Fei
Links: https://bibliotekanauki.pl/articles/305513.pdf
Publication date: 2019
Publisher: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects: data pre-processing
feature selection
sentiment analysis
text classification
Description: The goals of this study are to analyze the effects of data pre-processing methods on sentiment analysis and to determine which of these methods (and their combinations) are effective for English as well as for an agglutinative language such as Turkish. We also try to answer the research question of whether agglutinative and non-agglutinative languages differ in terms of pre-processing methods for sentiment analysis. We find that the performance results for the English reviews are generally higher than those for the Turkish reviews because of the differences between the two languages in vocabulary, writing style, and the agglutinative property of Turkish.
Source: Computer Science; 2019, 20 (1); 123-141
1508-2806
2300-7036
Appears in: Computer Science
Content provider: Biblioteka Nauki

Article



Title: Impact of n-stage latent Dirichlet allocation on analysis of headline classification
Authors: Guven, Zekeriya Anil
Diri, Banu
Cakaloglu, Tolgahan
Links: https://bibliotekanauki.pl/articles/27312901.pdf
Publication date: 2022
Publisher: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects: topic modeling
headline classification
machine learning
text classification
latent Dirichlet allocation
data analysis
Description: Data analysis becomes difficult as the amount of data increases. More specifically, extracting meaningful insights from vast amounts of data and grouping the data by shared features without human intervention requires advanced methodologies. Topic-modeling methods help overcome this problem in text analysis for downstream tasks (such as sentiment analysis, spam detection, and news classification). In this research, we benchmark several classifiers (namely, random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on three and five classes of publicly available Turkish and English datasets. We demonstrate that, as a feature extractor, n-stage LDA obtains state-of-the-art performance for any downstream classifier. Random forest was the most successful algorithm for both datasets.
Source: Computer Science; 2022, 23 (3); 375-394
1508-2806
2300-7036
Appears in: Computer Science
Content provider: Biblioteka Nauki

Article



Title: Adaptation of domain-specific transformer models with text oversampling for sentiment analysis of social media posts on Covid-19 vaccine
Authors: Bansal, Anmol
Choudhry, Arjun
Sharma, Anubhav
Susan, Seba
Links: https://bibliotekanauki.pl/articles/27312860.pdf
Publication date: 2023
Publisher: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects: Covid-19
vaccine
transformer
Twitter
BERTweet
CT-BERT
BERT
XLNet
RoBERTa
text oversampling
LMOTE
class imbalance
small sample data set
Description: Covid-19 has spread across the world, and many different vaccines have been developed to counter its surge. To identify the correct sentiments associated with the vaccines from social media posts, we fine-tune various state-of-the-art pre-trained transformer models on tweets associated with Covid-19 vaccines. Specifically, we use the recently introduced state-of-the-art RoBERTa, XLNet, and BERT pre-trained transformer models, and the domain-specific CT-BERT and BERTweet transformer models that have been pre-trained on Covid-19 tweets. We further explore text augmentation by oversampling, using the language model-based oversampling technique (LMOTE) to improve the accuracies of these models, specifically for small-sample data sets with an imbalanced class distribution among the positive, negative, and neutral sentiment classes. Our results summarize our findings on the suitability of text oversampling for imbalanced, small-sample data sets that are used to fine-tune state-of-the-art pre-trained transformer models, as well as the utility of domain-specific transformer models for the classification task.
Source: Computer Science; 2023, 24 (2); 163-182
1508-2806
2300-7036
Appears in: Computer Science
Content provider: Biblioteka Nauki

Article


Information

Search phrase "Text data" by criterion: Subject
