Temat: klasyfikacja tekstu - Katalog OPAC zbiorów

Skocz do pozycji: 1.

Tytuł:: Wykorzystanie akceleracji sprzętowej przy implementacji metryk podobieństwa tekstów
The use of a hardware accelerator for implementation of text resemblance metrics
Autorzy:: Iwanecki, Ł.
Koryciak, S.
Dąbrowska-Boruch, A.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/157430.pdf
Data publikacji:: 2014
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: akceleracja sprzętowa
FPGA
ARM
klasyfikacja tekstu
hardware acceleration
text classification
Opis:: Artykuł opisuje badania na temat klasyfikatorów tekstów. Zadanie polegało na zaprojektowaniu akceleratora sprzętowego, który przyspieszyłby proces klasyfikacji tekstów pod względem znaczeniowym. Projekt został podzielony na dwie części. Celem części pierwszej było zaproponowanie sprzętowej implementacji algorytmu realizującego metrykę do obliczania podobieństwa dokumentów. W drugiej części zaprojektowany został cały systemem akceleratora sprzętowego. Kolejnym etapem projektowym jest integracja modelu metryki z system akceleracji.
The aim of this project is to propose a hardware accelerating system to improve the text categorization process. Text categorization is a task of categorizing electronic documents into the predefined groups, based on the content. This process is complex and requires a high performance computing system and a big number of comparisons. In this document, there is suggested a method to improve the text categorization using the FPGA technology. The main disadvantage of common processing systems is that they are single-threaded – it is possible to execute only one instruction per a single time unit. The FPGA technology improves concurrence. In this case, hundreds of big numbers may be compared in one clock cycle. The whole project is divided into two independent parts. Firstly, a hardware model of the required metrics is implemented. There are two useful metrics to compute a distance between two texts. Both of them are shown as equations (1) and (2). These formulas are similar to each other and the only difference is the denominator. This part results in two hardware models of the presented metrics. The main purpose of the second part of the project is to design a hardware accelerating system. The system is based on a Xilinx Zynq device. It consists of a Cortex-A9 ARM processor, a DMA controller and a dedicated IP Core with the accelerator. The block diagram of the system is presented in Fig.4. The DMA controller provides duplex transmission from the DDR3 memory to the accelerating unit omitting a CPU. The project is still in development. The last step is to integrate the hardware metrics model with the accelerating system.
Źródło:: Pomiary Automatyka Kontrola; 2014, R. 60, nr 7, 7; 426-428
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 2.

Tytuł:: Speech command based application enabling Internet navigation
Aplikacja umożliwiająca nawigację w Internecie za pomocą poleceń mowy
Autorzy:: Mięsikowska, M.
Powiązania:: https://bibliotekanauki.pl/articles/152861.pdf
Data publikacji:: 2007
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: rozpoznawanie mowy
klasyfikacja tekstu
przeglądanie informacji
speech recognition
text classification
information retrieval
Opis:: The paper presents an attempt to create an application enabling the user to surf much easier the resources of the Internet with the help of voice commands, as well as to classify and arrange the browsed information. The application has two basic modules which enable browsing the information on the Internet. The first navigation module processes websites, isolates navigation elements , such as links to other websites, from them and gives an identification name to the elements, which enables the user to pronounce voice commands. The website is presented to the user in a practically original form. The second module also processes websites, isolating navigation elements from them. The only difference in operation of the both modules is the mode of processing the website and its final presentation. The second module isolates from the elements vocabulary, which makes it possible to classify the information included in the website, this way acquiring and displaying, an ordered set of navigation elements. The application was implemented in Java language with the use of Oracle software. For the system of recognition and understanding of speech the Sphinx 4 tool was used.
W tej pracy przedstawiono próbę stworzenia aplikacji umożliwiającej swobodniejszą nawigację użytkownika wśród zasobów Internetu za pomocą poleceń mowy, klasyfikację oraz uporządkowanie przeglądanej informacji. Aplikacja posiada dwa zasadnicze moduły, przy pomocy których możliwe jest przeglądanie informacji w Internecie. Pierwszy moduł nawigacji, przetwarza strony internetowe, wyodrębnia z nich elementy nawigacyjne takie jak odnośniki do innych stron, oraz nadaje elementom identyfikacyjną nazwę, dzięki której użytkownik może wydawać słowne polecenia. Strona internetowa wyświetlona zostaje użytkownikowi w niemalże oryginalnej postaci. Drugi moduł również przetwarza strony internetowe, wyodrębniając z nich elementy nawigacyjne. Jedyną różnicą w działaniu obu modułów jest sposób przetwarzania strony i ostatecznej jej reprezentacji. Drugi moduł wyodrębnia z elementów słownictwo, dzięki któremu możemy sklasyfikować informację znajdującą się na stronie, uzyskując i wyświetlając w ten sposób uporządkowany zbiór elementów nawigacyjnych. Aplikacja zaimplementowana została w języku Java z wykorzystaniem oprogramowania Oracle. W przypadku systemu rozpoznawania mowy zastosowano narzędzie Sphinx-4.
Źródło:: Pomiary Automatyka Kontrola; 2007, R. 53, nr 5, 5; 87-89
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 3.

Tytuł:: Bag of words and embedding text representation methods for medical article classification
Autorzy:: Cichosz, Paweł
Powiązania:: https://bibliotekanauki.pl/articles/24403007.pdf
Data publikacji:: 2023
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: text representation
text classification
bag of words
word embedding
reprezentacja tekstu
klasyfikacja tekstu
osadzanie słów
Opis:: Text classification has become a standard component of automated systematic literature review (SLR) solutions, where articles are classified as relevant or irrelevant to a particular literature study topic. Conventional machine learning algorithms for tabular data which can learn quickly from not necessarily large and usually imbalanced data with low computational demands are well suited to this application, but they require that the text data be transformed to a vector representation. This work investigates the utility of different types of text representations for this purpose. Experiments are presented using the bag of words representation and selected representations based on word or text embeddings: word2vec, doc2vec, GloVe, fastText, Flair, and BioBERT. Four classification algorithms are used with these representations: a naive Bayes classifier, logistic regression, support vector machines, and random forest. They are applied to datasets consisting of scientific article abstracts from systematic literature review studies in the medical domain and compared with the pre-trained BioBERT model fine-tuned for classification. The obtained results confirm that the choice of text representation is essential for successful text classification. It turns out that, while the standard bag of words representation is hard to beat, fastText word embeddings make it possible to achieve roughly the same level of classification quality with the added benefit of much lower dimensionality and capability of handling out-of-vocabulary words. More refined embeddings methods based on deep neural networks, while much more demanding computationally, do not appear to offer substantial advantages for the classification task. The fine-tuned BioBERT classification model performs on par with conventional algorithms when they are coupled with their best text representation methods.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2023, 33, 4; 603--621
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 4.

Tytuł:: A contemporary multi-objective feature selection model for depression detection using a hybrid pBGSK optimization algorithm
Autorzy:: Kavi Priya, Santhosam
Pon Karthika, Kasirajan
Powiązania:: https://bibliotekanauki.pl/articles/2201021.pdf
Data publikacji:: 2023
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: depression detection
text classification
dimensionality reduction
hybrid feature selection
wykrywanie depresji
klasyfikacja tekstu
redukcja wymiarowości
wybór funkcji
Opis:: Depression is one of the primary causes of global mental illnesses and an underlying reason for suicide. The user generated text content available in social media forums offers an opportunity to build automatic and reliable depression detection models. The core objective of this work is to select an optimal set of features that may help in classifying depressive contents posted on social media. To this end, a novel multi-objective feature selection technique (EFS-pBGSK) and machine learning algorithms are employed to train the proposed model. The novel feature selection technique incorporates a binary gaining-sharing knowledge-based optimization algorithm with population reduction (pBGSK) to obtain the optimized features from the original feature space. The extensive feature selector (EFS) is used to filter out the excessive features based on their ranking. Two text depression datasets collected from Twitter and Reddit forums are used for the evaluation of the proposed feature selection model. The experimentation is carried out using naive Bayes (NB) and support vector machine (SVM) classifiers for five different feature subset sizes (10, 50, 100, 300 and 500). The experimental outcome indicates that the proposed model can achieve superior performance scores. The top results are obtained using the SVM classifier for the SDD dataset with 0.962 accuracy, 0.929 F1 score, 0.0809 log-loss and 0.0717 mean absolute error (MAE). As a result, the optimal combination of features selected by the proposed hybrid model significantly improves the performance of the depression detection system.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2023, 33, 1; 117--131
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 5.

Tytuł:: A case study in text mining of discussion forum posts: Classification with bag of words and global vectors
Autorzy:: Cichosz, P.
Powiązania:: https://bibliotekanauki.pl/articles/330299.pdf
Data publikacji:: 2018
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: text mining
discussion forum
text representation
document classification
word embedding
eksploracja tekstu
forum dyskusyjne
reprezentacja tekstu
klasyfikacja dokumentów
Opis:: Despite the rapid growth of other types of social media, Internet discussion forums remain a highly popular communication channel and a useful source of text data for analyzing user interests and sentiments. Being suited to richer, deeper, and longer discussions than microblogging services, they particularly well reflect topics of long-term, persisting involvement and areas of specialized knowledge or experience. Discovering and characterizing such topics and areas by text mining algorithms is therefore an interesting and useful research direction. This work presents a case study in which selected classification algorithms are applied to posts from a Polish discussion forum devoted to psychoactive substances received from home-grown plants, such as hashish or marijuana. The utility of two different vector text representations is examined: the simple bag of words representation and the more refined embedded global vectors one. While the former is found to work well for the multinomial naive Bayes algorithm, the latter turns out more useful for other classification algorithms: logistic regression, SVMs, and random forests. The obtained results suggest that post-classification can be applied for measuring publication intensity of particular topics and, in the case of forums related to psychoactive substances, for monitoring the risk of drug-related crime.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2018, 28, 4; 787-801
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 6.

Tytuł:: Application of explainable artificial intelligence in software bug classification
Zastosowanie wyjaśnialnej sztucznej inteligencji w klasyfikacji usterek oprogramowania
Autorzy:: Chmielowski, Łukasz
Kucharzak, Michał
Burduk, Robert
Powiązania:: https://bibliotekanauki.pl/articles/27315369.pdf
Data publikacji:: 2023
Wydawca:: Politechnika Lubelska. Wydawnictwo Politechniki Lubelskiej
Tematy:: software bug assignment
software bug triaging
explainable artificial intelligence
text analysis
vulnerability
przypisywanie usterek oprogramowania
klasyfikacja usterek oprogramowania
wyjaśnialna sztuczna inteligencja
analiza tekstu
podatności
Opis:: Fault management is an expensive process and analyzing data manually requires a lot of resources. Modern software bug tracking systems may be armed with automated bug report assignment functionality that facilitates bug classification or bug assignment to proper development group.For supporting decision systems, it would be beneficial to introduce information related to explainability. The purpose of this work is to evaluate the useof explainable artificial intelligence (XAI) in processes related to software development and bug classification based on bug reports created by either software testers or software users. The research was conducted on two different datasets. The first one is related to classification of security vs non-securitybug reports. It comes from a telecommunication company which develops software and hardware solutions for mobile operators. The second dataset contains a list of software bugs taken from an opensource project. In this dataset the task is to classify issues with one of following labels crash, memory, performance, and security. Studies on XAI-related algorithms show that there are no major differences in the results of the algorithms used when comparing them with others. Therefore, not only the users can obtain results with possible explanations or experts can verify model or its part before introducing into production, but also it does not provide degradation of accuracy. Studies showed that it could be put into practice, but it has not been done so far.
Zarządzanie usterkami jest kosztownym procesem, a ręczna analiza danych wymaga znacznych zasobów. Nowoczesne systemy zarządzania usterkami w oprogramowaniu mogą być wyposażone w funkcję automatycznego przypisywania usterek, która ułatwia klasyfikację ustereklub przypisywanie usterek do właściwej grupy programistów. Dla wsparcia systemów decyzyjnych korzystne byłoby wprowadzenie informacji związanychz wytłumaczalnością. Celem tej pracy jest ocena możliwości wykorzystania wyjaśnialnej sztucznej inteligencji (XAI) w procesach związanych z tworzeniem oprogramowania i klasyfikacją usterek na podstawie raportów o usterkach tworzonych przez testerów oprogramowania lub użytkowników oprogramowania. Badania przeprowadzono na dwóch różnych zbiorach danych. Pierwszy z nich związany jest z klasyfikacją raportów o usterkach związanych z bezpieczeństwem i niezwiązanych z bezpieczeństwem. Dane te pochodzą od firmy telekomunikacyjnej, która opracowuje rozwiązania programowe i sprzętowe dla operatorów komórkowych. Drugi zestaw danych zawiera listę usterek oprogramowania pobranych z projektu opensource.W tym zestawie danych zadanie polega na sklasyfikowaniu problemów za pomocą jednej z następujących etykiet: awaria, pamięć, wydajnośći bezpieczeństwo. Badania przeprowadzone przy użyciu algorytmów związanych z XAI pokazują, że nie ma większych różnic w wynikach algorytmów stosowanych przy porównywaniu ich z innymi. Dzięki temu nie tylko użytkownicy mogą uzyskać wyniki z ewentualnymi wyjaśnieniami lub eksperci mogą zweryfikować model lub jego część przed wprowadzeniem do produkcji, ale także nie zapewnia to degradacji dokładności. Badania wykazały, że możnato zastosować w praktyce, ale do tej pory tego nie zrobiono.
Źródło:: Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska; 2023, 13, 1; 14--17
2083-0157
2391-6761
Pojawia się w:: Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 7.

Tytuł:: Analysis of methods and means of text mining
Autorzy:: Rybchak, Z.
Basystiuk, O.
Powiązania:: https://bibliotekanauki.pl/articles/411072.pdf
Data publikacji:: 2017
Wydawca:: Polska Akademia Nauk. Oddział w Lublinie PAN
Tematy:: text mining
text analytics
data analysis
high-quality information
text categorization
text clustering
document summarization
sentiment analysis
sieć językowa
analiza tekstu
analiza danych
wysoka jakość informacji
klasyfikacja tekstowa
kategoryzacja tekstowa
grupowanie tekstu
streszczenie dokumentów tekstowych
technika sentiment analysis
Opis:: In Big Data era when data volume doubled every year analyzing of all this data become really complicated task, so in this case text mining systems, techniques and tools become main instrument of analyzing tones and tones of information, selecting that information that suit the best for your needs and just help save your time for more interesting thing. The main aims of this article are explain basic principles of this field and overview some interesting technologies that nowadays are widely used in text mining.
Źródło:: ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes; 2017, 6, 2; 73-78
2084-5715
Pojawia się w:: ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 8.

Tytuł:: New algorithm for determining the number of features for the effective sentiment-classification of text documents
Nowy algorytm ustalania liczby zmiennych potrzebnych do klasyfikacji dokumentów tekstowych ze względu na ich wydźwięk emocjonalny
Autorzy:: Idczak, Adam
Korzeniewski, Jerzy
Powiązania:: https://bibliotekanauki.pl/articles/18105028.pdf
Data publikacji:: 2023-05-31
Wydawca:: Główny Urząd Statystyczny
Tematy:: sentiment analysis
document sentiment classification
text mining
logistic regression
naive Bayes classifier
feature selection
correlation
analiza sentymentu
klasyfikacja dokumentów ze względu na wydźwięk emocjonalny
eksploracja tekstu
regresja logistyczna
naiwny klasyfikator Bayesa
dobór cech
korelacja
Opis:: Sentiment analysis of text documents is a very important part of contemporary text mining. The purpose of this article is to present a new technique of text sentiment analysis which can be used with any type of a document-sentiment-classification method. The proposed technique involves feature selection independently of a classifier, which reduces the size of the feature space. Its advantages include intuitiveness and computational noncomplexity. The most important element of the proposed technique is a novel algorithm for the determination of the number of features to be selected sufficient for the effective classification. The algorithm is based on the analysis of the correlation between single features and document labels. A statistical approach, featuring a naive Bayes classifier and logistic regression, was employed to verify the usefulness of the proposed technique. They were applied to three document sets composed of 1,169 opinions of bank clients, obtained in 2020 from a Poland-based bank. The documents were written in Polish. The research demonstrated that reducing the number of terms over 10-fold by means of the proposed algorithm in most cases improves the effectiveness of classification.
Analiza sentymentu, czyli wydźwięku emocjonalnego, dokumentów tekstowych stanowi bardzo ważną część współczesnej eksploracji tekstu (ang. text mining). Celem artykułu jest przedstawienie nowej techniki analizy sentymentu tekstu, która może znaleźć zastosowanie w dowolnej metodzie klasyfikacji dokumentów ze względu na ich wydźwięk emocjonalny. Proponowana technika polega na niezależnym od klasyfikatora doborze cech, co skutkuje zmniejszeniem rozmiaru ich przestrzeni. Zaletami tej propozycji są intuicyjność i prostota obliczeniowa. Zasadniczym elementem omawianej techniki jest nowatorski algorytm ustalania liczby terminów wystarczających do efektywnej klasyfikacji, który opiera się na analizie korelacji pomiędzy pojedynczymi cechami dokumentów a ich wydźwiękiem. W celu weryfikacji przydatności proponowanej techniki zastosowano podejście statystyczne. Wykorzystano dwie metody: naiwny klasyfikator Bayesa i regresję logistyczną. Za ich pomocą zbadano trzy zbiory dokumentów składające się z 1169 opinii klientów jednego z banków działających na terenie Polski uzyskanych w 2020 r. Dokumenty zostały napisane w języku polskim. Badanie pokazało, że kilkunastokrotne zmniejszenie liczby terminów przy zastosowaniu proponowanej techniki na ogół poprawia jakość klasyfikacji.
Źródło:: Wiadomości Statystyczne. The Polish Statistician; 2023, 68, 5; 40-57
0043-518X
Pojawia się w:: Wiadomości Statystyczne. The Polish Statistician
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Informacja

Wyszukujesz frazę "klasyfikacja tekstu" wg kryterium: Temat

Źródło danych

Dostawca treści

Kolekcja

Rok wydania

Wydawca

Temat

Autor

Typ dokumentu

Język