Temat: klasyfikacja tekstu - Katalog OPAC zbiorów

Skocz do pozycji: 1.

Tytuł:: Wykorzystanie akceleracji sprzętowej przy implementacji metryk podobieństwa tekstów
The use of a hardware accelerator for implementation of text resemblance metrics
Autorzy:: Iwanecki, Ł.
Koryciak, S.
Dąbrowska-Boruch, A.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/157430.pdf
Data publikacji:: 2014
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: akceleracja sprzętowa
FPGA
ARM
klasyfikacja tekstu
hardware acceleration
text classification
Opis:: Artykuł opisuje badania na temat klasyfikatorów tekstów. Zadanie polegało na zaprojektowaniu akceleratora sprzętowego, który przyspieszyłby proces klasyfikacji tekstów pod względem znaczeniowym. Projekt został podzielony na dwie części. Celem części pierwszej było zaproponowanie sprzętowej implementacji algorytmu realizującego metrykę do obliczania podobieństwa dokumentów. W drugiej części zaprojektowany został cały systemem akceleratora sprzętowego. Kolejnym etapem projektowym jest integracja modelu metryki z system akceleracji.
The aim of this project is to propose a hardware accelerating system to improve the text categorization process. Text categorization is a task of categorizing electronic documents into the predefined groups, based on the content. This process is complex and requires a high performance computing system and a big number of comparisons. In this document, there is suggested a method to improve the text categorization using the FPGA technology. The main disadvantage of common processing systems is that they are single-threaded – it is possible to execute only one instruction per a single time unit. The FPGA technology improves concurrence. In this case, hundreds of big numbers may be compared in one clock cycle. The whole project is divided into two independent parts. Firstly, a hardware model of the required metrics is implemented. There are two useful metrics to compute a distance between two texts. Both of them are shown as equations (1) and (2). These formulas are similar to each other and the only difference is the denominator. This part results in two hardware models of the presented metrics. The main purpose of the second part of the project is to design a hardware accelerating system. The system is based on a Xilinx Zynq device. It consists of a Cortex-A9 ARM processor, a DMA controller and a dedicated IP Core with the accelerator. The block diagram of the system is presented in Fig.4. The DMA controller provides duplex transmission from the DDR3 memory to the accelerating unit omitting a CPU. The project is still in development. The last step is to integrate the hardware metrics model with the accelerating system.
Źródło:: Pomiary Automatyka Kontrola; 2014, R. 60, nr 7, 7; 426-428
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 2.

Tytuł:: Bag of words and embedding text representation methods for medical article classification
Autorzy:: Cichosz, Paweł
Powiązania:: https://bibliotekanauki.pl/articles/24403007.pdf
Data publikacji:: 2023
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: text representation
text classification
bag of words
word embedding
reprezentacja tekstu
klasyfikacja tekstu
osadzanie słów
Opis:: Text classification has become a standard component of automated systematic literature review (SLR) solutions, where articles are classified as relevant or irrelevant to a particular literature study topic. Conventional machine learning algorithms for tabular data which can learn quickly from not necessarily large and usually imbalanced data with low computational demands are well suited to this application, but they require that the text data be transformed to a vector representation. This work investigates the utility of different types of text representations for this purpose. Experiments are presented using the bag of words representation and selected representations based on word or text embeddings: word2vec, doc2vec, GloVe, fastText, Flair, and BioBERT. Four classification algorithms are used with these representations: a naive Bayes classifier, logistic regression, support vector machines, and random forest. They are applied to datasets consisting of scientific article abstracts from systematic literature review studies in the medical domain and compared with the pre-trained BioBERT model fine-tuned for classification. The obtained results confirm that the choice of text representation is essential for successful text classification. It turns out that, while the standard bag of words representation is hard to beat, fastText word embeddings make it possible to achieve roughly the same level of classification quality with the added benefit of much lower dimensionality and capability of handling out-of-vocabulary words. More refined embeddings methods based on deep neural networks, while much more demanding computationally, do not appear to offer substantial advantages for the classification task. The fine-tuned BioBERT classification model performs on par with conventional algorithms when they are coupled with their best text representation methods.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2023, 33, 4; 603--621
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 3.

Tytuł:: Speech command based application enabling Internet navigation
Aplikacja umożliwiająca nawigację w Internecie za pomocą poleceń mowy
Autorzy:: Mięsikowska, M.
Powiązania:: https://bibliotekanauki.pl/articles/152861.pdf
Data publikacji:: 2007
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: rozpoznawanie mowy
klasyfikacja tekstu
przeglądanie informacji
speech recognition
text classification
information retrieval
Opis:: The paper presents an attempt to create an application enabling the user to surf much easier the resources of the Internet with the help of voice commands, as well as to classify and arrange the browsed information. The application has two basic modules which enable browsing the information on the Internet. The first navigation module processes websites, isolates navigation elements , such as links to other websites, from them and gives an identification name to the elements, which enables the user to pronounce voice commands. The website is presented to the user in a practically original form. The second module also processes websites, isolating navigation elements from them. The only difference in operation of the both modules is the mode of processing the website and its final presentation. The second module isolates from the elements vocabulary, which makes it possible to classify the information included in the website, this way acquiring and displaying, an ordered set of navigation elements. The application was implemented in Java language with the use of Oracle software. For the system of recognition and understanding of speech the Sphinx 4 tool was used.
W tej pracy przedstawiono próbę stworzenia aplikacji umożliwiającej swobodniejszą nawigację użytkownika wśród zasobów Internetu za pomocą poleceń mowy, klasyfikację oraz uporządkowanie przeglądanej informacji. Aplikacja posiada dwa zasadnicze moduły, przy pomocy których możliwe jest przeglądanie informacji w Internecie. Pierwszy moduł nawigacji, przetwarza strony internetowe, wyodrębnia z nich elementy nawigacyjne takie jak odnośniki do innych stron, oraz nadaje elementom identyfikacyjną nazwę, dzięki której użytkownik może wydawać słowne polecenia. Strona internetowa wyświetlona zostaje użytkownikowi w niemalże oryginalnej postaci. Drugi moduł również przetwarza strony internetowe, wyodrębniając z nich elementy nawigacyjne. Jedyną różnicą w działaniu obu modułów jest sposób przetwarzania strony i ostatecznej jej reprezentacji. Drugi moduł wyodrębnia z elementów słownictwo, dzięki któremu możemy sklasyfikować informację znajdującą się na stronie, uzyskując i wyświetlając w ten sposób uporządkowany zbiór elementów nawigacyjnych. Aplikacja zaimplementowana została w języku Java z wykorzystaniem oprogramowania Oracle. W przypadku systemu rozpoznawania mowy zastosowano narzędzie Sphinx-4.
Źródło:: Pomiary Automatyka Kontrola; 2007, R. 53, nr 5, 5; 87-89
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 4.

Tytuł:: A contemporary multi-objective feature selection model for depression detection using a hybrid pBGSK optimization algorithm
Autorzy:: Kavi Priya, Santhosam
Pon Karthika, Kasirajan
Powiązania:: https://bibliotekanauki.pl/articles/2201021.pdf
Data publikacji:: 2023
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: depression detection
text classification
dimensionality reduction
hybrid feature selection
wykrywanie depresji
klasyfikacja tekstu
redukcja wymiarowości
wybór funkcji
Opis:: Depression is one of the primary causes of global mental illnesses and an underlying reason for suicide. The user generated text content available in social media forums offers an opportunity to build automatic and reliable depression detection models. The core objective of this work is to select an optimal set of features that may help in classifying depressive contents posted on social media. To this end, a novel multi-objective feature selection technique (EFS-pBGSK) and machine learning algorithms are employed to train the proposed model. The novel feature selection technique incorporates a binary gaining-sharing knowledge-based optimization algorithm with population reduction (pBGSK) to obtain the optimized features from the original feature space. The extensive feature selector (EFS) is used to filter out the excessive features based on their ranking. Two text depression datasets collected from Twitter and Reddit forums are used for the evaluation of the proposed feature selection model. The experimentation is carried out using naive Bayes (NB) and support vector machine (SVM) classifiers for five different feature subset sizes (10, 50, 100, 300 and 500). The experimental outcome indicates that the proposed model can achieve superior performance scores. The top results are obtained using the SVM classifier for the SDD dataset with 0.962 accuracy, 0.929 F1 score, 0.0809 log-loss and 0.0717 mean absolute error (MAE). As a result, the optimal combination of features selected by the proposed hybrid model significantly improves the performance of the depression detection system.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2023, 33, 1; 117--131
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Informacja

Wyszukujesz frazę "klasyfikacja tekstu" wg kryterium: Temat

Źródło danych

Dostawca treści

Kolekcja

Rok wydania

Wydawca

Temat

Autor

Typ dokumentu

Język