Subject: text representation - OPAC collection catalog

Title:: Bag of words and embedding text representation methods for medical article classification
Authors:: Cichosz, Paweł
Links:: https://bibliotekanauki.pl/articles/24403007.pdf
Publication date:: 2023
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: text representation
text classification
bag of words
word embedding
Description:: Text classification has become a standard component of automated systematic literature review (SLR) solutions, where articles are classified as relevant or irrelevant to a particular literature study topic. Conventional machine learning algorithms for tabular data, which can learn quickly from not necessarily large and usually imbalanced data with low computational demands, are well suited to this application, but they require that the text data be transformed to a vector representation. This work investigates the utility of different types of text representations for this purpose. Experiments are presented using the bag of words representation and selected representations based on word or text embeddings: word2vec, doc2vec, GloVe, fastText, Flair, and BioBERT. Four classification algorithms are used with these representations: a naive Bayes classifier, logistic regression, support vector machines, and random forest. They are applied to datasets consisting of scientific article abstracts from systematic literature review studies in the medical domain and compared with the pre-trained BioBERT model fine-tuned for classification. The obtained results confirm that the choice of text representation is essential for successful text classification. It turns out that, while the standard bag of words representation is hard to beat, fastText word embeddings make it possible to achieve roughly the same level of classification quality with the added benefit of much lower dimensionality and the capability of handling out-of-vocabulary words. More refined embedding methods based on deep neural networks, while much more demanding computationally, do not appear to offer substantial advantages for the classification task. The fine-tuned BioBERT classification model performs on par with conventional algorithms when they are coupled with their best text representation methods.
Source:: International Journal of Applied Mathematics and Computer Science; 2023, 33, 4; 603-621
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: El kitsch en los monumentos de Semana Santa en Colombia
Authors:: Usme, Zuly
Links:: https://bibliotekanauki.pl/articles/1186540.pdf
Publication date:: 2019
Publisher:: Uniwersytet Warszawski. Wydział Neofilologii
Subjects:: cultural text
popular religion
kitsch aesthetic
cultural representation
Eucharistic monument
Description:: Twenty-seven Eucharistic monuments of Holy Thursday in Colombian churches were analyzed, sampled every two years from 2004 to 2017. In these constructions, the expression of the kitsch aesthetic was identified through an iconographic analysis of the symbolic structures anchored to the visual surfaces of the objects displayed in the monuments and of the materials used in their construction. Religious kitsch is not intended to provoke intellectual reflection in the public; rather, its purpose is emotional: it appeals to the power and fervor of the religious tradition which, in the context of Holy Thursday, pervades everything. Religious kitsch has no intellectual purposes; the objects are there and their presence is enough, which is why the Holy Thursday monuments are valued for the deep feelings of devotion they generate in those attending. The Eucharistic monuments remove both the everyday believer and the occasional practitioner from reality and carry them back to the episodes narrated, through their palimpsestic character and their emotional, ephemeral metonymy.
Source:: Itinerarios; 2019, 29; 257-278
1507-7241
Appears in:: Itinerarios
Content provider:: Biblioteka Nauki

Article

Title:: A case study in text mining of discussion forum posts: Classification with bag of words and global vectors
Authors:: Cichosz, P.
Links:: https://bibliotekanauki.pl/articles/330299.pdf
Publication date:: 2018
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: text mining
discussion forum
text representation
document classification
word embedding
Description:: Despite the rapid growth of other types of social media, Internet discussion forums remain a highly popular communication channel and a useful source of text data for analyzing user interests and sentiments. Being suited to richer, deeper, and longer discussions than microblogging services, they reflect particularly well topics of long-term, persisting involvement and areas of specialized knowledge or experience. Discovering and characterizing such topics and areas by text mining algorithms is therefore an interesting and useful research direction. This work presents a case study in which selected classification algorithms are applied to posts from a Polish discussion forum devoted to psychoactive substances obtained from home-grown plants, such as hashish or marijuana. The utility of two different vector text representations is examined: the simple bag of words representation and the more refined embedded global vectors one. While the former is found to work well for the multinomial naive Bayes algorithm, the latter turns out to be more useful for other classification algorithms: logistic regression, SVMs, and random forests. The obtained results suggest that post classification can be applied for measuring the publication intensity of particular topics and, in the case of forums related to psychoactive substances, for monitoring the risk of drug-related crime.
Source:: International Journal of Applied Mathematics and Computer Science; 2018, 28, 4; 787-801
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Waardering van inhoud en vorm in informerende teksten
Evaluation of content and form in informing texts
Wyrażanie uznania dla treści i formy w tekstach informacyjnych
Authors:: Ištván, Marcel
Links:: https://bibliotekanauki.pl/articles/1882581.pdf
Publication date:: 2015
Publisher:: Katolicki Uniwersytet Lubelski Jana Pawła II. Towarzystwo Naukowe KUL
Subjects:: text quality
text optimization
text reception
mental representation
canon
Description:: This article describes part of a study aimed at evaluating text-optimizing techniques. The respondents were presented with original texts and with differently manipulated versions of them. The text types studied here were an instruction manual for a digital camera and written instructions on how to fill in the annual tax declaration form. The original texts came in two language variants, Slovak and Dutch. They were tested on native speakers of both languages and on learners of Dutch. Among other post-reading tasks, the respondents were asked to comment freely on the text they had just read. One interesting detail came up among the comments on the Dutch version of the texts that accompany the tax form: there were relatively many comments questioning whether these texts conform to the canon of a “real” administrative text. On the other hand, no such comments were recorded for the Slovak texts. More interestingly, the Slovak texts were more heavily manipulated than the Dutch ones, because the original Dutch texts were more suitable for the end recipient. Another striking fact is that the manipulation of the Dutch texts did not lead to statistically significant changes in the respondents' ability to score better on questions about the facts in the texts, whereas the Slovak respondents did score better.
Source:: Roczniki Humanistyczne; 2016, 64, 5; 15-27
0035-7707
Appears in:: Roczniki Humanistyczne
Content provider:: Biblioteka Nauki

Article

Title:: Classification of text documents by using expanded terms in Latent Semantic Analysis
Klasyfikacja dokumentów tekstowych przy użyciu rozbudowanych wyrażeń w niejawnej analizie semantycznej
Authors:: Śmiałkowska, B.
Gibert, M.
Links:: https://bibliotekanauki.pl/articles/951041.pdf
Publication date:: 2013
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: text classification
information extraction
Latent Semantic Analysis
information retrieval
text representation
Description:: This article focuses on improving the quality of text document classification. The common techniques for analyzing text documents used in classification are presented, and their weaknesses are stressed. The integration of quantitative and qualitative text analysis methods, which improves classification quality, is discussed. In the proposed approach, expanded terms obtained by using information patterns are used in Latent Semantic Analysis. Finally, empirical research is presented and, based on quality measures of text document classification, the effectiveness of the proposed approach is confirmed.
Source:: Theoretical and Applied Informatics; 2013, 25, 3-4; 239-250
1896-5334
Appears in:: Theoretical and Applied Informatics
Content provider:: Biblioteka Nauki

Article

Title:: Propozycja mieszanego przetwarzania półstrukturalnego modelu opisu zdarzeń z akcji ratowniczo-gaśniczych Państwowej Straży Pożarnej PSP
Proposition of hybrid process model semi structured description of event from fire services rescues operation
Authors:: Mirończuk, M.
Maciak, T.
Links:: https://bibliotekanauki.pl/articles/373949.pdf
Publication date:: 2013
Publisher:: Centrum Naukowo-Badawcze Ochrony Przeciwpożarowej im. Józefa Tuliszkowskiego
Subjects:: Bayes classifier
case-based reasoning
naive Bayes classifier
ontology for rescue service
representation of reports
representation of event cases
text mining
text representation
Description:: The paper presents the knowledge representations and event description methods currently being developed for a case-based reasoning system for the rescue services of the State Fire Service (PSP). It also proposes a way of processing them. The proposed approach is based on the classification and retrieval of event descriptions.
Source:: Bezpieczeństwo i Technika Pożarnicza; 2013, 1; 95-106
1895-8443
Appears in:: Bezpieczeństwo i Technika Pożarnicza
Content provider:: Biblioteka Nauki

Article

Title:: Projekt "Świat. Fotografie dzieci z Jasionki i Krzywej" jako tekst zmącony
Project "Świat. Fotografie dzieci z Jasionki i Krzywej" as a blurred text
Authors:: Roszczynialska, Magdalena
Links:: https://bibliotekanauki.pl/articles/521088.pdf
Publication date:: 2011
Publisher:: Uniwersytet Pedagogiczny im. Komisji Edukacji Narodowej w Krakowie
Subjects:: ideology
collaborative photography
ethical aspects of representation
blurred text
photography in literature
Description:: The text addresses ideological aspects of the collaborative photography project Świat. Fotografie dzieci z Jasionki i Krzywej. Communicative ambiguity and the plurality of authors and recipients of the text make it a so-called “blurred genre”, in which the meaning is inconsistent.
Source:: Annales Universitatis Paedagogicae Cracoviensis. Studia de Cultura; 2011, 2; 67-80
2083-7275
Appears in:: Annales Universitatis Paedagogicae Cracoviensis. Studia de Cultura
Content provider:: Biblioteka Nauki

Article

Title:: Images performed by words in Howard Jacobson’s novel Kalooki Nights. Remarks about intersemiotic relations among words and pictures
Authors:: Kaźmierczak, Marek
Links:: https://bibliotekanauki.pl/articles/920244.pdf
Publication date:: 2009-06-13
Publisher:: Uniwersytet im. Adama Mickiewicza w Poznaniu
Subjects:: Shoah
Holocaust
transframing
performing
literature
text
novel
representation
images
words
interpretation
Kalooki Nights
intersemiotic tension
Description:: The article concerns the intersemiotic tensions among words and pictures. The theoretical model is supported by an interpretation of Howard Jacobson’s novel Kalooki Nights. This novel redefines the limits of representation and reception of the Holocaust in the context of identity and the contemporary world. Exploring the forms of “reading” (“looking at”) linguistic codes as visual codes, terms such as “transframing” and “performing” refer to the patterns of creating fictional worlds (constructed by words which are treated as images). The being of words still means looking through them, through their semantic flaws. These intersemiotic translations are rooted in the will of creation (Eros) and the will of destruction (Thanatos). The history of interpretations of these two sources of the semiosphere touches the limits of the question: what can be shown in written wor(l).
Source:: Images. The International Journal of European Film, Performing Arts and Audiovisual Communication; 2009, 7, 13-14; 325-336
1731-450X
Appears in:: Images. The International Journal of European Film, Performing Arts and Audiovisual Communication
Content provider:: Biblioteka Nauki

Article

Title:: Text classification using word sequences
Authors:: Chudzian, P.
Links:: https://bibliotekanauki.pl/articles/92904.pdf
Publication date:: 2008
Publisher:: Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Subjects:: text classification
text representation
generalized suffix tree
Description:: The article discusses the use of word sequences in text classification. As opposed to n-grams, word sequences are not of a fixed length and therefore give the classifier the flexibility necessary to operate on documents collected from various sources. The presented classifier is built upon the suffix tree structure, which enables word sequences to take part in the classification process. During classification, both single words and longer sequences are taken into account and influence the category assignment according to their frequency and length. The Suffix Tree Classifier and the well-known Naive Bayes Classifier are compared and their properties are discussed. The obtained results show that incorporating word sequences into text classification can increase accuracy, and they reveal some interesting relations between the maximal length of the sequences used and the classifier's error rate.
Source:: Studia Informatica : systems and information technology; 2008, 1(10); 75-85
1731-2264
Appears in:: Studia Informatica : systems and information technology
Content provider:: Biblioteka Nauki

Article
