Search results for the phrase "natural language processing" by criterion: All fields


Title:
Benchmarking high performance architectures with natural language processing algorithms
Benchmarking architektur wysokiej wydajności algorytmami przetwarzania języka naturalnego
Authors:
Kuta, M.
Kitowski, J.
Links:
https://bibliotekanauki.pl/articles/305469.pdf
Publication date:
2011
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
benchmarking
part-of-speech tagging
document clustering
natural language processing
high performance architectures
Description:
Natural language processing algorithms are resource-demanding, especially when they must be tuned to an inflectional language such as Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high-performance platforms with different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms are compared.
Source:
Computer Science; 2011, 12; 19-31
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Sémantique lexicale et corpus : l’étude du lexique transdisciplinaire des écrits scientifiques
Authors:
Tutin, Agnès
Links:
https://bibliotekanauki.pl/articles/605411.pdf
Publication date:
2008
Publisher:
Uniwersytet Marii Curie-Skłodowskiej. Wydawnictwo Uniwersytetu Marii Curie-Skłodowskiej
Topics:
corpus linguistics
natural language processing
Description:
This paper deals with a corpus-based linguistic study in lexical semantics. Our topic is the general scientific lexicon, the cross-disciplinary lexicon peculiar to the academic genre. We show how the use of a large corpus makes it possible to develop an inventory of this vocabulary, and we present the first semantic treatments performed with the help of the corpus, along with a first experiment in natural language processing.
Source:
Lublin Studies in Modern Languages and Literature; 2008, 32; 242-260
0137-4699
Appears in:
Lublin Studies in Modern Languages and Literature
Content provider:
Biblioteka Nauki
Article
Title:
Traitement automatique de la polysémie
Machine-made treatment of polysemy
Authors:
Gross, Gaston
Links:
https://bibliotekanauki.pl/articles/1048808.pdf
Publication date:
2015-01-01
Publisher:
Uniwersytet im. Adama Mickiewicza w Poznaniu
Topics:
polysemy
natural language processing
lexicon-grammar
Description:
It is an empirical fact that almost all words are polysemous. A standard dictionary such as the Petit Robert lists 60,000 entries, which correspond to 300,000 meanings. Thus, in this particular dictionary one word is paired with five different senses on average. Moreover, this is no more than a general reference work designed for daily use: it contains only the most frequent and general items and disregards all other available meanings. In what follows, contextual properties are shown to distinguish each instance of polysemy, thus offering an effective tool for resolving ambiguities.
Source:
Studia Romanica Posnaniensia; 2015, 42, 1; 15-33
0137-2475
2084-4158
Appears in:
Studia Romanica Posnaniensia
Content provider:
Biblioteka Nauki
Article
Title:
Fantinuoli, Claudio (Hg.) (2018): Interpreting and technology. (Translation and Multilingual Natural Language Processing 11). Berlin: Language Science Press. 149 S.
Authors:
Ustaszewski, Michael
Links:
https://bibliotekanauki.pl/articles/1191732.pdf
Publication date:
2020
Publisher:
Uniwersytet Wrocławski. Oficyna Wydawnicza ATUT – Wrocławskie Wydawnictwo Oświatowe
Source:
Studia Translatorica; 2020, 11; 212-218
2084-3321
2657-4802
Appears in:
Studia Translatorica
Content provider:
Biblioteka Nauki
Article
Title:
Specialized fully automatic machine translation system delivering high quality of translated texts
Authors:
Gajer, M.
Links:
https://bibliotekanauki.pl/articles/1943201.pdf
Publication date:
2009
Publisher:
Politechnika Gdańska
Topics:
natural language processing
machine translation
translation patterns
Description:
The paper concerns machine translation, a discipline of computer science aimed at writing computer programs that translate text between natural languages. The author argues that it is not possible to build a machine translation system able to translate any kind of document with sufficiently high quality. Instead, the author proposes a specialized machine translation system whose aim is to translate financial reports on the global currency exchange market (forex). To build this system, the author proposes his own machine translation method based on translation patterns. Translation patterns move the translation process from the level of single words to the level of word chunks. They play a very important role for an inflectional language such as Polish because they make it possible to choose the correct form of the Polish translation of a foreign phrase depending on whether it performs the verb or object function in the sentence. The high quality of the specialized machine translation system was confirmed in many experiments, the results of which are presented in the paper. The quality of translation is so high that the Polish translations of English reports from the global currency exchange market can be published on Web pages without any additional changes. Thus, it is possible to eliminate the human translator entirely from the translation of texts that are highly stereotypical and confined to a selected, narrow domain.
Source:
TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2009, 13, 4; 347-354
1428-6394
Appears in:
TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider:
Biblioteka Nauki
Article
Title:
The Implementation of the example-based machine translation technique for French-to-Polish automatic translation system
Authors:
Gajer, M.
Links:
https://bibliotekanauki.pl/articles/1986914.pdf
Publication date:
2002
Publisher:
Politechnika Gdańska
Topics:
natural language processing
computational linguistics
machine translation
Description:
High-quality machine translation between human languages has long been an unattainable dream for many computer scientists involved in this fascinating and interdisciplinary field of computer application. The recently developed example-based machine translation technique seems to be a serious alternative to the existing automatic translation techniques. The paper proposes using the example-based machine translation technique to develop a system able to translate unrestricted French text into Polish. A new approach to example-based machine translation that takes into account the peculiarities of Polish grammar is developed. The preliminary results obtained with the proposed system seem very promising and appear to be a step in the right direction towards a fully automatic, high-quality French-into-Polish machine translation system for unrestricted text.
Source:
TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2002, 6, 3; 523-544
1428-6394
Appears in:
TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider:
Biblioteka Nauki
Article
Title:
Application of linguistic cues in the analysis of language of hate groups
Authors:
Balcerzak, B.
Jaworski, W.
Links:
https://bibliotekanauki.pl/articles/952938.pdf
Publication date:
2015
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
hate speech
natural language processing
propaganda
machine learning
Description:
Hate speech and fringe ideologies are social phenomena that thrive online. Members of the political and religious fringe are able to propagate their ideas via the Internet with less effort than in traditional media. In this article, we attempt to use linguistic cues such as the occurrence of certain parts of speech in order to distinguish the language of fringe groups from strictly informative sources. The aim of this research is to provide a preliminary model for identifying deceptive materials online. Examples of these would include aggressive marketing and hate speech. In this paper, we focus on the political aspect. Our research has shown that sentence length and the occurrence of adjectives and adverbs can help identify differences between the language of fringe political groups and that of mainstream media.
Source:
Computer Science; 2015, 16 (2); 145-156
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Compressing sentiment analysis CNN models for efficient hardware processing
Authors:
Wróbel, Krzysztof
Karwatowski, Michał
Wielgosz, Maciej
Pietroń, Marcin
Wiatr, Kazimierz
Links:
https://bibliotekanauki.pl/articles/305234.pdf
Publication date:
2020
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
natural language processing
convolutional neural networks
FPGA
compression
Description:
Convolutional neural networks (CNNs) were created for image classification tasks. Shortly after their creation, they were applied to other domains, including natural language processing (NLP). Nowadays, solutions based on artificial intelligence appear on mobile devices and embedded systems, which places constraints on memory and power consumption, among others. Because of the memory and computing requirements of CNNs, they must be compressed before they can be mapped to hardware. This paper presents the results of the compression of efficient CNNs for sentiment analysis. The main steps involve pruning and quantization. The process of mapping the compressed network to an FPGA and the results of this implementation are described. The conducted simulations showed that a 5-bit width is enough to ensure no drop in accuracy when compared to the floating-point version of the network. Additionally, the memory footprint was significantly reduced (by between 85 and 93% compared to the original model).
Source:
Computer Science; 2020, 21 (1); 25-41
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Retrieval and interpretation of textual geolocalized information based on semantic geolocalized relations
Authors:
Korczyński, W.
Links:
https://bibliotekanauki.pl/articles/305820.pdf
Publication date:
2015
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
geolocalization
geolocalized dictionary
geolocalized relations
natural language processing
Description:
This paper describes a method for retrieving geolocalized information from natural language text and interpreting it by assigning it geographic coordinates. A proof-of-concept implementation is discussed, along with a geolocalized dictionary stored in a PostGIS/PostgreSQL spatial relational database. The discussed research focuses on the strongly inflectional Polish language; hence, additional complexity had to be taken into account. The presented method has been evaluated with the use of diverse metrics.
Source:
Computer Science; 2015, 16 (4); 395-414
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Knowledge graphs effectiveness in Neural Machine Translation improvement
Authors:
Ahmadnia, Benyamin
Dorr, Bonnie J.
Kordjamshidi, Parisa
Links:
https://bibliotekanauki.pl/articles/1839251.pdf
Publication date:
2020
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
natural language processing
neural machine translation
knowledge graph representation
Description:
Maintaining semantic relations between words during the translation process yields more accurate target-language output from Neural Machine Translation (NMT). Although difficult to achieve from training data alone, it is possible to leverage Knowledge Graphs (KGs) to retain source-language semantic relations in the corresponding target-language translation. The core idea is to use KG entity relations as embedding constraints to improve the mapping from source to target. This paper describes two embedding constraints, both of which employ Entity Linking (EL), which assigns a unique identity to entities, to associate words in training sentences with those in the KG: (1) a monolingual embedding constraint that supports an enhanced semantic representation of the source words through access to relations between entities in a KG; and (2) a bilingual embedding constraint that forces entity relations in the source language to be carried over to the corresponding entities in the target-language translation. The method is evaluated for English-Spanish translation exploiting Freebase as a source of knowledge. Our experimental results demonstrate that exploiting KG information not only decreases the number of unknown words in the translation but also improves translation quality.
Source:
Computer Science; 2020, 21 (3); 299-318
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Cluo: web-scale text mining system for open source intelligence purposes
Authors:
Maciołek, P.
Dobrowolski, G.
Links:
https://bibliotekanauki.pl/articles/305361.pdf
Publication date:
2013
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
text mining
big data
OSINT
natural language processing
monitoring
Description:
The amount of textual information published on the Internet is estimated at billions of web pages, blog posts, comments, social media updates, and other items. Analyzing such quantities of data requires a high level of distribution of both data and computation. This is especially true in the case of the complex algorithms often used in text mining tasks. The paper presents a prototype implementation of CLUO, an Open Source Intelligence (OSINT) system that extracts and analyzes significant quantities of openly available information.
Source:
Computer Science; 2013, 14 (1); 45-62
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
An English neural network that learns texts, finds hidden knowledge, and answers questions
Authors:
Ke, Y.
Hagiwara, M.
Links:
https://bibliotekanauki.pl/articles/91771.pdf
Publication date:
2017
Publisher:
Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Topics:
natural language processing
neural network
question answering
natural language understanding
Description:
In this paper, a novel neural network is proposed that can automatically learn and recall content from texts and answer questions about that content, in either a large corpus or a short piece of text. The proposed neural network combines parse trees, semantic networks, and inference models. It contains layers corresponding to sentences, clauses, phrases, words, and synonym sets. The neurons in the phrase layer and the word layer are labeled with their parts of speech and semantic roles. The proposed neural network is automatically organized to represent the content of a given text. Its carefully designed structure and algorithms allow it to use the labels and the synonym-set neurons to build relationships between sentences about similar things. The experiments show that the proposed neural network with the labels and the synonym sets performs better than variants that lack the labels or the synonym sets while the other parts and the algorithms are the same. The experiments also show its ability to tolerate noise, to answer factoid questions, and to solve single-choice questions from an exercise book for non-native English learners.
Source:
Journal of Artificial Intelligence and Soft Computing Research; 2017, 7, 4; 229-242
2083-2567
2449-6499
Appears in:
Journal of Artificial Intelligence and Soft Computing Research
Content provider:
Biblioteka Nauki
Article
Title:
A blackboard system for generating poetry
Authors:
Misztal-Radecka, J.
Indurkhya, B.
Links:
https://bibliotekanauki.pl/articles/305325.pdf
Publication date:
2016
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
poetry generation
computational creativity
natural language processing
multi-agent system
Description:
We present a system to generate poems based on the information extracted from input text such as blog posts. Our design uses the blackboard architecture, in which independent specialized modules cooperate during the generation process by sharing a common workspace known as the blackboard. Each module is responsible for a particular task while generating poetry. Our implementation incorporates modules that retrieve information from the input text, generate new ideas, or select the best partial solutions. These distinct modules (experts) are implemented as diverse computational units that make use of lexical resources, grammar models, sentiment-analyzing tools, and language-processing algorithms. A control module is responsible for scheduling actions on the blackboard. We argue that the blackboard architecture is a promising way of simulating creative processes because of its flexibility and compliance with the Global Workspace Theory of mind. The main contribution of this work is the design and prototype implementation of an extensible platform for a poetry-generating system that may be further extended by incorporating new experts as well as some existing poetry-generating systems as parts of the blackboard architecture. We claim that this design provides a powerful tool for combining many of the existing efforts in the domain of automatic poetry generation.
Source:
Computer Science; 2016, 17 (2); 265-294
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Text summarizing in Polish
Streszczanie tekstu w języku polskim
Authors:
Branny, E.
Gajęcki, M.
Links:
https://bibliotekanauki.pl/articles/305824.pdf
Publication date:
2005
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
natural language processing
text summarizing
Description:
The aim of this article is to describe an existing implementation of a text summarizer for Polish, to analyze the results, and to propose possibilities for further development. The problem of text summarizing has already been addressed in research, but until now there has been no implementation designed for Polish. The implemented algorithm is based on existing developments in the field, but it also includes some improvements. It has been optimized for newspaper texts ranging from approx. 10 to 50 sentences. Evaluation has shown that it works better than known generic summarization tools when applied to Polish.
Source:
Computer Science; 2005, 7; 31-48
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Swarm algorithms for NLP : the case of limited training data
Authors:
Tambouratzis, George
Vassiliou, Marina
Links:
https://bibliotekanauki.pl/articles/1396739.pdf
Publication date:
2019
Publisher:
Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Topics:
particle swarm optimisation
natural language processing
text phrasing
machine translation
Description:
The present article describes a novel phrasing model which can be used for segmenting sentences of unconstrained text into syntactically-defined phrases. This model is based on the notion of attraction and repulsion forces between adjacent words. Each of these forces is weighted appropriately by system parameters, the values of which are optimised via particle swarm optimisation. This approach is designed to be language-independent and is tested here for different languages. The phrasing model’s performance is assessed per se, by calculating the segmentation accuracy against a gold-standard segmentation. Operational testing also involves integrating the model into a phrase-based Machine Translation (MT) system and measuring the translation quality when the phrasing model is used to segment input text into phrases. Experiments show that the performance of this approach is comparable to other leading segmentation methods and that it exceeds that of baseline systems.
Source:
Journal of Artificial Intelligence and Soft Computing Research; 2019, 9, 3; 219-234
2083-2567
2449-6499
Appears in:
Journal of Artificial Intelligence and Soft Computing Research
Content provider:
Biblioteka Nauki
Article
Title:
Evaluating lexicographer controlled semi-automatic word sense disambiguation method in a large scale experiment
Authors:
Broda, B.
Piasecki, M.
Links:
https://bibliotekanauki.pl/articles/206405.pdf
Publication date:
2011
Publisher:
Polska Akademia Nauk. Instytut Badań Systemowych PAN
Topics:
natural language processing
word sense disambiguation
semi-supervised machine learning
Description:
Word Sense Disambiguation (WSD) in text remains a difficult problem, as the best supervised methods require laborious and costly manual preparation of training data. On the other hand, unsupervised methods yield significantly lower precision and produce results that are not satisfactory for many applications. Recently, an algorithm based on weakly-supervised learning for WSD called Lexicographer-Controlled Semi-automatic Sense Disambiguation (LexCSD) was proposed. The method is based on clustering text snippets that include the words in focus. For each cluster we find a core, which is labelled with a word sense by a human and is used to produce a classifier. Classifiers, constructed for each word separately, are applied to text. The goal of this work is to evaluate LexCSD trained on a large volume of untagged text. A comparison showed that the approach is better than the most-frequent-sense baseline in most cases.
Source:
Control and Cybernetics; 2011, 40, 2; 419-436
0324-8569
Appears in:
Control and Cybernetics
Content provider:
Biblioteka Nauki
Article
Title:
Terminologiedatenbanken im mobilen Einsatz – eine Projektskizze
Authors:
Rösener, Christoph
Links:
https://bibliotekanauki.pl/articles/700383.pdf
Publication date:
2013
Publisher:
Stowarzyszenie Germanistów Polskich
Topics:
terminology databases
research project
natural language processing
linguistic intelligence
special languages
Description:
This paper first describes the latest trends in the mobile use of terminology databases and presents the latest technical developments in this area. It then gives an overview of a research project that investigates the concept, implementation, and use of a central terminology database application for mobile use within a public sector institution in special operational scenarios.
Source:
Zeitschrift des Verbandes Polnischer Germanisten; 2013, 2, 2
2353-656X
2353-4893
Appears in:
Zeitschrift des Verbandes Polnischer Germanisten
Content provider:
Biblioteka Nauki
Article
Title:
Building semantic user profile for Polish web news portal
Authors:
Misztal-Radecka, J.
Links:
https://bibliotekanauki.pl/articles/305619.pdf
Publication date:
2018
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
user profiling
word embeddings
topic modeling
natural language processing
gender prediction
Description:
The aim of this research is to construct meaningful user profiles that are the most descriptive of user interests in the context of the media content that they browse. We use two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We train our models on a collection of news articles in Polish and compare them with a model built on a general language corpus. We compare the performance of these algorithms on two practical tasks. First, we perform a qualitative analysis of the semantic relationships for similar article retrieval, and then we evaluate the predictive performance of distinct feature combinations for user gender classification. We apply the algorithms to a real-world dataset from the Polish news service Onet. Our results show that the choice of text representation depends on the task: Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the gender classification task, the best performance is obtained with a combination of features: topics from the article text and word embeddings from the title.
Source:
Computer Science; 2018, 19 (3); 307-332
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Domain specific key feature extraction using knowledge graph mining
Authors:
Barai, Mohit Kumar
Sanyal, Subhasis
Links:
https://bibliotekanauki.pl/articles/2027771.pdf
Publication date:
2020
Publisher:
Uniwersytet Ekonomiczny w Katowicach
Topics:
Feature extraction
Knowledge graph
Natural language processing
Product review
Text processing
Description:
Many novel feature extraction approaches have been proposed in the field of text mining. This paper presents a novel feature extraction algorithm. The approach uses weighted graph mining to ensure both effective feature extraction and computational efficiency; only the most effective graphs, those representing the maximum number of triangles under a predefined relational criterion, are considered. The proposed technique combines the relations between words surrounding an aspect of the product with the lexicon-based connections among those words, which together form a relational triangle. An element covered by the maximum number of triangles is taken as a prime feature. In an analysis of a limited set of domain-specific data, the proposed algorithm performs more than three times better than TF-IDF.
Source:
Multiple Criteria Decision Making; 2020, 15; 1-22
2084-1531
Appears in:
Multiple Criteria Decision Making
Content provider:
Biblioteka Nauki
Article
Title:
The Impact of Investor Sentiment on Direction of Stock Price Changes: Evidence from the Polish Stock Market
Authors:
Polak, Kamil
Links:
https://bibliotekanauki.pl/articles/2053925.pdf
Publication date:
2021-12-20
Publisher:
Uniwersytet Warszawski. Wydawnictwo Naukowe Wydziału Zarządzania
Topics:
sentiment analysis
natural language processing
machine learning
financial forecasting
behavioral finance
Description:
The purpose of this research is to examine the impact of sentiment derived from news headlines on the direction of stock price changes. The study examines stocks listed on the WIG-banking sub-sector index on the Warsaw Stock Exchange. Two types of data were used: textual and market data. The research period covers the years 2015–2018. In total, 7,074 observations were investigated, of which 3,390 had positive sentiment, 2,665 neutral, and 1,019 negative. To examine the predictive power of sentiment, six machine learning models were used: Decision Tree Classifier, Random Forest Classifier, XGBoost Classifier, KNN Classifier, SVC, and Gaussian Naive Bayes Classifier. Empirical results show that the sentiment of news headlines has no significant explanatory power for the direction of stock price changes within a one-day time frame.
Source:
Journal of Banking and Financial Economics; 2021, 2(16); 72-90
2353-6845
Appears in:
Journal of Banking and Financial Economics
Content provider:
Biblioteka Nauki
Article
Title:
Towards textual data augmentation for neural networks: synonyms and maximum loss
Authors:
Jungiewicz, Michał
Smywiński-Pohl, Aleksander
Links:
https://bibliotekanauki.pl/articles/305750.pdf
Publication date:
2019
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
deep learning
data augmentation
neural networks
natural language processing
sentence classification
Description:
Data augmentation is one of the ways to deal with labeled data scarcity and overfitting. Both of these problems are crucial for modern deep-learning algorithms, which require massive amounts of data. The problem is better explored in the context of image analysis than for text; this work is a step towards closing this gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on the substitution of words using a thesaurus as well as Princeton University's WordNet. Our method improves upon the baseline in most cases. In terms of accuracy, the best of the variants is 1.2 percentage points better than the baseline.
Source:
Computer Science; 2019, 20 (1); 57-83
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Methodological and technical challenges of a corpus-based study of Naija
Authors:
Caron, Bernard
Linde-Usiekniewicz, Jadwiga
Storch, Anne
Links:
https://bibliotekanauki.pl/chapters/1036950.pdf
Publication date:
2020
Publisher:
Uniwersytet Warszawski. Wydawnictwa Uniwersytetu Warszawskiego
Topics:
natural language processing
corpus studies
syntax
prosody
Atlantic pidgins and creoles
Description:
This paper presents early reflections on the NaijaSynCor survey (NSC) financed by the French Agence Nationale de la Recherche. The nature of the language surveyed (Naija, a post-creole spoken in Nigeria as a second language by close to 100 million speakers) has induced a specific choice of theoretical framework (variationist sociolinguistics) and methodology (a corpus-based study using Natural Language Processing). Halfway through the four-year study, the initial methodological choices are assessed, taking into account the nature of the data that has been collected and the problems that occurred as early as the initial stages of their annotation.
Source:
West African languages. Linguistic theory and communication; 57-75
9788323546313
Content provider:
Biblioteka Nauki
Article
Tytuł:
Experimental Comparison of Pre-Trained Word Embedding Vectors of Word2Vec, Glove, FastText for Word Level Semantic Text Similarity Measurement in Turkish
Autorzy:
Tulu, Cagatay Neftali
Links:
https://bibliotekanauki.pl/articles/2201815.pdf
Publication date:
2022
Publisher:
Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:
semantic word similarity
word embeddings
NLP
Turkish NLP
natural language processing
Description:
This study experimentally evaluates the word vectors produced by three widely used embedding methods for word-level semantic text similarity in Turkish. Three benchmark datasets, SimTurk, AnlamVer, and RG65_Turkce, are used to evaluate the word embedding vectors produced by three methods: Word2Vec, Glove, and FastText. In the comparative analysis, Turkish word vectors produced with Glove and FastText achieved better correlation in word-level semantic similarity. The Turkish word coverage of FastText is also ahead of the other two methods, since only a limited number of out-of-vocabulary (OOV) words were observed in the FastText experiments. Another observation is that FastText and Glove vectors achieved high Spearman correlation values on the SimTurk and AnlamVer datasets, both of which were prepared and evaluated entirely by local Turkish individuals. This suggests that these two datasets better represent the Turkish language in terms of morphology and inflection.
Source:
Advances in Science and Technology. Research Journal; 2022, 16, 4; 147-156
2299-8624
Appears in:
Advances in Science and Technology. Research Journal
Content provider:
Biblioteka Nauki
Article
Title:
Computational Analysis of Printed Arabic Text Database for Natural Language Processing
Authors:
Bouressace, Hassina
Links:
https://bibliotekanauki.pl/articles/49331207.pdf
Publication date:
2023
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
Arabic language
vocabulary
Arabic documents
frequency dictionary
Arabic printed text database
Description:
A frequency dictionary of printed Arabic text is essential for natural language processing. The Printed Arabic Text Database (PATD) contains 1,251 XML files of Arabic documents collected from ten newspapers and magazines from different countries. A total of 2,344 articles were created with various structures: open vocabulary, multi-font, multi-size, and multi-style text. From these articles, 1,102,078 tokens, 19,926 sentences, and 1,000,000 words were extracted. The dictionary provides detailed information for each word, including English equivalents, usage statistics, usage distribution, and the most widely used terms. A thematic vocabulary list of the top words on various topics is also provided. The frequency dictionary is a useful resource of modern Arabic vocabulary for various specialists, students, and learners, and is freely available to interested researchers on the webpage.
Source:
Cognitive Studies | Études cognitives; 2023, 23
1641-9758
2392-2397
Appears in:
Cognitive Studies | Études cognitives
Content provider:
Biblioteka Nauki
Article
Title:
Dependability aspects of language technology infrastructure
Authors:
Walkowiak, T.
Pol, M.
Links:
https://bibliotekanauki.pl/articles/2068758.pdf
Publication date:
2018
Publisher:
Uniwersytet Morski w Gdyni. Polskie Towarzystwo Bezpieczeństwa i Niezawodności
Subjects:
dependability
language technology infrastructure
natural language processing
micro-service architecture
CLARIN-PL
Description:
The paper presents a dependability analysis of the CLARIN-PL Centre of Language Technology (CLT). It describes the infrastructure, high-availability aspects, and micro-service architecture used in CLARIN-PL applications. The micro-service architecture improves dependability with respect to availability and reliability and, to some extent, safety. It comprises mechanisms for reliable communication between applications, replication, recovery, and transaction processing. CLT also has a set of components for failure detection, monitoring and autonomic management, and distributed security policy enforcement.
Source:
Journal of Polish Safety and Reliability Association; 2018, 9, 3; 101-108
2084-5316
Appears in:
Journal of Polish Safety and Reliability Association
Content provider:
Biblioteka Nauki
Article
