All fields: M-word - OPAC catalog of collections


Title: Evaluating lexicographer controlled semi-automatic word sense disambiguation method in a large scale experiment
Authors: Broda, B.
Piasecki, M.
Links: https://bibliotekanauki.pl/articles/206405.pdf
Publication date: 2011
Publisher: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects: natural language processing
word sense disambiguation
semi-supervised machine learning
Description: Word Sense Disambiguation in text remains a difficult problem as the best supervised methods require laborious and costly manual preparation of training data. On the other hand, the unsupervised methods yield significantly lower precision and produce results that are not satisfying for many applications. Recently, an algorithm based on weakly-supervised learning for WSD called Lexicographer-Controlled Semi-automatic Sense Disambiguation (LexCSD) was proposed. The method is based on clustering of text snippets including words in focus. For each cluster we find a core, which is labelled with a word sense by a human, and is used to produce a classifier. Classifiers, constructed for each word separately, are applied to text. The goal of this work is to evaluate LexCSD trained on large volume of untagged text. A comparison showed that the approach is better than most frequent sense baseline in most cases.
Source: Control and Cybernetics; 2011, 40, 2; 419-436
0324-8569
Appears in: Control and Cybernetics
Content provider: Biblioteka Nauki

Article


Title: Extraction of Polish noun senses from large corpora by means of clustering
Authors: Broda, B.
Piasecki, M.
Szpakowicz, S.
Links: https://bibliotekanauki.pl/articles/969804.pdf
Publication date: 2010
Publisher: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects: corpus linguistics
semantic similarity
Polish nouns
word clustering
Clustering by Committee
co-occurrence retrieval models
rank weight function
Polish WordNet
WordNet-based synonymy test
document clustering
keywords extraction
Description: We investigate two methods of identifying noun senses, based on clustering of lemmas and of documents. We have adapted to Polish the well-known algorithm of Clustering by Committee, and tested it on very large Polish corpora. The evaluation by means of a WordNet-based synonymy test used Polish wordnet (plWordNet 1.0). Various clustering algorithms were analysed for the needs of extraction of document clusters as indicators of the senses of words which occur in them. The two approaches to wordsense identification have been compared, and conclusions drawn.
Source: Control and Cybernetics; 2010, 39, 2; 401-420
0324-8569
Appears in: Control and Cybernetics
Content provider: Biblioteka Nauki

Article
