Search results for the phrase "corpus annotation" by criterion: Subject


Showing 1-15 of 15
Title:
Bulgarian sense-annotated corpus – between the tradition and novelty
Authors:
Koeva, Svetla
Links:
https://bibliotekanauki.pl/articles/677294.pdf
Publication date:
2012
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
corpus studies
corpus annotation
annotation principles
Description:
The Bulgarian Sense-annotated Corpus (BulSemCor) is compiled according to the general methodology established by the SemCor project. It is a subset of the Brown Corpus of Bulgarian in which lexical units are semantically annotated with the corresponding synonym set (synset) in the Bulgarian wordnet. Unlike the bulk of sense-annotated corpora, where only (sets of) content words are annotated, in BulSemCor each lexical unit has been assigned a sense. The main contributions achieved in the work on BulSemCor are briefly described in the paper: definition of an annotation schema, compilation of an input corpus, development of a sense-annotated corpus, and enlargement of the Bulgarian wordnet.
Source:
Cognitive Studies; 2012, 12
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article
Title:
Towards an event annotated corpus of Polish
Authors:
Marcińczuk, Michał
Oleksy, Marcin
Bernaś, Tomasz
Kocoń, Jan
Wolski, Michał
Links:
https://bibliotekanauki.pl/articles/677125.pdf
Publication date:
2015
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
information extraction
event recognition
corpus annotation
Description:
The paper presents a typology of events built on the basis of the TimeML specification adapted to the Polish language. Some changes were introduced to the definitions of the event categories, and a motivation for event categorization was formulated. The event annotation task is presented on two levels: the ontology level (language-independent) and the level of text mentions (language-dependent). The various types of event mentions in Polish text are discussed. A procedure for annotating event mentions in Polish texts is presented and evaluated. In the evaluation, a randomly selected set of documents from the Corpus of Wrocław University of Technology (called KPWr) was annotated by two linguists and the inter-annotator agreement was calculated. The evaluation was done in two iterations. After the first evaluation, the annotation procedure was revised and improved. The second evaluation showed a significant improvement in the agreement between annotators. The current work focused on the annotation and categorisation of event mentions in text. Future work will focus on describing events with a set of attributes, arguments and relations.
Source:
Cognitive Studies; 2015, 15
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article
Title:
Enhancing grammar and valence resources for Akan and Ga
Authors:
Beermann, Dorothee
Hellan, Lars
Linde-Usiekniewicz, Jadwiga
Storch, Anne
Links:
https://bibliotekanauki.pl/chapters/1040110.pdf
Publication date:
2020
Publisher:
Uniwersytet Warszawski. Wydawnictwa Uniwersytetu Warszawskiego
Subjects:
digital resources
lexicon
valence
corpus annotation
Akan
Ga
Description:
We present a case study in valence comparison between closely related Kwa languages, assessing frames and meanings of the verb ba (‘come’) in Akan with a homophonous corresponding item in Ga. The discussion draws on the Akan dictionary (Christaller 1881), a Ga valence dictionary based on Dakubu (2009), and an online annotated corpus of Akan hosted in TypeCraft (Beermann & Mihaylov 2014). With a view to the possibility of making use of resources for one language in the development of resources for another, we demonstrate how digital resources and linguistic specifications can inform each other.
Source:
West African languages. Linguistic theory and communication; 166-185
9788323546313
Content provider:
Biblioteka Nauki
Article
Title:
Erotetic Reasoning Corpus. A data set for research on natural question processing
Authors:
Łupkowski, P.
Urbański, M.
Wiśniewski, A.
Błądek, W.
Juska, A.
Kostrzewa, A.
Pankow, D.
Paluszkiewicz, K.
Ignaszak, O.
Urbańska, J.
Żyluk, N.
Gajda, A.
Marciniak, B.
Links:
https://bibliotekanauki.pl/articles/103809.pdf
Publication date:
2017
Publisher:
Polska Akademia Nauk. Instytut Podstaw Informatyki PAN
Subjects:
question
logic of question
question processing
erotetic reasoning
corpus annotation
Description:
The aim of this paper is to present the Erotetic Reasoning Corpus (ERC), which constitutes a data set for research on natural question processing. We describe the theoretical background, the linguistic data, and the tags used in the annotation process. We also discuss the potential areas in which the ERC can be exploited.
Source:
Journal of Language Modelling; 2017, 5, 3; 607-631
2299-856X
2299-8470
Appears in:
Journal of Language Modelling
Content provider:
Biblioteka Nauki
Article
Title:
Language resources for named entity annotation in the National Corpus of Polish
Authors:
Savary, A.
Piskorski, J.
Links:
https://bibliotekanauki.pl/articles/206388.pdf
Publication date:
2011
Publisher:
Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:
natural language processing
proper names
named entities
corpus annotation
Polish National Corpus
SProUT
Description:
We present the named entity annotation subtask of a project aiming at creating the National Corpus of Polish. We summarize the annotation requirements defined for this corpus, and we discuss how existing lexical resources and grammars for named entity recognition for Polish have been adapted to meet those requirements. We show detailed results of the corpus annotation using the information extraction platform SProUT. We also analyze the errors committed by our knowledge-based method and suggest further improvements to it.
Source:
Control and Cybernetics; 2011, 40, 2; 361-391
0324-8569
Appears in:
Control and Cybernetics
Content provider:
Biblioteka Nauki
Article
Title:
Multilingual digital resources with Bulgarian language
Authors:
Dimitrova, Ludmila
Links:
https://bibliotekanauki.pl/articles/677179.pdf
Publication date:
2010
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
corpora (parallel, comparable, aligned)
corpus annotation
digital dictionaries
lexical databases
morpho-syntactic specifications
Description:
The paper briefly presents Bulgarian language resources as a part of multilingual digital resources developed within the framework of several international projects, among them parallel annotated and aligned corpora, comparable corpora, morpho-syntactic specifications for corpus annotation and dictionary encoding, lexicons, lexical databases, and electronic dictionaries.
Source:
Cognitive Studies; 2010, 10
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article
Title:
Application of multilingual corpus in contrastive studies (on the example of the Bulgarian-Polish-Lithuanian parallel corpus)
Authors:
Dimitrova, Ludmila
Koseska-Toszewa, Violetta
Roszko, Danuta
Roszko, Roman
Links:
https://bibliotekanauki.pl/articles/677184.pdf
Publication date:
2010
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
multilingual electronic corpora
parallel and comparable corpora
corpus annotation
lexical databases
multilingual electronic dictionaries
Description:
In this paper we present applications of a trilingual corpus in language research. Comparative and contrastive studies of Polish and Bulgarian as well as Polish and Lithuanian have already been conducted, but to the best of our knowledge no such studies exist for Bulgarian and Lithuanian. On the one hand, it is interesting to note that two Slavic languages are compared to a Baltic language (Lithuanian). On the other hand, the three languages are marginally present in the EU because of the later accession of the three countries to the EU. The paper briefly describes the first electronic Bulgarian–Polish–Lithuanian experimental corpus, currently under development for research purposes only. We also focus on the morphosyntactic annotation of the parallel trilingual corpus according to the Corpus Encoding Standard: we review the Part-of-Speech (POS) classification of the participle in the three languages (Bulgarian, Polish, and Lithuanian) in comparison to another POS, the adjective. We briefly discuss, with some examples, tagsets for corpus annotation from the point of view of possible future unification.
Source:
Cognitive Studies; 2010, 10
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article
Title:
Zastosowanie gier skierowanych na cel do anotacji korpusów językowych
The applications of games with a purpose used for obtaining annotated language resources
Authors:
Włodarczyk, Wojciech
Links:
https://bibliotekanauki.pl/articles/460019.pdf
Publication date:
2015
Publisher:
Fundacja Pro Scientia Publica
Subjects:
game with a purpose
GWAP
crowdsourcing
human computation
natural language processing
artificial intelligence, AI-complete
corpus annotation
Wordrobe
Description:
The existence of AI-complete problems has led to a growth in research into alternative ways of solving artificial intelligence problems that are not based solely on the computer. Although communication is obvious for humans, there is still no way to automate it. The approach currently in wide use for solving NLP problems is a statistical one, whose success depends on the size of the training corpus. The preparation of a reliable data set is therefore a key aspect of creating a statistical artificial intelligence system. Because it involves a large number of specialists, this is a very time-consuming and expensive process. One promising approach to reducing the time and cost of creating a tagged corpus is the use of games with a purpose. The objective of this paper is to present the stages of creating a game with a purpose for obtaining annotated language resources and to discuss its effectiveness. The analysis is based on the Wordrobe project, a collection of games created to support the annotation of a natural language corpus.
Source:
Ogrody Nauk i Sztuk; 2015, 5; 112-220
2084-1426
Appears in:
Ogrody Nauk i Sztuk
Content provider:
Biblioteka Nauki
Article
Title:
Multi-level annotation of the specialized Corpus of Dialogs of Disabled Polish Speakers
Authors:
Trzebińska, Joanna
Bartoszewicz, Jakub
Links:
https://bibliotekanauki.pl/articles/677159.pdf
Publication date:
2014
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
speech corpus
pragmatic annotation
semantic annotation
disability
Description:
While the Polish language is relatively well represented in general-purpose corpora such as the National Polish Language Corpus, there are still groups of speakers that are underrepresented in reference corpora. One such subgroup is the community of disabled people. At the same time, there is a growing need to understand how disability influences social and cognitive abilities, language in particular. In this paper, we present a specialized Corpus of Dialogs of Disabled Speakers. We describe the process of compiling, transcribing and annotating pragmatic, semantic and morphosyntactic features, and discuss applications of the corpus.
Source:
Cognitive Studies; 2014, 14
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article
Title:
A French corpus annotated for multiword expressions and named entities
Authors:
Candito, Marie
Constant, Mathieu
Ramisch, Carlos
Savary, Agata
Guillaume, Bruno
Parmentier, Yannick
Cordeiro, Silvio Ricardo
Links:
https://bibliotekanauki.pl/articles/1818889.pdf
Publication date:
2020
Publisher:
Polska Akademia Nauk. Instytut Podstaw Informatyki PAN
Subjects:
multiword expressions
annotation
corpus
French
Description:
We present the enrichment of a French treebank of various genres with a new annotation layer for multiword expressions (MWEs) and named entities (NEs). Our contribution with respect to previous work on NE and MWE annotation is the particular care taken to use formal criteria, organized into decision flowcharts, shedding some light on the interactions between NEs and MWEs. Moreover, in order to cope with the well-known difficulty of drawing a clear-cut frontier between compositional expressions and MWEs, we chose to use sufficient criteria only. As a result, annotated MWEs satisfy a varying number of sufficient criteria, accounting for the scalar nature of the MWE status. In addition to the span of the elements, annotation includes the subcategory of NEs (e.g., person, location) and one matching sufficient criterion for non-verbal MWEs (e.g., lexical substitution). The 3,099 sentences of the treebank were double-annotated and adjudicated, and we paid attention to cross-type consistency and compatibility with the syntactic layer. Overall inter-annotator agreement on non-verbal MWEs and NEs reached 71.1%. The released corpus contains 3,112 annotated NEs and 3,440 MWEs, and is distributed under an open license.
Source:
Journal of Language Modelling; 2020, 8, 2; 415-479
2299-856X
2299-8470
Appears in:
Journal of Language Modelling
Content provider:
Biblioteka Nauki
Article
Title:
A French corpus annotated for multiword expressions and named entities
Authors:
Candito, Marie
Constant, Mathieu
Ramisch, Carlos
Savary, Agata
Guillaume, Bruno
Parmentier, Yannick
Cordeiro, Silvio Ricardo
Links:
https://bibliotekanauki.pl/articles/1818891.pdf
Publication date:
2020
Publisher:
Polska Akademia Nauk. Instytut Podstaw Informatyki PAN
Subjects:
multiword expressions
annotation
corpus
French
Description:
We present the enrichment of a French treebank of various genres with a new annotation layer for multiword expressions (MWEs) and named entities (NEs). Our contribution with respect to previous work on NE and MWE annotation is the particular care taken to use formal criteria, organized into decision flowcharts, shedding some light on the interactions between NEs and MWEs. Moreover, in order to cope with the well-known difficulty of drawing a clear-cut frontier between compositional expressions and MWEs, we chose to use sufficient criteria only. As a result, annotated MWEs satisfy a varying number of sufficient criteria, accounting for the scalar nature of the MWE status. In addition to the span of the elements, annotation includes the subcategory of NEs (e.g., person, location) and one matching sufficient criterion for non-verbal MWEs (e.g., lexical substitution). The 3,099 sentences of the treebank were double-annotated and adjudicated, and we paid attention to cross-type consistency and compatibility with the syntactic layer. Overall inter-annotator agreement on non-verbal MWEs and NEs reached 71.1%. The released corpus contains 3,112 annotated NEs and 3,440 MWEs, and is distributed under an open license.
Source:
Journal of Language Modelling; 2020, 8, 2; 415-479
2299-856X
2299-8470
Appears in:
Journal of Language Modelling
Content provider:
Biblioteka Nauki
Article
Title:
Construction of a medical corpus based on information extraction results
Authors:
Marciniak, M.
Mykowiecka, A.
Links:
https://bibliotekanauki.pl/articles/206379.pdf
Publication date:
2011
Publisher:
Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:
corpus
semantic annotation
clinical data
information extraction
Description:
The paper presents a method for automatically constructing a semantically annotated corpus using the results of a rule-based information extraction (IE) application. Construction of the corpus is based on existing programs for text tokenization and morphological analysis, whose results are combined with domain-related correction rules. We reuse the specialized IE system to obtain a corpus annotated on the semantic level. The texts included in the corpus are Polish free-text clinical data. We present the documents (diabetic patients' discharge records), the structure of the corpus annotation, and the methods for obtaining the annotations. Initial evaluations based on the results of manual verification of a selected data subset are also presented. The corpus, once manually corrected, is designed to be used for developing supervised machine learning models for IE applications.
Source:
Control and Cybernetics; 2011, 40, 2; 337-360
0324-8569
Appears in:
Control and Cybernetics
Content provider:
Biblioteka Nauki
Article
Title:
Web-Application for the Presentation of Bilingual Corpora (Focusing on Bulgarian as One of the Two Paired Languages)
Authors:
Dimitrova, Ludmila
Dutsova, Ralitsa
Links:
https://bibliotekanauki.pl/articles/677223.pdf
Publication date:
2013
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
parallel corpus
aligned corpus
concordance
linguistic annotation
lemmatization
POS-tagging
web-interface
web-application
Description:
This paper briefly presents a web-application for the presentation of bilingual aligned corpora, focusing on Bulgarian as one of the two paired languages. The emphasis is on the description of the software tools and the user interface. The software is developed at IMI-BAS and will be hosted on a server there. Some examples of the usage of the web-application for the presentation of a Bulgarian-Polish aligned corpus are included.
Source:
Cognitive Studies; 2013, 13
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article
Title:
Experimental Corpus of the Lithuanian Local Dialect of Punsk in Poland. Examples of the Lexical and Semantic Annotation
Authors:
Roszko, Danuta
Links:
https://bibliotekanauki.pl/articles/677261.pdf
Publication date:
2013
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
corpora
annotation
Lithuanian local dialect of Punsk in Poland
experimental dialectal corpus
Description:
In the article the author describes the experimental corpus of the Lithuanian local dialect of Puńsk in Poland (ECorp-of-Punsk). It is the first corpus of this type for the Lithuanian local dialect. The corpus consists of three subcorpora. The first one (referred to as fundamental) contains utterances given by Lithuanians in the local dialect, the second one contains utterances given by Lithuanians in Polish, and the third one contains aligned Polish-dialectal texts. The texts recorded in the years 1986–2012 have been included in the ECorp-of-Punsk resources.
Source:
Cognitive Studies; 2013, 13
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article
Title:
Trilingual aligned corpus – current state and new applications
Authors:
Dimitrova, Ludmila
Koseska, Violetta
Roszko, Danuta
Roszko, Roman
Links:
https://bibliotekanauki.pl/articles/967220.pdf
Publication date:
2014
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:
aligned trilingual corpus
digital resources
event
Petri net theory
semantic annotation
state
Description:
This article describes the current state of a trilingual parallel corpus consisting of texts in two Slavic languages (Bulgarian and Polish) and one Baltic language (Lithuanian). The corpus contains original literary texts (fiction, novels, and short stories) in one of the three languages with translations into the other two, and texts in other languages translated into Bulgarian, Polish, and Lithuanian. Some of the texts are aligned at the sentence level. The authors propose a semantic annotation of verbs appearing in these aligned texts that will facilitate contrastive studies of natural languages. The theoretical background for the proposed semantic annotation is also briefly discussed.
Source:
Cognitive Studies; 2014, 14
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article