- Title:
- Statistical proper name recognition in Polish economic texts
- Authors:
- Marcińczuk, M.
- Piasecki, M.
- Links:
- https://bibliotekanauki.pl/articles/206385.pdf
- Publication date:
- 2011
- Publisher:
- Polska Akademia Nauk. Instytut Badań Systemowych PAN
- Topics:
- proper name recognition
- named entity recognition
- machine learning
- hidden Markov model
- rule-based approach
- dictionary-based approach
- Abstract:
- In this paper we present a Proper Name Recognition algorithm based on the Hidden Markov Model (HMM). Recognition of Proper Names (PNs) is treated as the basis for the Named Entity Recognition problem in general. The proposed method combines a domain-dependent HMM-based method with domain-independent methods based on gazetteers and hand-written rules for recognition and post-processing, which capture the general properties of Polish PN structure. A large gazetteer with morphologically described entries was acquired from the web. An HMM re-scoring mechanism was applied as the basis for integrating different knowledge sources in PN recognition. Results of experiments on a domain corpus of Polish stock exchange reports, used for training and testing, are presented. Adaptability of the method was analysed in a cross-domain evaluation, by applying the trained model to two other domain corpora.
- Source:
- Control and Cybernetics; 2011, 40, 2; 393-418
- ISSN 0324-8569
- Appears in:
- Control and Cybernetics
- Content provider:
- Biblioteka Nauki
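
The abstract describes HMM-based tagging in which gazetteer knowledge is integrated through re-scoring. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' algorithm. Everything in it is an assumption made for illustration: the three-state O/B/I tag set, the hand-set toy probabilities, the single "is capitalized" feature, and the hypothetical `bonus` added to proper-name states when a token appears in the gazetteer.

```python
import math

STATES = ["O", "B", "I"]  # outside a name, beginning of a name, inside a name

# Toy, hand-set probabilities (assumptions, not values from the paper).
START = {"O": 0.8, "B": 0.2, "I": 0.0}
TRANS = {
    "O": {"O": 0.8, "B": 0.2, "I": 0.0},
    "B": {"O": 0.5, "B": 0.1, "I": 0.4},
    "I": {"O": 0.6, "B": 0.1, "I": 0.3},
}
# Emission over one observed feature: does the token start with a capital?
EMIT = {
    "O": {True: 0.1, False: 0.9},
    "B": {True: 0.9, False: 0.1},
    "I": {True: 0.7, False: 0.3},
}


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens, gazetteer=frozenset(), bonus=5.0):
    """Return the best O/B/I tag sequence for tokens.

    gazetteer holds lowercased known names; bonus is a hypothetical
    log-score reward given to the B and I states for gazetteer hits.
    """
    if not tokens:
        return []

    def emit(state, tok):
        score = log(EMIT[state][tok[:1].isupper()])
        if state != "O" and tok.lower() in gazetteer:
            score += bonus  # re-score name states using the gazetteer
        return score

    best = [{s: log(START[s]) + emit(s, tokens[0]) for s in STATES}]
    back = []
    for tok in tokens[1:]:
        scores, ptrs = {}, {}
        for s in STATES:
            # Best predecessor of state s at this position.
            prev = max(STATES, key=lambda p: best[-1][p] + log(TRANS[p][s]))
            scores[s] = best[-1][prev] + log(TRANS[prev][s]) + emit(s, tok)
            ptrs[s] = prev
        best.append(scores)
        back.append(ptrs)

    # Follow back-pointers from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```

With these toy numbers, `viterbi(["akcje", "Orlen", "wzrosły"])` tags the capitalized token as a name, while the lowercased variant is tagged as a name only when `"orlen"` is supplied in the gazetteer. A real system would estimate the probabilities from an annotated corpus rather than set them by hand.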