- Tytuł:
- How to embed noncrossing trees in Universal Dependencies treebanks in a low-complexity regular language
- Autorzy:
- Yli-Jyrä, Anssi Mikael
- Powiązania:
- https://bibliotekanauki.pl/articles/103803.pdf
- Data publikacji:
- 2019
- Wydawca:
- Polska Akademia Nauk. Instytut Podstaw Informatyki PAN
- Tematy:
-
bounded stack
coding morphisms
context-free grammars
dependencies
finite-state
parsing
state complexity
treebanks - Opis:
- A recently proposed balanced-bracket encoding (Yli-Jyrä and Gómez-Rodríguez 2017) has given us a way to embed all noncrossing dependency graphs into the string space and to formulate their exact arcfactored inference problem (Kuhlmann and Johnsson 2015) as the best string problem in a dynamically constructed and weighted unambiguous context-free grammar. The current work improves the encoding and makes it shallower by omitting redundant brackets from it. The streamlined encoding gives rise to a bounded-depth subset approximation that is represented by a small finite-state automaton. When bounded to 7 levels of balanced brackets, the automaton has 762 states and represents a strict superset of more than 99.9999% of the noncrossing trees available in Universal Dependencies 2.4 (Nivre et al. 2019). In addition, it strictly contains all 15-vertex noncrossing digraphs. When bounded to 4 levels and 90 states, the automaton still captures 99.2% of all noncrossing trees in the reference dataset. The approach is flexible and extensible towards unrestricted graphs, and it suggests tight finite-state bounds for dependency parsing, and for the main existing parsing methods.
- Źródło:
-
Journal of Language Modelling; 2019, 7, 2; 177-232
2299-856X
2299-8470 - Pojawia się w:
- Journal of Language Modelling
- Dostawca treści:
- Biblioteka Nauki