- Title:
- The IMPACT project Polish Ground-Truth texts as a Djvu corpus
- Authors:
- Bień, Janusz S.
- Links:
- https://bibliotekanauki.pl/articles/677177.pdf
- Publication date:
- 2014
- Publisher:
- Polska Akademia Nauk. Instytut Slawistyki PAN
- Subjects:
-
Polish language
corpora
DjVu
OCR
PAGE
Page Analysis and Ground-Truth Elements
GNU GPL
- Description:
- The purpose of the paper is twofold. First, it describes the already implemented idea of DjVu corpora, i.e. corpora consisting of both scanned images and a transcription of the texts, in which each word is linked to its occurrences in the scans. Second, it presents a case study: a corpus of almost 5,000 pages of Polish historical texts dating from 1570 to 1756 (practically the very first corpus of historical Polish). The tools described are general-purpose and freely available under the GNU GPL license, so they can also be used for other purposes.
- Source:
-
Cognitive Studies; 2014, 14
2392-2397
- Appears in:
- Cognitive Studies
- Content provider:
- Biblioteka Nauki
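The core idea in the description, a transcription whose words are tied to their positions in the scanned page, can be illustrated by building a DjVu hidden-text layer from word bounding boxes. The sketch below is not the author's GPL tooling; the names `Word`, `escape`, `word_sexpr` and `page_sexpr` are hypothetical, and it assumes word boxes are already available in image coordinates (origin top-left). It emits the s-expression text syntax accepted by `djvused`, where y is measured from the bottom edge of the page, so the vertical axis is flipped.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcribed word and its bounding box in image coordinates
    (hypothetical structure: origin at top-left, y grows downwards)."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def escape(text: str) -> str:
    # Escape backslashes and double quotes for a djvused string literal.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def word_sexpr(w: Word, page_height: int) -> str:
    # DjVu measures y from the bottom edge, so flip the vertical axis.
    ymin = page_height - w.y1
    ymax = page_height - w.y0
    return f'(word {w.x0} {ymin} {w.x1} {ymax} "{escape(w.text)}")'


def page_sexpr(lines: list[list[Word]], width: int, height: int) -> str:
    # Build a (page ... (line ... (word ...))) hidden-text expression.
    items = [f"page 0 0 {width} {height}"]
    for line in lines:
        if not line:
            continue  # skip lines that contain no words
        # A line's box is the union of its words' boxes (in flipped coordinates).
        x0 = min(w.x0 for w in line)
        x1 = max(w.x1 for w in line)
        ymin = height - max(w.y1 for w in line)
        ymax = height - min(w.y0 for w in line)
        words = " ".join(word_sexpr(w, height) for w in line)
        items.append(f"(line {x0} {ymin} {x1} {ymax} {words})")
    return "(" + " ".join(items) + ")"


if __name__ == "__main__":
    line = [Word("Pan", 10, 20, 50, 40), Word("Bóg", 60, 22, 100, 42)]
    print(page_sexpr([line], 100, 200))
```

Written to a file, such an expression could then be attached to a page, for example with `djvused`'s `set-txt` command; a real converter would also have to read the ground-truth format (e.g. PAGE XML) and may group words into regions and paragraphs rather than just lines.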