- Title:
- Linguistic Complexity: English vs. Polish, Text vs. Corpus
- Authors:
-
Kwapień, J.
Drożdż, S.
Orczyk, A.
- Links:
- https://bibliotekanauki.pl/articles/1538580.pdf
- Publication date:
- 2010-04
- Publisher:
- Polska Akademia Nauk. Instytut Fizyki PAN
- Subjects:
-
89.75.Da
89.75.Fb
- Description:
- We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks after about two decades, while it might be consistent for the whole range of ranks for the inflected word forms. We also find that for a corpus consisting of texts written by different authors the basic scale-invariant regime is broken more strongly than in the case of a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scale-invariant regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, we find that if the words are tagged with their proper part of speech, only verbs show a rank-frequency distribution that is almost scale-invariant.
- Source:
-
Acta Physica Polonica A; 2010, 117, 4; 716-720
0587-4246
1898-794X
- Appears in:
- Acta Physica Polonica A
- Content provider:
- Biblioteka Nauki
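
The description above refers to rank-frequency distributions and to a scale-invariant (power-law) regime that may break after about two decades of ranks. As a rough illustration only, the sketch below ranks words by frequency and estimates the log-log slope over a chosen rank window, so that slopes over different windows can be compared. The function names `rank_frequency` and `loglog_slope` are hypothetical, and the ordinary least-squares fit is a simple stand-in, not necessarily the procedure used in the paper.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (word, count) pairs sorted by decreasing count (rank 1 first)."""
    counts = Counter(tokens)
    # Ties are broken alphabetically so that the ranking is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def loglog_slope(frequencies, min_rank=1, max_rank=None):
    """Least-squares slope of log10(frequency) vs. log10(rank).

    `frequencies` must be ordered by rank (index 0 is rank 1) and positive.
    The fit uses only ranks in the window [min_rank, max_rank].
    This is an illustrative estimator, assumed here for simplicity.
    """
    if max_rank is None:
        max_rank = len(frequencies)
    ranks = range(min_rank, max_rank + 1)
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(frequencies[r - 1]) for r in ranks]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two ranks to fit a slope")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy input; a real analysis would use a tokenized (and optionally
    # lemmatized) text or corpus.
    tokens = "the cat saw the dog and the dog saw the cat".split()
    ranked = rank_frequency(tokens)
    freqs = [count for _, count in ranked]
    print(ranked)
    print(loglog_slope(freqs))
```

Under these assumptions, a distribution that is scale-invariant over the whole range would give similar slopes for, say, ranks 1 to 100 and ranks above 100, while a broken regime would show up as a noticeable difference between the two windows.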