Subject: historical corpus - OPAC collection catalog

Item 1

Title: Elektroniczny Korpus Tekstów Polskich z XVII i XVIII w. – problemy teoretyczne i warsztatowe [The Electronic Corpus of 17th- and 18th-Century Polish Texts: theoretical and practical problems]
Authors: Gruszczyński, Włodzimierz
Adamiec, Dorota
Bronikowska, Renata
Wieczorek, Aleksandra
Links: https://bibliotekanauki.pl/articles/1630441.pdf
Publication date: 2020
Publisher: Towarzystwo Kultury Języka
Subjects: electronic text corpus
historical corpus
17th-18th-century Polish
natural language processing
Description: This paper presents the Electronic Corpus of 17th- and 18th-century Polish Texts (KorBa), a large (13.5-million), annotated historical corpus available online. Its creation was modelled on the assumptions of the National Corpus of Polish (NKJP), yet the specific nature of the historical material required certain modifications of the solutions applied in NKJP: two forms of text representation (transliteration and transcription) were introduced, the principle of marking foreign-language fragments was adopted, and the tagset was adapted to the description of the grammatical structure of Middle Polish. The texts collected in KorBa are diverse in chronological, geographical, stylistic, and thematic terms although, owing among other things to limited access to the material, the postulate of representativeness and sustainability of the corpus was not fully implemented. The work on the corpus was to a large extent automated by using natural language processing tools.
Source: Poradnik Językowy; 2020, 777, 8; 32-51
ISSN: 0551-5343
Appears in: Poradnik Językowy
Content provider: Biblioteka Nauki

Article

Item 2

Title: Electronic Diachronic Corpus and Dictionaries of Old Bulgarian
Authors: Ganeva, Gergana
Links: https://bibliotekanauki.pl/articles/682491.pdf
Publication date: 2018
Publisher: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Subjects: histdict
historical dictionary
grammatical dictionary
electronic diachronic corpus
Description: The electronic system histdict is designed as a tool for research, adequate presentation and popularization of a part of Bulgaria’s cultural and historical heritage: the Bulgarian language and its medieval literature. The article describes the various steps in the development of histdict. Attention is paid to each component of the resource: specialized Unicode fonts, electronic diachronic corpus, dictionary of Old Bulgarian, historical dictionary equipped with tools for writing and editing dictionary entries, grammatical dictionary, prototypical search engine, and virtual keyboard. The article also lays out the principles followed in the development of the diachronic grammatical dictionary of the Bulgarian language.
Source: Studia Ceranea; 2018, 8; 111-119
ISSN: 2084-140X, 2449-8378
Appears in: Studia Ceranea
Content provider: Biblioteka Nauki

Article

Item 3

Title: De los datos léxicos y de los textos que los contienen. A propósito del futuro próximo de la filología
On lexical data and the texts that contain them. About the near future of philology
Authors: Pascual Rodríguez, José A.
Links: https://bibliotekanauki.pl/articles/31341192.pdf
Publication date: 2021-12-31
Publisher: Wydawnictwo Uniwersytetu Śląskiego
Subjects: Philology
etymology
history of words
historical grammar
corpus
Description: Starting from the current role of data in philological work, the article describes a scenario of how things could look in the near future. In that scenario, data could go from being mere indications to becoming arguments in the study of the history of words. For that to happen, texts need to be well encoded (linguistically as well), so that models can be built and applied to the different possible interpretations of words.
Source: Neophilologica; 2021, 33; 1-21
ISSN: 0208-5550, 2353-088X
Appears in: Neophilologica
Content provider: Biblioteka Nauki

Article

Item 4

Title: The Electronic Historical Latvian Dictionary Based on the Corpus of Early Written Latvian Texts
Authors: Andronova, Everita
Siliņa-Piņķe, Renāte
Trumpa, Anta
Vanags, Pēteris
Links: https://bibliotekanauki.pl/articles/676514.pdf
Publication date: 2016
Publisher: Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects: Corpus-based historical dictionary of Latvian
the Corpus of Early Latvian Texts
dictionary entry
collocations and idioms
explanation of origin
cross-references in dictionaries
dictionary writing software
TLex Suite 2013
Description: This article deals with the development of the Electronic Historical Latvian Dictionary (http://www.tezaurs.lv/lvvv) based on the Corpus of Early Written Latvian Texts (http://www.korpuss.lv/senie/). Some issues concerning the compilation and processing of the corpus data are discussed, and the main sources added to the Corpus during the four-year project are described: the 16th c. Lord’s Prayers, 17th c. dictionaries, texts of oaths and laws, religious texts and so-called dedication poetry. The aim of the project is to compile a pilot electronic dictionary of 16th–17th century Latvian in which all parts of speech are represented among the entries. This dictionary will contain ca. 1,200 entries, including both proper names and common nouns. The main emphasis is on the description of the dictionary entries, supplied with relevant practical and theoretical observations. Each part of the dictionary entry is discussed, followed by comments on various issues pertaining to that part (e.g., the choice of headword and the representation of spelling variants) and how these were resolved. Special attention is paid to the head of the entry, the explanation of meaning deduced from the examples found in the corpus, different types of collocations and their representation in the dictionary, as well as etymological information. Finally, the authors present a brief review of the dictionary writing software TLex 2013 based on their experience with this tool.
Source: Acta Baltico-Slavica; 2016, 40
ISSN: 2392-2389, 0065-1044
Appears in: Acta Baltico-Slavica
Content provider: Biblioteka Nauki

Article

Item 5

Title: Soy, doy, estoy y voy: la yod desinencial en el corpus CODEA+ 2015
Soy, doy, estoy and voy: Desinential Yod in Corpus CODEA+ 2015
Authors: Serrano Marín, Marina
Links: https://bibliotekanauki.pl/articles/2034893.pdf
Publication date: 2021-12-20
Publisher: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Subjects: historical verbal morphology
Historical Sociolinguistics
castellano medieval
Linguistic corpora
CODEA 2015
morfología verbal histórica
sociolingüística histórica
corpus lingüísticos
Description: From a perspective that combines different linguistic disciplines, this article investigates the variation inherent in linguistic change, and the selection that affected the first-person singular present indicative variables of the verbs ser, estar, dar and ir, in notarial documents from the 13th to the 16th centuries in the CODEA+ 2015 corpus. The key question is whether the verbal morphological variables with desinential yod appeared simultaneously throughout the Spanish-speaking and bilingual territory of the Peninsula or whether, on the contrary, they showed a different frequency of appearance and a different chronological, geographic and diastratic distribution. The analysis provides quantifiable empirical data and theoretical arguments which, on the one hand, allow a more accurate periodization of the phenomenon than the one provided up to now and, on the other hand, support an explanation in which the geographical and diaphasic components play an essential role in reconstructing the historical variation of the language.
Source: e-Scripta Romanica; 2021, 9; 87-105
ISSN: 2392-0718
Appears in: e-Scripta Romanica
Content provider: Biblioteka Nauki

Article

Item 6

Title: Buried Treasure in the Tyndale Corpus: Innovations and Archaisms
Authors: Bell Canon, Elizabeth
Links: https://bibliotekanauki.pl/articles/888738.pdf
Publication date: 2016
Publisher: Uniwersytet Warszawski. Wydawnictwa Uniwersytetu Warszawskiego
Subjects: corpus linguistics
Early Modern English
historical linguistics
language change
William Tyndale
word formation
Description: The translations and polemical texts that make up the Tyndale Corpus are filled with linguistic buried treasure: lexical innovations, syntactic archaisms, metalinguistic commentary, and features related to language and dialect prejudice. The use of computer corpus analysis can reveal and illuminate what makes Tyndale different from other writers of his time, and why he is so important to the history of English and the modern religious register. Examining the patterns hidden in his work does not prevent us from appreciating the beauty of his writing, as some literary scholars might suggest. Instead, it allows us to better understand the approach he took to his work. This paper summarizes and exemplifies Tyndale’s contributions to English historical linguistics. The methodology involves reviewing previous scholarly assessments of Tyndale’s work, examining in detail his particular lexical and syntactic choices using text and corpus computer software, and, most especially, allowing William Tyndale to speak for himself.
Source: Anglica. An International Journal of English Studies; 2016, 25/2; 151-165
ISSN: 0860-5734
Appears in: Anglica. An International Journal of English Studies
Content provider: Biblioteka Nauki

Article

Item 7

Title: Unfinished “verbization” process: the development of predicative constructions with an adjective of the feminine gender in the 17th and 18th centuries in the light of corpus data
Niedokończona „werbizacja” – rozwój predykatywnych konstrukcji z przymiotnikiem w rodzaju żeńskim w XVII i XVIII w. w świetle danych korpusowych
Authors: Bronikowska, Renata
Links: https://bibliotekanauki.pl/articles/2158341.pdf
Publication date: 2021-12-29
Publisher: Polska Akademia Nauk. Instytut Języka Polskiego PAN
Subjects: historical syntax
corpus research
adjectives
defective verbs
składnia historyczna
badania korpusowe
przymiotniki
czasowniki niewłaściwe
Description: The article is devoted to the changes in the Middle Polish syntactic construction in which the predicative function was performed by the nominative singular feminine form of the adjective. The research, carried out on corpus data, was aimed at tracing the process that led to the transformation of those adjectival forms into defective verbs (verbization). The analysis covers the six predicative adjectival forms most popular in the 17th and 18th centuries: MOŻNA ‘it is possible’, NIEMOŻNA ‘it is impossible’, NIEPODOBNA ‘it is impossible’, WIELKA ‘it is great’, PEWNA ‘it is certain’ and SŁUSZNA ‘it is right’. The first three of them changed their grammatical status, whereas for the rest the verbization process stopped. The second half of the 18th century and the first half of the 19th century were decisive in this respect.
Source: Polonica; 2021, 41; 97-110
ISSN: 0137-9712, 2545-045X
Appears in: Polonica
Content provider: Biblioteka Nauki

Article

Item 8

Title: К проблеме лексикографической параметризации терминологии исторической лексикологии восточнославянских языков
On the Problem of Lexicographic Parametrization Terminologies of Historical Lexicology of Eastern Slavic Languages
Authors: Пятаева, Наталия
Links: https://bibliotekanauki.pl/articles/1046577.pdf
Publication date: 2019-06-30
Publisher: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Subjects: историческая лексикология восточнославянских языков
терминологический словарь
макро- и микроструктура словаря
экспериментальная словарная статья
корпус терминов исторической лексикологии
historical lexicology of East Slavic languages
corpus of terms of historical lexicology
terminological dictionary
dictionary macro and microstructure
experimental dictionary entry
Description: Over three hundred years of study and lexicographic parametrization of the lexical system of the East Slavic languages have produced a corpus of specialized terminology of historical lexicology and related disciplines (synchronic lexicology, semasiology, etymology, dialectology, historical phonetics and grammar of closely related languages), which has not yet been reflected in any terminological dictionary. The article presents the compiled corpus of historical lexicology terms (1307 units) and outlines ways of representing it in a specialized terminological dictionary of the alphabetical-nested type. Its compilation relies on a system of linguistic methods and techniques: continuous sampling of terms and terminological designations from linguistic dictionaries, reference books, encyclopedias and scientific literature; a technique for compiling the corpus of specialized terminology of the historical lexicology of East Slavic languages; semantic analysis of the content of terms and thematic analysis of the terminological corpus; and a technique for developing terminological definitions. The dictionary will consist of two parts: the first is an alphabetical-nested dictionary, the second a thematic list of cross-referenced terms. The corpus of terms is divided into 10 thematic groups: the theory of historical lexicology, Slavic languages, the history of East Slavic languages, etymology, lexicology, semasiology, word formation, morphology, dialectology, and lexicography.
A dictionary entry in the first part will include the following zones: 1) the headword or terminological phrase with stress marking and variant (synonymous) terms, if any; 2) the necessary grammatical and/or functional labels; 3) a brief etymology of the term; 4) derivatives related to the terminology of historical lexicology, with a description of their semantics; 5) the definition (with a reference to a previously published dictionary if it was not developed by the author of this dictionary); 6) language illustrations demonstrating the phenomenon named by the term; 7) a reference to the second part of the dictionary, i.e. an indication of the thematic group to which the term belongs. As samples, the article gives two experimental dictionary entries: “Achrony//panchrony” and “Root (of a word)”.
Source: Acta Universitatis Lodziensis. Folia Linguistica Rossica; 2019, 17; 65-76
ISSN: 1731-8025, 2353-9623
Appears in: Acta Universitatis Lodziensis. Folia Linguistica Rossica
Content provider: Biblioteka Nauki

Article

Item 9

Title: Dynamics of Language Change: The Case of Polish barzo > bardzo
Authors: Górski, Rafał L.
Links: https://bibliotekanauki.pl/articles/52904101.pdf
Publication date: 2021
Publisher: Uniwersytet Jagielloński. Wydawnictwo Uniwersytetu Jagiellońskiego
Subjects: historical linguistics
language change
Middle Polish
corpus linguistics
Piotrowski’s law
logistic regression
językoznawstwo historyczne
zmiana językowa
okres średniopolski
językoznawstwo korpusowe
prawo Piotrowskiego
regresja logistyczna
Description: The paper discusses the benefits and shortcomings of modelling language change with logistic regression, an approach often called the Piotrowski-Altmann law. It is illustrated with an isolated change that occurred in Middle Polish, namely barzo > bardzo. The study is based on a historical corpus of Polish consisting of several hundred texts with over 12 million running words. Logistic regression based on the entire dataset shows a relatively high goodness of fit, yet some data points, especially close to the end of the process, lie quite far from the idealised trajectory. The author seeks to answer the question of how far the quality of the corpus affects the model. In an experiment, a number of texts were randomly removed to create smaller corpora containing 90%, 75% and 50% of the texts of the entire set. Since this procedure is repeated 200 times, the distributions of the goodness-of-fit scores can be compared. It turns out that the smaller the corpus, the more the goodness of fit varies, and in some rare cases it is even better than its counterpart for a larger corpus. Still, the larger the corpus, the higher the goodness-of-fit scores tend to be.
Source: Studies in Polish Linguistics; 2021, 16, 3; 145-162
ISSN: 1732-8160, 2300-5920
Appears in: Studies in Polish Linguistics
Content provider: Biblioteka Nauki

Article
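
The resampling experiment described in the abstract of record 9 above (Górski, barzo > bardzo) can be sketched in Python. This is only an illustration under stated assumptions, not the author's code: the corpus is synthetic (randomly generated counts of the new form per text), the logistic curve is fitted with a plain Newton-Raphson routine, and R² over per-text proportions stands in for the goodness-of-fit score, which the abstract does not name. All function and variable names are hypothetical.

```python
import math
import random
import statistics


def predict(a, b, t):
    """Logistic curve p(t) = 1 / (1 + exp(-(a + b*t)))."""
    z = max(-30.0, min(30.0, a + b * t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(texts, iters=15):
    """Newton-Raphson fit of the logistic curve to binomial counts.

    texts: list of (t, new_count, total), with t a centred, scaled year.
    """
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for t, k, n in texts:
            p = predict(a, b, t)
            w = n * p * (1.0 - p)
            g_a += k - n * p            # gradient of the log-likelihood
            g_b += (k - n * p) * t
            h_aa += w                   # entries of the information matrix
            h_ab += w * t
            h_bb += w * t * t
        det = h_aa * h_bb - h_ab * h_ab
        if det < 1e-12:
            break
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def r_squared(texts, a, b):
    """R² between observed per-text proportions and the fitted curve
    (an assumed goodness-of-fit measure, chosen for illustration)."""
    obs = [k / n for _, k, n in texts]
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - predict(a, b, t)) ** 2
                 for o, (t, _k, _n) in zip(obs, texts))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Synthetic corpus (made-up data): 200 texts dated 1600-1800, each with
# 5-100 occurrences of the word, the new form spreading along a logistic curve.
rng = random.Random(0)
corpus = []
for _ in range(200):
    year = rng.uniform(1600, 1800)
    t = (year - 1700) / 50.0
    n = rng.randint(5, 100)
    p_true = 1.0 / (1.0 + math.exp(-2.0 * t))
    k = sum(rng.random() < p_true for _ in range(n))
    corpus.append((t, k, n))


def subsample_scores(corpus, fraction, reps=200, rng=rng):
    """Refit the model on `reps` random subsets holding `fraction` of the texts."""
    size = round(fraction * len(corpus))
    scores = []
    for _ in range(reps):
        sample = rng.sample(corpus, size)
        a, b = fit_logistic(sample)
        scores.append(r_squared(sample, a, b))
    return scores


# Spread (standard deviation) of the goodness-of-fit score per subsample size.
spread = {f: statistics.stdev(subsample_scores(corpus, f))
          for f in (0.9, 0.75, 0.5)}
```

Under these assumptions, `spread` maps each subsample fraction to the standard deviation of the 200 scores, so the values can be compared across fractions in the same way the abstract compares its score distributions.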

Information

Searching for the phrase "historical corpus" by criterion: Subject
