Subject: text segmentation - OPAC collections catalog

Title: Exploiting BERT for malformed segmentation detection to improve scientific writings
Authors: Halawa, Abdelrahman
Gamalel-Din, Shehab
Nasr, Abdurrahman
Links: https://bibliotekanauki.pl/articles/30148253.pdf
Publication date: 2023
Publisher: Polskie Towarzystwo Promocji Wiedzy
Subjects: NLP
text segmentation
mal-segmentation
BERT
Description: Writing well-structured scientific documents, such as articles and theses, is vital for comprehending the document's argumentation and understanding its messages. Furthermore, it has an impact on the efficiency and time required for studying the document. Proper document segmentation also yields better results when employing automated Natural Language Processing (NLP) manipulation algorithms, including summarization and other information retrieval and analysis functions. Unfortunately, inexperienced writers, such as young researchers and graduate students, often struggle to produce well-structured professional documents. Their writing frequently exhibits improper segmentations or lacks semantically coherent segments, a phenomenon referred to as "mal-segmentation." Examples of mal-segmentation include improper paragraph or section divisions and unsmooth transitions between sentences and paragraphs. This research addresses the issue of mal-segmentation in scientific writing by introducing an automated method for detecting mal-segmentations that uses Sentence Bidirectional Encoder Representations from Transformers (sBERT) as an encoding mechanism. The experimental results section shows promising results for the detection of mal-segmentation using the sBERT technique.
Source: Applied Computer Science; 2023, 19, 2; 126-141
1895-3735
2353-6977
Appears in: Applied Computer Science
Content provider: Biblioteka Nauki

Article
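
Note on item 1: the description above says sentences are encoded and mal-segmentation is then detected, without giving the detection rule. The following is a minimal sketch of one plausible reading, in which adjacent sentences are compared and weak transitions are flagged. It is not the authors' method: the sBERT encoder is swapped for a toy bag-of-words vector so the sketch runs without external libraries, and the names `embed`, `cosine`, `flag_weak_transitions`, and the 0.1 threshold are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(sentence):
    """Toy stand-in for an sBERT encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def flag_weak_transitions(sentences, threshold=0.1):
    """Return each index i where sentence i and sentence i+1 are less
    similar than the (hypothetical) threshold."""
    vectors = [embed(s) for s in sentences]
    return [i for i in range(len(vectors) - 1)
            if cosine(vectors[i], vectors[i + 1]) < threshold]

paragraph = [
    "BERT encodes each sentence as a vector.",
    "Each sentence vector is compared with the next sentence vector.",
    "Our campus cafeteria opens at noon.",
]
weak = flag_weak_transitions(paragraph)  # transition 1 -> 2 is flagged
```

With a real sentence encoder, only `embed` and `cosine` would change; the flagging loop stays the same.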

Title: Design and analysis of a lean interface for Sanskrit corpus annotation
Authors: Goyal, P.
Huet, G.
Links: https://bibliotekanauki.pl/articles/103855.pdf
Publication date: 2016
Publisher: Polska Akademia Nauk. Instytut Podstaw Informatyki PAN
Subjects: Sanskrit
text segmentation
annotation
interface
Description: We describe an innovative computer interface designed to assist annotators in the efficient selection of segmentation solutions for proper tagging of Sanskrit corpora. The proposed solution uses a compact representation of the shared forest of all segmentations. The main idea is to represent the union of all segmentations, abstracting from the sandhi rules used, and aligning with the input sentence. We show that this representation provides an exponential saving, in both space and time. The segmentation methodology is lexicon-directed. When the lexicon does not have full coverage of the corpus vocabulary, some chunks of the input may fail to be recognized. We designed a lexicon acquisition facility, which remedies this incompleteness and makes the interface more robust. This interface has been implemented, and is currently being applied to the annotation of the Sanskrit Library corpus. Evaluation over 1,500 sentences from the Pañcatantra text shows the effectiveness of the proposed interface on real corpus data.
Source: Journal of Language Modelling; 2016, 4, 2; 145-182
2299-856X
2299-8470
Appears in: Journal of Language Modelling
Content provider: Biblioteka Nauki

Article
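
Note on item 2: the exponential saving mentioned in the description can be illustrated with a small sketch. Rather than listing every segmentation, it stores one edge per lexicon word aligned with the input and counts complete segmentations by dynamic programming. This is a simplification under stated assumptions, not the authors' implementation: sandhi is ignored entirely, the lexicon is a toy set of strings, and `build_lattice` and `count_segmentations` are hypothetical names.

```python
def build_lattice(text, lexicon):
    """Edges (start, end, word) for lexicon words found in text, kept only
    if they lie on a path that covers the whole input."""
    n = len(text)
    max_len = max(map(len, lexicon), default=0)
    edges = [(i, j, text[i:j])
             for i in range(n)
             for j in range(i + 1, min(n, i + max_len) + 1)
             if text[i:j] in lexicon]
    # Forward pass: positions reachable from the start of the input.
    reach = {0}
    for i, j, _ in sorted(edges):
        if i in reach:
            reach.add(j)
    # Backward pass: positions from which the end of the input is reachable.
    coreach = {n}
    for i, j, _ in sorted(edges, reverse=True):
        if j in coreach:
            coreach.add(i)
    return [(i, j, w) for i, j, w in edges if i in reach and j in coreach]

def count_segmentations(n, edges):
    """Count complete segmentations without enumerating them."""
    ways = [0] * (n + 1)
    ways[0] = 1
    for i, j, _ in sorted(edges):
        ways[j] += ways[i]
    return ways[n]

text = "a" * 20
lattice = build_lattice(text, {"a", "aa"})
total = count_segmentations(len(text), lattice)
# 39 shared edges stand in for 10946 distinct segmentations.
```

When a chunk of the input is missing from the lexicon, no complete path exists and the lattice comes back empty, which is the incompleteness the lexicon acquisition facility is said to remedy.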

Title: Tytuły i śródtytuły w przekładach biblii na język polski
Titles and Subtitles in Translations of the Bible into Polish
Authors: Komorowska, Ewa
Links: https://bibliotekanauki.pl/articles/1954220.pdf
Publication date: 2003
Publisher: Katolicki Uniwersytet Lubelski Jana Pawła II. Towarzystwo Naukowe KUL
Subjects: title
mid-title
sub-title
Bible
text segmentation
Description: This article addresses problems that have not been examined so far, concerning the origin, framing, and structure of titles and subtitles in Polish editions of the Bible. It also explains and orders terminological issues. Titles first appeared in nineteenth-century Polish translations of the Bible, and they have occurred regularly since 1945. Bible titles serve two purposes, practical and interpretative: they help readers easily find and interpret a given fragment of the Bible. Adherence to tradition makes the titles largely repetitive forms built on the same basis. That is why Bible titles are characterized by a lack of individuality, rare use of metaphor, and the use of fixed word combinations (often idiomatic ones). In addition, titles are used: (1) strictly with reference to Bible tales, (2) to summarize the content of stories, and (3) in quotation form. Nominal forms, in the form of sentence equivalents, dominate the grammatical structure of the titles. Verbal forms, in the form of simple or complex sentences, are used rarely. Bible titles are present in the common consciousness of every person. The symbolism of Bible nomenclature is very often exploited in art, which attests to the popularity of the Bible titles.
Source: Roczniki Humanistyczne; 2003, 51, 6; 53-84
0035-7707
Appears in: Roczniki Humanistyczne
Content provider: Biblioteka Nauki

Article

Title: Przeciwdziałanie kryzysowi formy? O (nie)etycznych konceptach dziennikarzy prasowych
[Counteracting the crisis of form? On the (un)ethical concepts of press journalists]
Authors: Bryła, Władysława
Bryła-Cruz, Agnieszka
Links: https://bibliotekanauki.pl/articles/1830293.pdf
Publication date: 2020-01-29
Publisher: Akademia Techniczno-Humanistyczna w Bielsku-Białej
Subjects: press
the truth
text segmentation
figurativeness
neologism
codeswitching
Description: The subject of the analysis is the language of selected contemporary press texts. Four ways of influencing the reader are described: figurativeness, text segmentation, linguistic mechanisms for creating and naming the people and events present in current socio-political life, and code-switching. It is emphasized that, in an era of intense competition on the media market, journalists' actions are often focused more on attracting readers than on conveying the truth.
Source: Świat i Słowo; 2019, 33, 2; 15-31
1731-3317
Appears in: Świat i Słowo
Content provider: Biblioteka Nauki

Article

Title: Historia o Magielonie, królewnie neapolitańskiej. Pytanie o podstawę polskiego przekładu
“The Story of Magielona, Princess of Naples”: The Question of the Basis of the Polish Translation
Authors: Wierzbicka-Trwoga, Krystyna
Winiarska-Górska, Izabela
Links: https://bibliotekanauki.pl/articles/14769789.pdf
Publication date: 2023
Publisher: Uniwersytet im. Adama Mickiewicza w Poznaniu
Subjects: Magielona/Maguelonne/Magelona
Warbeck
Polish translation
Czech translation
text segmentation
early modern literature
Description: The paper seeks to establish the basis of the Polish translation of the Old French romance about the “beautiful Magielona.” The Polish version was translated in the sixteenth century, though not from the French, but via a German or Czech version. Until now, there have been no studies on the base text of the Polish translation, transmitted in several seventeenth-century editions. The analysis presented in our paper concerns the delimitation of the text, i.e. the division of content into chapters in the Polish prints in comparison to the German editions of the sixteenth century and to the Czech version (attested only from the eighteenth century), as well as the comparison of the opening sentences of the chapters. It allows for the conclusion that the Polish translation arose most probably from the German version, from an edition containing the primary segmentation of the German text, which in the second half of the sixteenth century was published in a secondary edition with a finer segmentation of the text.
Source: Porównania; 2023, 33, 1; 403-422
1733-165X
Appears in: Porównania
Content provider: Biblioteka Nauki

Article

Title: Projection-based text line segmentation with a variable threshold
Authors: Ptak, R.
Żygadło, B.
Unold, O.
Links: https://bibliotekanauki.pl/articles/329884.pdf
Publication date: 2017
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: document image processing
handwritten text
text line segmentation
projection profile
offline cursive script recognition
Description: Document image segmentation into text lines is one of the stages in unconstrained handwritten document recognition. This paper presents a new algorithm for text line separation in handwriting. The developed algorithm is based on a method using the projection profile. It employs thresholding, but the threshold value is variable. This permits determination of low or overlapping peaks of the graph. The proposed technique is shown to improve the recognition rate relative to traditional methods. The algorithm is robust in text line detection with respect to different text line lengths.
Source: International Journal of Applied Mathematics and Computer Science; 2017, 27, 1; 195-206
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article
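
Note on item 6: the description does not say how the threshold varies, so the following is only a minimal sketch of projection-profile line segmentation under an assumed rule. Here the threshold at each row is a fraction of the largest profile value in a sliding window around that row. The functions `projection_profile` and `segment_lines` and the parameters `ratio` and `window` are hypothetical and do not come from the paper.

```python
def projection_profile(image):
    """Row sums of a binary image (1 = ink, 0 = background)."""
    return [sum(row) for row in image]

def segment_lines(profile, ratio=0.5, window=None):
    """Return (first_row, last_row) runs whose profile value exceeds the
    threshold. With window=None the threshold is fixed (ratio * global
    maximum); otherwise it varies per row (ratio * maximum within
    `window` rows on either side), which is an assumed rule."""
    n = len(profile)
    lines, start = [], None
    for y, value in enumerate(profile):
        if window is None:
            local = profile
        else:
            local = profile[max(0, y - window):y + window + 1]
        is_text = value > ratio * max(local)
        if is_text and start is None:
            start = y                      # a text line begins
        elif not is_text and start is not None:
            lines.append((start, y - 1))   # the text line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, n - 1))
    return lines

# One dense line (peak 10) and one faint line (peak 3).
profile = [0, 0, 8, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 2, 0, 0]
fixed = segment_lines(profile)               # misses the faint line
variable = segment_lines(profile, window=3)  # finds both lines
```

In this toy profile the fixed threshold of 5 hides the low peak, while the local threshold recovers it, which is the kind of low-peak case the description mentions.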

Title: Line segmentation of handwritten text using histograms and tensor voting
Authors: Babczyński, Tomasz
Ptak, Roman
Links: https://bibliotekanauki.pl/articles/330796.pdf
Publication date: 2020
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: document image processing
handwritten text
text line segmentation
projection profile
text string
offline cursive script recognition
ICDAR 2009 competition
Description: There are a large number of historical documents in libraries and other archives throughout the world. Most of them are written by hand. In many cases they exist in only one copy and are hard to access. Digitization of such artifacts can make them available to the community. But even digitized, they remain unsearchable, and an important task is to extract their contents in computer-readable form. One of the first steps in this direction is to recognize where the lines of the text are. Computational intelligence algorithms can be used to solve this problem. In the present paper, two groups of algorithms, namely, projection-based and tensor voting-based, are compared. The performance is evaluated on a data set and with the procedure proposed by the organizers of the ICDAR 2009 competition.
Source: International Journal of Applied Mathematics and Computer Science; 2020, 30, 3; 585-596
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article

Information

Searching for the phrase "text segmentation" by criterion: Subject
