Subject: document classification - OPAC collection catalog

Title: Software Deterioration Control Based on Issue Reports
Authors: Bushehrian, Omid
Sayari, Mohsen
Shamsinejad, Pirooz
Links: https://bibliotekanauki.pl/articles/2060907.pdf
Publication date: 2021
Publisher: Politechnika Wrocławska. Oficyna Wydawnicza Politechniki Wrocławskiej
Subjects: code smell
issue report
maintainability
document classification
Description: Introduction: Successive code changes during the maintenance phase may cause bad smells and anti-patterns to emerge in the code, gradually degrading it and making it harder to maintain. Continuous Quality Control (QC) is essential in this phase to refactor the anti-patterns and bad smells. Objectives: The objective of this research has been to present a novel component called Code Deterioration Watch (CDW), to be integrated with existing Issue Tracking Systems (ITS) in order to help the QC team swiftly locate the software modules most vulnerable to deterioration. The important point about the CDW is that its function is independent of code-level metrics; rather, it is based entirely on issue-level metrics measured from ITS repositories. Methods: An issue-level metric that properly signals bad-smell emergence was identified by mining software repositories. To measure that metric, a stream clustering algorithm called ReportChainer was proposed to spot Relatively Long Chains (RLC) of incoming issue reports, as they tell the QC team that a concentrated point of successive changes has emerged in the software. Results: Part of the contribution of this paper is the creation of a large integrated code and issue repository covering twelve medium- and large-sized open-source software products from Apache and Eclipse. By mining this repository, it was observed that there is a strong direct correlation (0.73 on average) between the number of issues of type "New Feature" reported on a software package and the number of bad smells of types "design" and "error prone" that emerged in that package. In addition, a strong direct correlation (0.97 on average) was observed between the length of a chain and the number of times it caused changes to a software package.
Conclusion: The existence of a direct correlation between the number of issues of type "New Feature" reported on a software package and (1) the number of bad smells of types "design" and "error prone" and (2) the value of the "CyclomaticComplexity" metric of the package justifies the idea of Quality Control based solely on issue-level metrics. A stream clustering algorithm can be effectively applied to signal the emergence of a deteriorated module.
Source: e-Informatica Software Engineering Journal; 2021, 15, 1; 115-132
1897-7979
Appears in: e-Informatica Software Engineering Journal
Content provider: Biblioteka Nauki

Article

Title: Formalization of Technological Knowledge in the Field of Metallurgy Using Document Classification Tools Supported with Semantic Techniques
Authors: Regulski, K.
Links: https://bibliotekanauki.pl/articles/353849.pdf
Publication date: 2017
Publisher: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects: application of information technology to the foundry industry
document classification
semantic techniques
knowledge formalization
text mining
Description: The process of knowledge formalization is an essential part of developing decision support systems. The creation of a technological knowledge base in the field of metallurgy encountered problems in acquiring and codifying reusable computer artifacts based on text documents. The aim of the work was to adapt algorithms for document classification and to develop a method for the semantic integration of the created repository. The author used artificial intelligence tools: latent semantic indexing, rough sets, and association rule learning, with ontologies as the tool for integration. The developed methodology made it possible to create a semantic knowledge base from natural-language documents in the field of metallurgy.
Source: Archives of Metallurgy and Materials; 2017, 62, 2A; 715-720
1733-3490
Appears in: Archives of Metallurgy and Materials
Content provider: Biblioteka Nauki

Article

Title: A case study in text mining of discussion forum posts: Classification with bag of words and global vectors
Authors: Cichosz, P.
Links: https://bibliotekanauki.pl/articles/330299.pdf
Publication date: 2018
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: text mining
discussion forum
text representation
document classification
word embedding
Description: Despite the rapid growth of other types of social media, Internet discussion forums remain a highly popular communication channel and a useful source of text data for analyzing user interests and sentiments. Being suited to richer, deeper, and longer discussions than microblogging services, they reflect particularly well topics of long-term, persisting involvement and areas of specialized knowledge or experience. Discovering and characterizing such topics and areas with text mining algorithms is therefore an interesting and useful research direction. This work presents a case study in which selected classification algorithms are applied to posts from a Polish discussion forum devoted to psychoactive substances obtained from home-grown plants, such as hashish or marijuana. The utility of two different vector text representations is examined: the simple bag-of-words representation and the more refined global vectors embedding. While the former is found to work well for the multinomial naive Bayes algorithm, the latter turns out to be more useful for the other classification algorithms: logistic regression, SVMs, and random forests. The results suggest that post classification can be applied to measure the publication intensity of particular topics and, in the case of forums related to psychoactive substances, to monitor the risk of drug-related crime.
Source: International Journal of Applied Mathematics and Computer Science; 2018, 28, 4; 787-801
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article

Title: Exploring the use of syntactic dependency features for document-level sentiment classification
Authors: Kalaivani, K. S.
Kuppuswami, S.
Links: https://bibliotekanauki.pl/articles/201609.pdf
Publication date: 2019
Publisher: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects: document-level sentiment classification
syntactic dependency features
generalized dependency features
information gain
weighted frequency
weighted odds
Description: Automatic analysis of product reviews requires a deep machine understanding of natural-language text. A limitation of the bag-of-words (BoW) model is that much of the word-relation information in the original sentence is lost and word order is ignored. Higher-order n-grams also fail to capture long-range dependency relations and word-order information. To address these issues, syntactic features extracted from dependency relations can be used for machine-learning-based document-level sentiment classification. Generalization of syntactic dependency features and negation handling are used to achieve more accurate classification. Further, to reduce the huge dimensionality of the feature space, feature selection methods based on information gain (IG) and weighted frequency and odds (WFO) are used. A supervised feature weighting scheme called delta term frequency-inverse document frequency (TF-IDF) is also employed to boost the importance of discriminative features using the observed uneven distribution of features between the two classes. Experimental results show the effectiveness of generalized syntactic dependency features over standard features for sentiment classification using a Boolean multinomial naive Bayes (BMNB) classifier.
Source: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2019, 67, 2; 339-347
0239-7528
Appears in: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider: Biblioteka Nauki

Article

Title: New algorithm for determining the number of features for the effective sentiment-classification of text documents
Nowy algorytm ustalania liczby zmiennych potrzebnych do klasyfikacji dokumentów tekstowych ze względu na ich wydźwięk emocjonalny
Authors: Idczak, Adam
Korzeniewski, Jerzy
Links: https://bibliotekanauki.pl/articles/18105028.pdf
Publication date: 2023-05-31
Publisher: Główny Urząd Statystyczny
Subjects: sentiment analysis
document sentiment classification
text mining
logistic regression
naive Bayes classifier
feature selection
correlation
Description: Sentiment analysis of text documents is a very important part of contemporary text mining. The purpose of this article is to present a new technique of text sentiment analysis that can be used with any type of document sentiment classification method. The proposed technique selects features independently of the classifier, which reduces the size of the feature space. Its advantages include intuitiveness and computational simplicity. The most important element of the proposed technique is a novel algorithm for determining the number of features that must be selected for effective classification. The algorithm is based on an analysis of the correlation between single features and document labels. A statistical approach, featuring a naive Bayes classifier and logistic regression, was employed to verify the usefulness of the proposed technique. Both methods were applied to three document sets composed of 1,169 opinions of bank clients, obtained in 2020 from a Poland-based bank. The documents were written in Polish. The research demonstrated that reducing the number of terms more than 10-fold by means of the proposed algorithm improves the effectiveness of classification in most cases.
Source: Wiadomości Statystyczne. The Polish Statistician; 2023, 68, 5; 40-57
0043-518X
Appears in: Wiadomości Statystyczne. The Polish Statistician
Content provider: Biblioteka Nauki

Article

Title: EWD-P as an example of a "The Best of Good Practice" project
Authors: Bliźniuk, G.
Biernacka, D.
Kowalska, G.
Momotko, M.
Links: https://bibliotekanauki.pl/articles/309439.pdf
Publication date: 2006
Publisher: Instytut Łączności - Państwowy Instytut Badawczy
Subjects: European Document Exchange System - Poland
EWD-P
workflow engine
classification and categorization of documents
central repository
advanced search engine
Rodan Systems
best practice project
Description: The European Document Exchange System - Poland (EWD-P), developed by Rodan Systems SA, is one of the most modern electronic document exchange systems in the European Union. EWD-P is a workflow system that facilitates decision-making within the state bureaucratic system, with a special brief to work out official Polish government standpoints on the numerous legislative issues constantly arising within the EU. The EWD-P project has created an effective platform for the electronic exchange of documents related to the EU legislative process. The intelligent workflow management functionality aims to support the complex flow of documents through the meanders of the central government administration. The EWD-P system includes high-level classification of documents using an artificial intelligence technique for document categorization. The system simplifies interaction between the ministerial departments involved in elaborating a final common position and facilitates a more efficient organization of work within government and other public administration institutions involved in the EU legislative process.
Source: Journal of Telecommunications and Information Technology; 2006, 2; 15-23
1509-4553
1899-8852
Appears in: Journal of Telecommunications and Information Technology
Content provider: Biblioteka Nauki

Article

Title: Критерії відбору та первинна обробка емпіричного матеріалу паралельного корпусу юридичних текстів
Selection Criteria and Initial Processing of Empirical Material for a Parallel Corpus of Legal Texts
Authors: Matvieieva, Svitlana
Links: https://bibliotekanauki.pl/articles/1398083.pdf
Publication date: 2019
Publisher: Ateneum - Akademia Nauk Stosowanych w Gdańsku
Subjects: genre classification
parallel corpus
metadata
document card
legal text
Description: The article deals with the formation of criteria for the primary selection of legal texts for an English-Ukrainian parallel corpus of legal texts. The author has developed a classification of legal texts based on style and text genre, taking into account the types of legal acts, and attempts to combine the legal and linguistic characteristics applicable to the classification of legal documents. The article proposes a structure for the metadata card of corpus texts (original and translation), which is tested on text samples. The article also substantiates the need for metatext data and extra-linguistic information when working with corpus texts.
Source: Forum Filologiczne Ateneum; 2019, 7, 1; 167-181
2353-2912
2719-8537
Appears in: Forum Filologiczne Ateneum
Content provider: Biblioteka Nauki

Article

Information

Searching for the phrase "document classification" by criterion: Subject