Subject: data-mining - OPAC collection catalog

Item 1.

Title:: Data mining tasks and methods – implementations in R
Authors:: Figielska, Ewa
Links:: https://bibliotekanauki.pl/articles/1397482.pdf
Publication date:: 2020
Publisher:: Warszawska Wyższa Szkoła Informatyki
Subjects:: data mining
R programming language
classification
prediction
clustering
association
Description:: The aim of the paper is to present how some data mining tasks can be solved using the R programming language. Full R scripts are provided for preparing the data sets, solving the tasks and analyzing the results.
Source:: Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki; 2020, 14, 23; 27-49
1896-396X
2082-8349
Appears in:: Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki
Content provider:: Biblioteka Nauki

Article

Item 2.

Title:: Data mining
Authors:: Morzy, Tadeusz
Links:: https://bibliotekanauki.pl/articles/703139.pdf
Publication date:: 2007
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: data mining
data analysis
evolution of information technology
association analysis
classification
clustering
Web mining
Description:: Recent advances in data capture, data transmission and data storage technologies have resulted in a growing gap between increasingly powerful database systems and users' ability to understand and effectively analyze the information collected. Many companies and organizations gather gigabytes or terabytes of business transactions, scientific data, web logs, satellite pictures and text reports, which are simply too large and too complex to support a decision-making process. Traditional database and data warehouse querying models are not sufficient to extract trends, similarities and correlations hidden in very large databases. The value of existing databases and data warehouses can be significantly enhanced with the help of data mining. Data mining is a new research area which aims at the nontrivial extraction of implicit, previously unknown and potentially useful information from large databases and data warehouses. Data mining, also referred to as database mining or knowledge discovery in databases, can help answer business questions that were too time-consuming to resolve with traditional data processing techniques. The process of mining the data can be perceived as a new way of querying, with questions such as “which clients are likely to respond to our next promotional mailing, and why?”. The aim of this paper is to present an overall picture of the data mining field and to briefly present a few data mining methods. Finally, we summarize the concepts presented in the paper and discuss some problems related to data mining technology.
Source:: Nauka; 2007, 3
1231-8515
Appears in:: Nauka
Content provider:: Biblioteka Nauki

Article

Item 3.

Title:: A survey of big data classification strategies
Authors:: Banchhor, Chitrakant
Srinivasu, N.
Links:: https://bibliotekanauki.pl/articles/2050171.pdf
Publication date:: 2020
Publisher:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:: big data
data mining
MapReduce
classification
machine learning
evolutionary intelligence
deep learning
Description:: Big data nowadays plays a major role in finance, industry, medicine and various other fields. In this survey, 50 research papers are reviewed with regard to the big data classification techniques presented and/or used in the respective studies. The classification techniques are categorized into machine learning, evolutionary intelligence, fuzzy-based approaches, deep learning and others. The research gaps and challenges of big data classification faced by the existing techniques are also listed and described, which should help researchers enhance the effectiveness of their future work. The research papers are analyzed with respect to software tools, datasets used, publication year, classification techniques and performance metrics. The survey indicates that the most frequently used big data classification methods are based on machine learning techniques, and that the most commonly used dataset for big data classification appears to be the UCI repository dataset. The most frequently used performance metrics are accuracy and execution time.
Source:: Control and Cybernetics; 2020, 49, 4; 447-469
0324-8569
Appears in:: Control and Cybernetics
Content provider:: Biblioteka Nauki

Article

Item 4.

Title:: Data mining approach in diagnosis and treatment of chronic kidney disease
Authors:: Turiac, Andreea S.
Zdrodowska, Małgorzata
Links:: https://bibliotekanauki.pl/articles/2105985.pdf
Publication date:: 2022
Publisher:: Politechnika Białostocka. Oficyna Wydawnicza Politechniki Białostockiej
Subjects:: feature selection
classification
classification rules
action rules
data mining
chronic kidney disease
Description:: Chronic kidney disease is a general term for kidney dysfunction lasting more than 3 months. When chronic kidney disease is advanced, the kidneys are no longer able to cleanse the blood of toxins and harmful waste products and can no longer support the proper function of other organs. The disease can begin suddenly or develop latently over a long period of time without characteristic symptoms. The most common causes are other chronic diseases, namely diabetes and hypertension. It is therefore very important to diagnose the disease at an early stage and to choose a suitable treatment (medication, diet and exercise) to reduce its effects. The purpose of this paper is to analyse and select the patient characteristics that may influence the prevalence of chronic kidney disease, and to extract classification rules and action rules that can help medical professionals diagnose patients with chronic kidney disease efficiently and accurately. The first step of the study was feature selection and an evaluation of its effect on classification results. The study was repeated for four models: one containing all available patient data, one containing the features identified by doctors as major factors in chronic kidney disease, and two containing features selected using Correlation Based Feature Selection and the Chi-Square Test. Sequential Minimal Optimization (SMO) and Multilayer Perceptron had the best performance in all four cases, with an average accuracy of 98.31% for SMO and 98.06% for Multilayer Perceptron; these results were confirmed by the F1-score, which was above 0.98 for both algorithms. Classification rules were extracted for all these models. The final step was action rule extraction. The paper shows that appropriate data analysis allows building models that can support doctors in diagnosing a disease and in their decisions on treatment. Action rules can be important guidelines for doctors: they can reassure doctors in their diagnosis or indicate new, previously unseen ways to cure the patient.
Source:: Acta Mechanica et Automatica; 2022, 16, 3; 180-188
1898-4088
2300-5319
Appears in:: Acta Mechanica et Automatica
Content provider:: Biblioteka Nauki

Article
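The feature selection step in the abstract above uses, among other methods, a Chi-Square Test. As a minimal sketch only, assuming categorical features, a plain Pearson chi-square score and hypothetical toy records (not the study's patient data or its exact procedure), features can be ranked by how strongly their values depend on the class:

```python
from collections import Counter

def chi_square(feature_values, labels):
    """Pearson chi-square statistic of the feature-vs-class contingency table."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    f_counts = Counter(feature_values)
    c_counts = Counter(labels)
    stat = 0.0
    for f, fc in f_counts.items():
        for c, cc in c_counts.items():
            expected = fc * cc / n              # count expected under independence
            observed = joint.get((f, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def rank_features(rows, labels):
    """Return (feature name, score) pairs sorted by descending chi-square score."""
    scores = {name: chi_square([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy records: "hypertension" tracks the class, "appetite" does not.
rows = [
    {"hypertension": "yes", "appetite": "good"},
    {"hypertension": "yes", "appetite": "poor"},
    {"hypertension": "no", "appetite": "good"},
    {"hypertension": "no", "appetite": "poor"},
]
labels = ["ckd", "ckd", "notckd", "notckd"]

print(rank_features(rows, labels))  # [('hypertension', 4.0), ('appetite', 0.0)]
```

Higher scores suggest a stronger feature-class dependence; a fuller pipeline would also account for degrees of freedom or p-values, which this sketch omits.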

Item 5.

Title:: Heterogeneous distance functions for prototype rules : influence of parameters on probability estimation
Authors:: Blachnik, M.
Duch, W.
Wieczorek, T.
Links:: https://bibliotekanauki.pl/articles/92882.pdf
Publication date:: 2006
Publisher:: Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Subjects:: prototype rules
probability estimation
heterogeneous distance functions
similarity-based methods
classification
data mining
Description:: An interesting and little explored way to understand data is based on prototype rules (P-rules). The goal of this approach is to find optimal similarity (or distance) functions and positions of the prototypes to which unknown vectors are compared. In real applications, similarity functions frequently involve different types of attributes, such as continuous, discrete, binary or nominal. Heterogeneous distance functions that can handle such diverse information are usually based on a probabilistic distance measure, such as the Value Difference Metric (VDM). For continuous attributes, calculating probabilities requires estimating probability density functions. This process requires careful selection of several parameters that may have an important impact on the overall classification accuracy. In this paper, various heterogeneous distance functions based on the VDM measure are presented, among them some new heterogeneous distance functions based on different types of probability estimation. Results of many numerical experiments with such distance functions on artificial and real datasets are presented, and quite simple P-rules are extracted for several heterogeneous databases.
Source:: Studia Informatica : systems and information technology; 2006, 1(7); 19-30
1731-2264
Appears in:: Studia Informatica : systems and information technology
Content provider:: Biblioteka Nauki

Article
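The Value Difference Metric mentioned above compares two nominal values by how differently they are distributed over the classes. A minimal sketch, assuming nominal attributes only, plain frequency estimates of P(class | value) and an exponent q = 2; the paper's focus on density estimation for continuous attributes is not covered, and the toy data and the `VDM` class are hypothetical:

```python
from collections import defaultdict

def conditional_probs(values, labels):
    """Estimate P(class | value) for one nominal attribute by simple counting."""
    counts = defaultdict(lambda: defaultdict(int))
    for v, c in zip(values, labels):
        counts[v][c] += 1
    classes = sorted(set(labels))
    table = {}
    for v, by_class in counts.items():
        total = sum(by_class.values())
        table[v] = {c: by_class[c] / total for c in classes}
    return table

class VDM:
    """VDM distance over tuples of nominal attributes (unseen values raise KeyError)."""

    def __init__(self, rows, labels, q=2):
        self.q = q
        self.tables = [conditional_probs([r[i] for r in rows], labels)
                       for i in range(len(rows[0]))]

    def attribute_distance(self, i, x, y):
        # Sum over classes of |P(c | x) - P(c | y)|^q for attribute i.
        px, py = self.tables[i][x], self.tables[i][y]
        return sum(abs(px[c] - py[c]) ** self.q for c in px)

    def distance(self, a, b):
        # Total distance is the sum of per-attribute VDM distances.
        return sum(self.attribute_distance(i, x, y) for i, (x, y) in enumerate(zip(a, b)))

# Hypothetical toy data: (colour, size) with classes A/B.
rows = [("red", "small"), ("red", "large"), ("blue", "small"),
        ("blue", "large"), ("green", "small"), ("green", "large")]
labels = ["A", "A", "B", "B", "A", "B"]
vdm = VDM(rows, labels)

print(vdm.attribute_distance(0, "red", "blue"))   # 2.0
print(vdm.attribute_distance(0, "red", "green"))  # 0.5
```

Values with similar class distributions end up close even though they are different symbols, which is what lets such metrics mix nominal attributes into a single distance.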

Item 6.

Title:: Application of selected supervised classification methods to bank marketing campaign
Authors:: Grzonka, D.
Borowik, B.
Suchacka, G.
Links:: https://bibliotekanauki.pl/articles/94739.pdf
Publication date:: 2016
Publisher:: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Wydawnictwo Szkoły Głównej Gospodarstwa Wiejskiego w Warszawie
Subjects:: classification
supervised learning
data mining
decision trees
bagging
boosting
random forests
bank marketing
R project
Description:: Supervised classification covers a number of data mining methods based on training data. These methods have been successfully applied to solve complex multi-criteria classification problems in many domains, including economic issues. In this paper we discuss features of some supervised classification methods based on decision trees and apply them to the direct marketing campaign data of a Portuguese banking institution. We discuss and compare the following classification methods: decision trees, bagging, boosting and random forests. In our approach, the classification problem is defined in a scenario where a bank’s clients make decisions about the activation of their deposits. The obtained results are used to evaluate the effectiveness of the classification rules.
Source:: Information Systems in Management; 2016, 5, 1; 36-48
2084-5537
2544-1728
Appears in:: Information Systems in Management
Content provider:: Biblioteka Nauki

Article
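Bagging, one of the methods compared in the paper, trains each base model on a bootstrap sample of the training data and combines the models by majority vote. A minimal sketch, assuming one-feature decision stumps as base learners (the paper uses full decision trees in R) and a hypothetical numeric feature with 0/1 labels standing in for the deposit decision; `train_stump` and `bagging` are illustrative names:

```python
import random
from collections import Counter

def train_stump(X, y):
    """Fit a one-feature threshold rule (decision stump) with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                preds = [left if row[j] <= t else right for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right

def bagging(X, y, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(m(row) for m in models)
        return votes.most_common(1)[0][0]

    return predict

# Hypothetical data: one numeric feature, label 1 = client opens a deposit.
X = [[1], [2], [3], [4], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
predict = bagging(X, y)
print(predict([0.5]), predict([9.5]))
```

An odd number of models avoids ties in the binary vote; random forests extend the same idea by also sampling the features considered at each split.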

Item 7.

Title:: A Data Mining Approach for Analysis of a Wire Electrical Discharge Machining Process
Authors:: Dandge, Shruti Sudhakar
Chakraborty, Shankar
Links:: https://bibliotekanauki.pl/articles/2023974.pdf
Publication date:: 2021-09
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: wire electrical discharge machining
data mining
classification and regression tree
chi-squared
automatic interaction detection
classification
Description:: Wire electrical discharge machining (WEDM) is a non-conventional material-removal process in which a continuously travelling, electrically conductive wire is used as an electrode to erode material from a workpiece. To exploit its full machining potential, it is necessary to examine the effects of its various input parameters on the responses and to determine the best parametric setting. This paper proposes a parametric analysis of a WEDM process by applying non-parametric decision tree algorithms to a past experimental dataset. Two decision tree-based classification methods, i.e. classification and regression tree (CART) and Chi-squared automatic interaction detection (CHAID), are considered here as the data mining tools to examine the influence of six WEDM process parameters on four responses and to identify the most preferred parametric mix for achieving the desired response values. The developed decision trees recognize pulse-on time as the most indicative WEDM process parameter, affecting almost all the responses. Furthermore, a comparative analysis of the classification performance of the CART and CHAID algorithms demonstrates the superiority of CART, with higher overall classification accuracy and lower prediction risk.
Source:: Management and Production Engineering Review; 2021, 13, 3; 116-128
2080-8208
2082-1344
Appears in:: Management and Production Engineering Review
Content provider:: Biblioteka Nauki

Article
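CART grows a tree by repeatedly choosing the split that most reduces node impurity. A minimal sketch of that single step, assuming the Gini impurity criterion, one numeric input and hypothetical pulse-on time values with made-up response classes; CHAID, which uses chi-square tests to choose and merge splits, is not shown:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Threshold on one numeric feature minimizing the weighted Gini impurity of both children."""
    n = len(labels)
    best = (float("inf"), None)
    pairs = sorted(zip(values, labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, threshold)
    return best

# Hypothetical pulse-on time settings and response classes.
values = [105, 110, 115, 120, 125, 130]
labels = ["low", "low", "low", "high", "high", "high"]
print(best_split(values, labels))  # (0.0, 117.5)
```

A full tree would apply this search to every input parameter, pick the best one, and recurse on each child until a stopping rule is met.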

Item 8.

Title:: Comparative evaluation of the different data mining techniques used for the medical database
Authors:: Kasperczuk, A.
Dardzińska, A.
Links:: https://bibliotekanauki.pl/articles/386432.pdf
Publication date:: 2016
Publisher:: Politechnika Białostocka. Oficyna Wydawnicza Politechniki Białostockiej
Subjects:: data mining
classification
WEKA
J48
MLP
apriori
association rules
medical knowledge base
classification algorithm
Description:: Data mining is an emerging research area for solving various problems. Classification and finding associations are two main steps in the field of data mining. In this paper, we use three classification algorithms available in the Weka interface: J48 (an open source Java implementation of the C4.5 algorithm), Multilayer Perceptron - MLP (a modification of the standard linear perceptron) and Naïve Bayes (based on Bayes' rule and a set of conditional independence assumptions). These classifiers were used to choose the best algorithm for the voice disorders database. To find association rules over the transactional medical database, we first use the Apriori algorithm for frequent itemset mining. These two initial steps of analysis will help to create the medical knowledge base. The ultimate goal is to build a model that can improve the way existing and future data in the medical database are read and interpreted.
Source:: Acta Mechanica et Automatica; 2016, 10, 3; 233-238
1898-4088
2300-5319
Appears in:: Acta Mechanica et Automatica
Content provider:: Biblioteka Nauki

Article
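The Apriori step described above finds frequent itemsets level by level, using the fact that every subset of a frequent itemset must itself be frequent, and association rules are then read off those itemsets. A minimal pure-Python sketch of that idea; the paper itself used Weka, and the toy "transactions", thresholds and function names here are hypothetical:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    current = {frozenset([i]) for t in sets for i in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets; keep candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in level for s in combinations(union, k - 1)):
                candidates.add(union)
        current = candidates
    return frequent

def rules(frequent, min_confidence):
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, conf))
    return out

# Hypothetical patient "transactions".
transactions = [
    {"hoarseness", "smoker"},
    {"hoarseness", "smoker", "reflux"},
    {"hoarseness", "reflux"},
    {"smoker"},
]
freq = apriori(transactions, min_support=0.5)
for lhs, rhs, conf in rules(freq, min_confidence=0.6):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

The subset check prunes {hoarseness, smoker, reflux} without counting it, because {smoker, reflux} is infrequent; that pruning is what keeps Apriori tractable on larger databases.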

Item 9.

Title:: Inefficiency of data mining algorithms and its architecture: with emphasis to the shortcoming of data mining algorithms on the output of the researches
Authors:: Tesema, Workineh
Links:: https://bibliotekanauki.pl/articles/118221.pdf
Publication date:: 2019
Publisher:: Polskie Towarzystwo Promocji Wiedzy
Subjects:: data mining
classification
clustering
association
regression
algorithms bottleneck
Description:: This review paper presents the shortcomings of the data mining algorithms for classification, clustering, association and regression, which are widely used as tools in different research communities. Data mining research has successfully handled large datasets to solve problems. However, the increase in data sizes has created a bottleneck for algorithms that retrieve hidden knowledge from large volumes of data, and data mining algorithms have not kept pace with this rate of growth. Data mining algorithms must be efficient and have a visual architecture in order to extract information effectively from huge amounts of data in many data repositories or in dynamic data streams. Data visualization researchers believe in the importance of giving users an overview of, and insight into, data distributions. Combining these with a graphical interface makes it possible to navigate the complexity of statistical and data mining techniques to create powerful models. Therefore, there is an increasing need to understand the bottlenecks associated with data mining algorithms in modern architectures and in the research community. This review paper is intended to guide researchers in identifying the shortcomings of data mining techniques in the domain areas of the problems they will explore. It also identifies research areas, particularly multimedia (where data can be sequential, audio signals, video signals, spatio-temporal, temporal, time series, etc.), in which data mining algorithms have not yet been used.
Source:: Applied Computer Science; 2019, 15, 3; 73-86
1895-3735
Appears in:: Applied Computer Science
Content provider:: Biblioteka Nauki

Article

Item 10.

Title:: Selection of data mining method for multidimensional evaluation of the manufacturing process state
Authors:: Rogalewicz, M.
Piłacińska, M.
Kujawińska, A.
Links:: https://bibliotekanauki.pl/articles/407333.pdf
Publication date:: 2012
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: manufacturing process
method
quality control
process state evaluation
data mining methods
classification
Description:: The article deals with the issues involved in evaluating the process state on the basis of many measures, including process parameters, diagnostic signals and events occurring during the process. These measures, as well as the measurements traditionally used in the evaluation of process capability, offer a relevant source of information about the manufacturing process, and the authors attempted to ascertain the most suitable method, or group of methods, for this purpose. They present the main criteria for categorizing the methods of manufacturing process state evaluation and, among those identified, distinguish traditional methods from Data Mining methods. The authors then specify some basic requirements for the desired method or group of methods and focus on the classification problem. A division and classification of the methods is made and briefly described. Finally, the authors specify the criteria for selecting the Data Mining method type as the most appropriate for evaluating the manufacturing process state and, within this type, propose the most suitable groups of methods. Some directions for further research are discussed at the end of the article.
Source:: Management and Production Engineering Review; 2012, 3, 2; 27-35
2080-8208
2082-1344
Appears in:: Management and Production Engineering Review
Content provider:: Biblioteka Nauki

Article

Item 11.

Title:: Rough Set Application for the Tax Payer Classification Rules
Application of rough set theory to the task of taxpayer classification
Authors:: Misztal, L.
Links:: https://bibliotekanauki.pl/articles/156046.pdf
Publication date:: 2009
Publisher:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:: rough sets
data mining
classification
rules extraction
decision rules
Description:: Solving classification tasks for real-world problems has become possible through the creation and use of more efficient IT systems. This also applies to rough set methods, which have a solid, well-described mathematical basis for classification tasks. The presented paper describes the application of rough set theory, using the significance of attributes and decision rule sets, to the classification of taxpayers. The negative or positive results of tax audits are taken into account, and specific features describing the taxpayers are considered. An appropriate choice of data, construction of the model and its application make it possible to reach the specified goal with better accuracy than an "intuitive" choice. At the same time, decision rules can be extracted in linguistic form, which makes the obtained results easier to interpret. Applying this solution yields a more accurate selection of taxpayers. This is significant for the tax authorities and leads to better observance of tax law.
Solving classification tasks for real-world problems has become possible thanks to the development of more efficient IT systems. This also applies to rough set theory for classification tasks. The presented publication applies rough sets, which have a well-established theory based on an extension of set theory that defines lower and upper approximations and determines a decision table for classification. The method was used to compute the significance of attributes and the decision rules describing the classification of taxpayers with respect to a positive or negative audit result, taking into account the specific features that describe them. Appropriate selection of data, construction of the model and its use made it possible to achieve the stated goal with greater accuracy than an "intuitive" choice. Using rough sets, which express the final classification results as a set of rules, made it possible to extract those rules in an easily interpretable linguistic form. The publication uses the author's own software solution based on collections, arrays and intermediate objects, implemented for the Oracle database, with which the task was carried out and the results presented. The results obtained from the model based on this method make it possible to select more accurately those taxpayers operating under the Polish legal system who have tax problems and should be audited. This increases the effectiveness of tax law enforcement.
Source:: Pomiary Automatyka Kontrola; 2009, vol. 55, no. 10, 10; 796-798
0032-4140
Appears in:: Pomiary Automatyka Kontrola
Content provider:: Biblioteka Nauki

Article
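The abstract describes rough sets through lower and upper approximations built from a decision table. A minimal sketch of those two definitions, assuming a toy decision table whose attribute names, object ids and "positive audit" set are all hypothetical; it is not the author's Oracle-based implementation:

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group object ids that have identical values on the given condition attributes."""
    groups = defaultdict(set)
    for obj_id, row in table.items():
        groups[tuple(row[a] for a in attributes)].add(obj_id)
    return list(groups.values())

def approximations(table, attributes, target):
    """Lower approximation: classes fully inside target; upper: classes touching target."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# Hypothetical taxpayers described by two condition attributes.
table = {
    "t1": {"sector": "retail", "late_filing": "yes"},
    "t2": {"sector": "retail", "late_filing": "yes"},
    "t3": {"sector": "services", "late_filing": "no"},
    "t4": {"sector": "services", "late_filing": "yes"},
    "t5": {"sector": "retail", "late_filing": "no"},
}
audit_positive = {"t1", "t4"}

lower, upper = approximations(table, ["sector", "late_filing"], audit_positive)
print(sorted(lower), sorted(upper))  # ['t4'] ['t1', 't2', 't4']
```

Here t1 and t2 cannot be told apart by the chosen attributes but only t1 had a positive audit, so both fall into the boundary region (upper minus lower); certain decision rules come from the lower approximation, possible ones from the upper.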

Item 12.

Title:: Machine learning-based business rule engine data transformation over high-speed networks
Authors:: Neelima, Kenpi
Vasundra, S.
Links:: https://bibliotekanauki.pl/articles/38700094.pdf
Publication date:: 2023
Publisher:: Instytut Podstawowych Problemów Techniki PAN
Subjects:: CRISP-DM
data mining algorithms
business rule
prediction
classification
machine learning
deep learning
AI design
Description:: Raw data processing is a key business operation. Business-specific rules determine how the raw data should be transformed into business-required formats. When source data continuously changes its formats and has keying errors and invalid data, the effectiveness of the data transformation is a big challenge. The conventional data extraction and transformation technique produces a delay in handling such data because of continuous fluctuations in data formats and requires continuous development of a business rule engine. The best business rule engines require near real-time detection of business rules and data transformation mechanisms utilizing machine learning classification models. Since data is combined from numerous sources and older systems, it is challenging to categorize and cluster the data and apply suitable business rules to turn raw data into the business-required format. This paper proposes a methodology for designing ensemble machine learning techniques and approaches for classifying and segmenting the registered numbers of registered title records in order to choose the most suitable business rule that can convert the registered number into the format the business expects, allowing businesses to provide customers with the most recent data in less time. This study evaluates the suggested model by gathering sample data and analyzing classification machine learning (ML) models to determine the relevant business rule. Experimentation employed Python, R, SQL stored procedures, Impala scripts, and Datameer tools.
Source:: Computer Assisted Methods in Engineering and Science; 2023, 30, 1; 55-71
2299-3649
Appears in:: Computer Assisted Methods in Engineering and Science
Content provider:: Biblioteka Nauki

Article
