Temat: imbalanced data - Katalog OPAC zbiorów

Skocz do pozycji: 1.

Tytuł:: Analiza danych niezrównoważonych we wstępnej diagnostyce raka pęcherza moczowego
Analysis of imbalanced data using morphometric parameters in diagnosis of bladder cancer
Autorzy:: Piotrowska, E.
Stanisławski, W.
Powiązania:: https://bibliotekanauki.pl/articles/157289.pdf
Data publikacji:: 2012
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: dane niezrównoważone
uczenie nadzorowane
imbalanced data
supervised learning
Opis:: Artykuł przedstawia wyniki rozważań dotyczących klasyfikacji danych niezrównoważonych w obrazach mikroskopowych preparatów cytologicznych. Do klasyfikacji wykorzystano algorytmy uczenia nadzorowanego jak: naiwny klasyfikator Bayesa, analiza dyskryminacyjna, drzewa decyzyjne oraz zaproponowany przez autorów algorytm klasyfikacji będący połączeniem zbiorów przybliżonych i metody k-najbliższych sąsiadów. Do analizy wykorzystano opracowane przez autorów narzędzie Rough Sets Analysis Toolbox (RSA Toolbox) - przybornik dla środowiska MATLAB. Wykorzystane obrazy mikroskopowe uzyskano w procesie diagnostyki nowotworu pęcherza moczowego badając metodą FISH odpowiednio przygotowane preparaty moczu.
In the paper the results of imbalanced data classification based on microscope images are described. The images were acquired in the process of bladder cancer diagnosis using the FISH method. The conducted research were focused on the effectiveness of the initial cancer diagnosis using specimen radiation in a DAPI channel and supervised learning methods. The analyzed data set contains about 23,000 objects described by 212 morphometric features. Each object was classified to one of two classes: normal cells or cancers cells. Decisions about belonging objects to the corresponding classes were carried out by an expert. There were identified only 640 cancer cells in the analyzed data. Most of learning algorithms assume balance between classes. The class imbalance problem causes difficulties at a learning stage and reduces the predictive ability. Therefore, the classifier evaluation was performed using G-mean and F-value measures. The authors defined additional measure FMaxSen=sen2ospe which is the product of sensitivity and specificity coefficients. Use of the second power factor emphasizes the importance of sensitivity and allows searching the classifier with the maximum specificity at the maximum sensitivity. The analysis presented in the paper was performed with use of Rough Sets Analysis Toolbox (RSA Toolbox) for MATLAB implemented by the authors. The main part of the RSA Toolbox contains a module which supports the rough sets theory processing. Another part (RSAm module) is a wrapper for the proposed rough classification functions and others implemented in Matalab such as NaiveBayes, Discriminant Analysis, Decision Tree. The RSAm gives us possibility to use cross validation for measuring the classification accuracy. The RSAm also contains features reduction algorithms (correlation based feature selection, sequential feature selection, principal component analysis) as well as discretizations algorithms (EWD, CAIM, CACC). An important part of the RSAToolbox is implementation of distributed computations using Matlab Parallel Computing Toolbox and Distributed Computing Server.
Źródło:: Pomiary Automatyka Kontrola; 2012, R. 58, nr 8, 8; 737-740
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 2.

Tytuł:: Fast attack detection method for imbalanced data in industrial cyber-physical systems
Autorzy:: Huang, Meng
Li, Tao
Li, Beibei
Zhang, Nian
Huang, Hanyuan
Powiązania:: https://bibliotekanauki.pl/articles/23944834.pdf
Data publikacji:: 2023
Wydawca:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:: intrusion detection system
industrial cyber-physical Systems
imbalanced data
all k-nearest neighbor
LightGBM
Opis:: Integrating industrial cyber-physical systems (ICPSs) with modern information technologies (5G, artificial intelligence, and big data analytics) has led to the development of industrial intelligence. Still, it has increased the vulnerability of such systems regarding cybersecurity. Traditional network intrusion detection methods for ICPSs are limited in identifying minority attack categories and suffer from high time complexity. To address these issues, this paper proposes a network intrusion detection scheme, which includes an information-theoretic hybrid feature selection method to reduce data dimensionality and the ALLKNN-LightGBM intrusion detection framework. Experimental results on three industrial datasets demonstrate that the proposed method outperforms four mainstream machine learning methods and other advanced intrusion detection techniques regarding accuracy, F-score, and run time complexity.
Źródło:: Journal of Artificial Intelligence and Soft Computing Research; 2023, 13, 4; 229--245
2083-2567
2449-6499
Pojawia się w:: Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 3.

Tytuł:: CCR: A combined cleaning and resampling algorithm for imbalanced data classification
Autorzy:: Koziarski, M.
Woźniak, M.
Powiązania:: https://bibliotekanauki.pl/articles/329869.pdf
Data publikacji:: 2017
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: machine learning
classification algorithms
imbalanced data
preprocessing
oversampling
uczenie maszynowe
algorytm klasyfikacji
dane niezrównoważone
wstępne przetwarzanie danych
Opis:: Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. Varying levels of imbalance may be observed in most real datasets, affecting the performance of classification algorithms. Particularly, high levels of imbalance make serious difficulties, often requiring the use of specially designed methods. In such cases the most important issue is often to properly detect minority examples, but at the same time the performance on the majority class cannot be neglected. In this paper we describe a novel resampling technique focused on proper detection of minority examples in a two-class imbalanced data task. The proposed method combines cleaning the decision border around minority objects with guided synthetic oversampling. Results of the conducted experimental study indicate that the proposed algorithm usually outperforms the conventional oversampling approaches, especially when the detection of minority examples is considered.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2017, 27, 4; 727-736
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 4.

Tytuł:: Using information on class interrelations to improve classification of multiclass imbalanced data: A new resampling algorithm
Autorzy:: Janicka, Małgorzata
Lango, Mateusz
Stefanowski, Jerzy
Powiązania:: https://bibliotekanauki.pl/articles/330287.pdf
Data publikacji:: 2019
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: imbalanced data
multi-class learning
re-sampling
data difficulty factor
similarity degree
dane niezrównoważone
ponowne próbkowanie
stopień podobieństwa
Opis:: The relations between multiple imbalanced classes can be handled with a specialized approach which evaluates types of examples’ difficulty based on an analysis of the class distribution in the examples’ neighborhood, additionally exploiting information about the similarity of neighboring classes. In this paper, we demonstrate that such an approach can be implemented as a data preprocessing technique and that it can improve the performance of various classifiers on multiclass imbalanced datasets. It has led us to the introduction of a new resampling algorithm, called Similarity Oversampling and Undersampling Preprocessing (SOUP), which resamples examples according to their difficulty. Its experimental evaluation on real and artificial datasets has shown that it is competitive with the most popular decomposition ensembles and better than specialized preprocessing techniques for multi-imbalanced problems.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2019, 29, 4; 769-781
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Informacja

Wyszukujesz frazę "imbalanced data" wg kryterium: Temat

Źródło danych

Dostawca treści

Kolekcja

Rok wydania

Wydawca

Temat

Autor

Typ dokumentu

Język