Temat: random selection - Katalog OPAC zbiorów

Skocz do pozycji: 1.

Tytuł:: The Problem of Redundant Variables in Random Forests
Problem zmiennych redundantnych w metodzie lasów losowych
Autorzy:: Kubus, Mariusz
Powiązania:: https://bibliotekanauki.pl/articles/656761.pdf
Data publikacji:: 2018
Wydawca:: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Tematy:: lasy losowe
zmienne redundantne
dobór zmiennych
taksonomia cech
random forests
redundant variables
feature selection
clustering of features
Opis:: Lasy losowe są obecnie jedną z najchętniej stosowanych przez praktyków metod klasyfikacji wzorcowej. Na jej popularność wpływ ma możliwość jej stosowania bez czasochłonnego, wstępnego przygotowywania danych do analizy. Las losowy można stosować dla różnego typu zmiennych, niezależnie od ich rozkładów. Metoda ta jest odporna na obserwacje nietypowe oraz ma wbudowany mechanizm doboru zmiennych. Można jednak zauważyć spadek dokładności klasyfikacji w przypadku występowania zmiennych redundantnych. W artykule omawiane są dwa podejścia do problemu zmiennych redundantnych. Rozważane są dwa sposoby przeszukiwania w podejściu polegającym na doborze zmiennych oraz dwa sposoby konstruowania zmiennych syntetycznych w podejściu wykorzystującym grupowanie zmiennych. W eksperymencie generowane są liniowo zależne predyktory i włączane do zbiorów danych rzeczywistych. Metody redukcji wymiarowości zwykle poprawiają dokładność lasów losowych, ale żadna z nich nie wykazuje wyraźnej przewagi.
Random forests are currently one of the most preferable methods of supervised learning among practitioners. Their popularity is influenced by the possibility of applying this method without a time consuming pre‑processing step. Random forests can be used for mixed types of features, irrespectively of their distributions. The method is robust to outliers, and feature selection is built into the learning algorithm. However, a decrease of classification accuracy can be observed in the presence of redundant variables. In this paper, we discuss two approaches to the problem of redundant variables. We consider two strategies of searching for best feature subset as well as two formulas of aggregating the features in the clusters. In the empirical experiment, we generate collinear predictors and include them in the real datasets. Dimensionality reduction methods usually improve the accuracy of random forests, but none of them clearly outperforms the others.
Źródło:: Acta Universitatis Lodziensis. Folia Oeconomica; 2018, 6, 339; 7-16
0208-6018
2353-7663
Pojawia się w:: Acta Universitatis Lodziensis. Folia Oeconomica
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 2.

Tytuł:: Data mining methods for prediction of air pollution
Autorzy:: Siwek, K.
Osowski, S.
Powiązania:: https://bibliotekanauki.pl/articles/330775.pdf
Data publikacji:: 2016
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: computational intelligence
feature selection
neural network
random forest
air pollution forecasting
inteligencja obliczeniowa
selekcja cech
sieć neuronowa
lasy losowe
zanieczyszczenie powietrza
Opis:: The paper discusses methods of data mining for prediction of air pollution. Two tasks in such a problem are important: generation and selection of the prognostic features, and the final prognostic system of the pollution for the next day. An advanced set of features, created on the basis of the atmospheric parameters, is proposed. This set is subject to analysis and selection of the most important features from the prediction point of view. Two methods of feature selection are compared. One applies a genetic algorithm (a global approach), and the other—a linear method of stepwise fit (a locally optimized approach). On the basis of such analysis, two sets of the most predictive features are selected. These sets take part in prediction of the atmospheric pollutants PM10, SO2, NO2 and O3. Two approaches to prediction are compared. In the first one, the features selected are directly applied to the random forest (RF), which forms an ensemble of decision trees. In the second case, intermediate predictors built on the basis of neural networks (the multilayer perceptron, the radial basis function and the support vector machine) are used. They create an ensemble integrated into the final prognosis. The paper shows that preselection of the most important features, cooperating with an ensemble of predictors, allows increasing the forecasting accuracy of atmospheric pollution in a significant way.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2016, 26, 2; 467-478
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Informacja

Wyszukujesz frazę "random selection" wg kryterium: Temat

Źródło danych

Dostawca treści

Kolekcja

Rok wydania

Wydawca

Temat

Autor

Typ dokumentu

Język