Subject: dobór zmiennych (variable selection) - OPAC catalog of collections

Title:: The Problem of Redundant Variables in Random Forests
Problem zmiennych redundantnych w metodzie lasów losowych
Authors:: Kubus, Mariusz
Links:: https://bibliotekanauki.pl/articles/656761.pdf
Publication date:: 2018
Publisher:: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Subjects:: random forests (lasy losowe)
redundant variables (zmienne redundantne)
feature selection (dobór zmiennych)
clustering of features (taksonomia cech)
Description:: Random forests are currently one of the supervised classification methods most favoured by practitioners. Their popularity stems from the fact that they can be applied without a time-consuming pre-processing step. Random forests can be used with features of mixed types, irrespective of their distributions. The method is robust to outliers, and feature selection is built into the learning algorithm. However, classification accuracy can decrease in the presence of redundant variables. This paper discusses two approaches to the problem of redundant variables: feature selection and feature clustering. Two strategies of searching for the best feature subset are considered in the former, and two ways of constructing synthetic variables by aggregating the features within clusters in the latter. In the empirical experiment, linearly dependent (collinear) predictors are generated and added to real datasets. Dimensionality reduction methods usually improve the accuracy of random forests, but none of them clearly outperforms the others.
Source:: Acta Universitatis Lodziensis. Folia Oeconomica; 2018, 6, 339; 7-16
0208-6018
2353-7663
Appears in:: Acta Universitatis Lodziensis. Folia Oeconomica
Content provider:: Biblioteka Nauki

Article
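
The experiment mentioned in the description above (generating linearly dependent predictors and adding them to real datasets) can be illustrated with a minimal sketch. The record does not say how the predictors were generated, so the scheme below is an assumption: each new column is a random linear combination of two existing columns plus small Gaussian noise. The function name `add_collinear_predictors` and its parameters are hypothetical.

```python
import random


def add_collinear_predictors(rows, n_new, noise_sd=0.01, seed=0):
    """Return a copy of `rows` with `n_new` redundant columns appended.

    Assumed scheme (not taken from the paper): each new column equals
    a * x_i + b * x_j + noise for two randomly chosen existing columns.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    out = [list(r) for r in rows]  # copy so the input stays unchanged
    for _ in range(n_new):
        i, j = rng.sample(range(n_cols), 2)          # two distinct source columns
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random coefficients
        for r_in, r_out in zip(rows, out):
            r_out.append(a * r_in[i] + b * r_in[j] + rng.gauss(0, noise_sd))
    return out


# Toy numeric data standing in for a real dataset (hypothetical values).
data = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.5], [0.0, 4.0, 1.5]]
augmented = add_collinear_predictors(data, n_new=2)
```

The augmented table could then be passed to any random forest implementation to compare accuracy with and without the redundant columns.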

Title:: Crash data reporting systems in Fourteen Arab countries: challenges and improvement
Authors:: Abounoas, Zahira
Raphael, Wassim
Badr, Yarob
Faddoul, Rafic
Guillaume, Anne
Links:: https://bibliotekanauki.pl/articles/1833641.pdf
Publication date:: 2020
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: road accidents (wypadek drogowy)
road safety (bezpieczeństwo na drogach)
information system (system informacyjny)
reporting system (systemy raportowania)
variables selection (dobór zmiennych)
classification model (model klasyfikacji)
Description:: Traffic crash fatalities and serious injuries still represent a heavy burden for most Arab countries because current policies, strategies, and interventions are based on poorly collected data. In this paper, we assessed the crash data reporting systems in fourteen Arab countries via a survey conducted to identify the fundamental dysfunctions at the management and data collection levels. Then, to address some of the dataset problems, we applied data mining techniques to select a minimum set of variables (crash, vehicle, and road user) that should be collected for a better understanding of crash circumstances. For this reason, three selection methods (correlation, information gain, and gain ratio) and seven classifiers (naive Bayes, nearest neighbour, random forest, random tree, J48, reduced error pruning tree, and bagging) were tested and compared to identify the variables that significantly affect crash severity. The decision tree family of classifiers showed the best performance based on the analysis of the area under the curve. The explanatory variables obtained from the data mining process were combined with other descriptive variables to maintain traceability. As a result, we produced hybrid lists of variables for the crash, vehicle, and road user, each containing 25 variables. Finally, in order to propose a cost-effective solution for switching from manual to electronic data collection, we took inspiration from a tool used to track animals to create and customize a unified e-form for handheld devices, so that the harmonized data can be entered easily for the entire region based on our selected lists of variables. The tool met the countries' requirements, especially by enabling data collection and transfer with and without the internet, and by allowing data analysis through its built-in Geographic Information System (GIS) capabilities.
Source:: Archives of Transport; 2020, 56, 4; 73-88
0866-9546
2300-8830
Appears in:: Archives of Transport
Content provider:: Biblioteka Nauki

Article
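
Two of the selection criteria named in the description above, information gain and gain ratio, can be sketched for a categorical variable as follows. This is a minimal illustration using the standard textbook definitions, not the authors' implementation or tooling; the function names and the toy crash data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on the feature's values."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: crash severity and two candidate variables.
severity = ["fatal", "minor", "fatal", "minor"]
lighting = ["dark", "day", "dark", "day"]   # perfectly separates severity
weekday = ["mon", "mon", "tue", "tue"]      # carries no information here

scores = {
    "lighting": gain_ratio(lighting, severity),
    "weekday": gain_ratio(weekday, severity),
}
```

Ranking variables by such scores and keeping the top ones is one common way to arrive at a reduced variable list.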

Information

You are searching for the phrase "dobór zmiennych" by criterion: Subject
