Subject: random forests - OPAC collections catalog

Title:: APPLICATION OF MIXED MODELS AND FAMILIES OF CLASSIFIERS TO ESTIMATION OF FINANCIAL RISK PARAMETERS
Authors:: Grzybowska, Urszula
Karwański, Marek
Links:: https://bibliotekanauki.pl/articles/452746.pdf
Publication date:: 2015
Publisher:: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Katedra Ekonometrii i Statystyki
Subjects:: LGD
mixed models
random forests
gradient boosting
Description:: Loss Given Default (LGD) estimation plays an essential role in credit risk modeling. LGD is treated as a random variable with a bimodal distribution. For LGD estimation, advanced statistical models such as beta regression can be applied. Unfortunately, the parametric methods require amendments of the “inflation” type that lead to a mixed modeling approach. Contrary to classical statistical methods based on probability distributions, families of classifiers such as gradient boosting or random forests operate with information and allow for more flexible model adjustment. The problem encountered is the comparison of the obtained results. The aim of the paper is to present and compare the results of LGD modeling using statistical methods and a data mining approach. Calculations were done on real-life data sourced from one of the large Polish banks.
Source:: Metody Ilościowe w Badaniach Ekonomicznych; 2015, 16, 1; 108-115
2082-792X
Appears in:: Metody Ilościowe w Badaniach Ekonomicznych
Content provider:: Biblioteka Nauki

Article

Title:: Some Remarks on the Data Imputation Using “missForest” Method
Kilka uwag o imputacji danych z wykorzystaniem metody "missforest"
Authors:: Misztal, Małgorzata
Links:: https://bibliotekanauki.pl/articles/905779.pdf
Publication date:: 2013
Publisher:: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Subjects:: missing values
single and multiple imputation
random forests
missForest
Description:: Missing data are quite common in practical applications of statistical methods, and imputation is a general statistical method for the analysis of incomplete data sets. Stekhoven and Bühlmann (2012) proposed a new iterative imputation method (called “missForest”) based on Random Forests (Breiman 2001) to cope with missing values. The paper gives a short description of “missForest” and compares it with selected missing data techniques by artificially simulating different proportions and mechanisms of missing data in complete data sets taken mainly from the UCI repository of machine learning databases.
Source:: Acta Universitatis Lodziensis. Folia Oeconomica; 2013, 285
0208-6018
2353-7663
Appears in:: Acta Universitatis Lodziensis. Folia Oeconomica
Content provider:: Biblioteka Nauki

Article

Title:: Indoor localization based on visible light communication and machine learning algorithms
Authors:: Ghonim, Alzahraa M.
Salama, Wessam M.
Khalaf, Ashraf A. M.
Shalaby, Hossam M. H.
Links:: https://bibliotekanauki.pl/articles/2063908.pdf
Publication date:: 2022
Publisher:: Polska Akademia Nauk. Stowarzyszenie Elektryków Polskich
Subjects:: free-space optical communication
visible light communication
neural networks
random forests
machine learning
Description:: An indoor localization system is proposed based on visible light communications, received signal strength, and machine learning algorithms. To obtain an accurate localization system, a dataset is first collected. The dataset is then used to train various machine learning algorithms. Several evaluation metrics are used to estimate the robustness of the proposed system. Specifically, the authors’ evaluation parameters are training time, testing time, classification accuracy, area under curve, F1-score, precision, recall, logloss, and specificity. The proposed system turned out to be highly accurate. The authors are able to achieve 99.5% for area under curve, 99.4% for classification accuracy, precision, F1, and recall. The logloss and precision are 4% and 99.7%, respectively. Moreover, the root mean square error, used as an additional performance measure, averaged 0.136 cm.
Source:: Opto-Electronics Review; 2022, 30, 2; art. no. e140858
1230-3402
Appears in:: Opto-Electronics Review
Content provider:: Biblioteka Nauki

Article

Title:: FAMILIES OF CLASSIFIERS – APPLICATION IN DATA
Authors:: Grzybowska, Urszula
Karwański, Marek
Links:: https://bibliotekanauki.pl/articles/453604.pdf
Publication date:: 2014
Publisher:: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Katedra Ekonometrii i Statystyki
Subjects:: random forests
gradient boosting
DEA
rating classes
variable selection
ranking
high rated portfolio
Description:: The economic description of firms and companies is based on a number of indicators. The indicators are related to each other and can be considered only in a specific context. Regression models allow for such an approach. Unfortunately, the problems we deal with are usually nonlinear, and the choice of relevant information is very difficult. The aim of the paper is to present a method of variable selection based on the random forest and gradient boosting approaches and its application to company ranking in the DEA method. The results will be compared with the ordering obtained using an expert-supported approach to variable selection in DEA.
Source:: Metody Ilościowe w Badaniach Ekonomicznych; 2014, 15, 2; 94-101
2082-792X
Appears in:: Metody Ilościowe w Badaniach Ekonomicznych
Content provider:: Biblioteka Nauki

Article

Title:: Comparison of tree-based methods used in survival data
Authors:: Yabaci, Aysegul
Sigirli, Deniz
Links:: https://bibliotekanauki.pl/articles/2034119.pdf
Publication date:: 2022-03-15
Publisher:: Główny Urząd Statystyczny
Subjects:: tree-based methods
conditional inference trees
conditional inference forests
random survival forests
Description:: Survival trees and forests are popular non-parametric alternatives to parametric and semiparametric survival models. Conditional inference trees (Ctree) form a non-parametric class of regression trees embedding tree-structured regression models into a well-defined theory of conditional inference procedures. The Ctree is applicable to a variety of regression-related problems involving nominal, ordinal, numeric, censored, as well as multivariate response variables and arbitrary measurement scales of covariates. Conditional inference forests (Cforest) constitute a survival forest method which combines a large number of Ctrees. The Cforest provides a unified and flexible framework for ensemble learning in the presence of censoring. The random survival forests (RSF) methodology extends the random forests method, enabling the approximation of rich classes of functions while keeping generalisation errors low. In the present study, the Ctree, Cforest and RSF methods are discussed in detail, and the performances of the survival forest methods, namely the Cforest and RSF, are compared in a simulation study. The results of the simulation demonstrate that the RSF method with a log-rank score distinction criterion outperforms the Cforest and the RSF with a log-rank distinction criterion.
Source:: Statistics in Transition new series; 2022, 23, 1; 21-38
1234-7655
Appears in:: Statistics in Transition new series
Content provider:: Biblioteka Nauki

Article

Title:: Application of selected supervised classification methods to bank marketing campaign
Authors:: Grzonka, D.
Borowik, B.
Suchacka, G.
Links:: https://bibliotekanauki.pl/articles/94739.pdf
Publication date:: 2016
Publisher:: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Wydawnictwo Szkoły Głównej Gospodarstwa Wiejskiego w Warszawie
Subjects:: classification
supervised learning
data mining
decision trees
bagging
boosting
random forests
bank marketing
R project
Description:: Supervised classification covers a number of data mining methods based on training data. These methods have been successfully applied to solve complex multi-criteria classification problems in many domains, including economic issues. In this paper we discuss features of some supervised classification methods based on decision trees and apply them to the direct marketing campaign data of a Portuguese banking institution. We discuss and compare the following classification methods: decision trees, bagging, boosting, and random forests. The classification problem in our approach is defined in a scenario where a bank’s clients make decisions about the activation of their deposits. The obtained results are used for evaluating the effectiveness of the classification rules.
Source:: Information Systems in Management; 2016, 5, 1; 36-48
2084-5537
2544-1728
Appears in:: Information Systems in Management
Content provider:: Biblioteka Nauki

Article

Title:: Evaluation of resampling methods in the class unbalance problem
Ocena metod repróbkowania w problemie zbiorów niezbilansowanych
Authors:: Kubus, Mariusz
Links:: https://bibliotekanauki.pl/articles/424935.pdf
Publication date:: 2020
Publisher:: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Subjects:: class unbalance
resampling
regularized logistic regression
random forests
Description:: The purpose of many real-world applications is the prediction of rare events, and the training sets are then highly unbalanced. In this case, classifiers are biased towards the correct prediction of the majority class and misclassify many objects of the minority class, although the rare events are of greater interest. To handle this problem, numerous techniques have been proposed that balance the data or modify the learning algorithms. The goal of this paper is to compare simple random balancing methods with more sophisticated resampling methods that have appeared in the literature and are available in R. Additionally, the authors ask whether learning on the original dataset and using a shifted classification threshold is not more competitive. The authors provide a survey from the perspective of regularized logistic regression and random forests. The results show that combining random under-sampling with random forests has an advantage over other techniques, while logistic regression can be competitive in the case of highly unbalanced data.
Source:: Econometrics. Ekonometria. Advances in Applied Data Analytics; 2020, 24, 1; 39-50
1507-3866
Appears in:: Econometrics. Ekonometria. Advances in Applied Data Analytics
Content provider:: Biblioteka Nauki

Article

Title:: Automatic speech based emotion recognition using paralinguistics features
Authors:: Hook, J.
Noroozi, F.
Toygar, O.
Anbarjafari, G.
Links:: https://bibliotekanauki.pl/articles/200261.pdf
Publication date:: 2019
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: random forests
speech emotion recognition
machine learning
support vector machines
Description:: Affective computing studies and develops systems capable of detecting human affects. The search for universal well-performing features for speech-based emotion recognition is ongoing. In this paper, a small set of features with support vector machines as the classifier is evaluated on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database and the Serbian emotional speech database. It is shown that a set of 87 features can offer results on par with the state of the art, yielding 80.21, 88.6, 75.42 and 93.41% average emotion recognition rate, respectively. In addition, an experiment is conducted to explore the significance of gender in emotion recognition using random forests. Two models, trained on the first and second database, respectively, and four speakers were used to determine the effects. It is seen that the feature set used in this work performs well for both male and female speakers, yielding approximately 27% average emotion recognition in both models. In addition, the emotions of female speakers were recognized 18% of the time in the first model and 29% in the second. A similar effect is seen with male speakers: the first model yields a 36% and the second a 28% average emotion recognition rate. This illustrates the relationship between the composition of the training data and emotion recognition accuracy.
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2019, 67, 3; 479-488
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article

Title:: The Problem of Redundant Variables in Random Forests
Problem zmiennych redundantnych w metodzie lasów losowych
Authors:: Kubus, Mariusz
Links:: https://bibliotekanauki.pl/articles/656761.pdf
Publication date:: 2018
Publisher:: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Subjects:: random forests
redundant variables
feature selection
clustering of features
Description:: Random forests are currently one of the supervised learning methods most preferred by practitioners. Their popularity stems from the possibility of applying the method without a time-consuming pre-processing step. Random forests can be used for mixed types of features, irrespective of their distributions. The method is robust to outliers, and feature selection is built into the learning algorithm. However, a decrease in classification accuracy can be observed in the presence of redundant variables. In this paper, we discuss two approaches to the problem of redundant variables. We consider two search strategies in the feature selection approach, as well as two ways of constructing synthetic variables in the approach based on clustering of features. In the empirical experiment, we generate collinear predictors and include them in real datasets. Dimensionality reduction methods usually improve the accuracy of random forests, but none of them clearly outperforms the others.
Source:: Acta Universitatis Lodziensis. Folia Oeconomica; 2018, 6, 339; 7-16
0208-6018
2353-7663
Appears in:: Acta Universitatis Lodziensis. Folia Oeconomica
Content provider:: Biblioteka Nauki

Article

Title:: ZASTOSOWANIE ANALIZY SKUPIEŃ I LASÓW LOSOWYCH W KLASYFIKACJI GMIN W POLSCE NA SKALI POZIOMU ROZWOJU SPOŁECZNO-GOSPODARCZEGO
USING CLUSTER ANALYSIS AND TECHNIQUE OF RANDOM FORESTS IN THE CLASSIFICATION OF COMMUNES IN POLAND ON THE SCALE OF SOCIO-ECONOMIC DEVELOPMENT
Authors:: Perdał, Robert
Links:: https://bibliotekanauki.pl/articles/452997.pdf
Publication date:: 2018
Publisher:: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Katedra Ekonometrii i Statystyki
Subjects:: cluster analysis
random forests
classification
communes
socio-economic development
Description:: The article presents an algorithm for classifying communes on the scale of socio-economic development level. The algorithm includes four steps: (1) selection and reduction of variables, (2) construction of a synthetic measure and linear ordering of communes on the scale of socio-economic development level, (3) grouping of communes by cluster analysis (k-means algorithm) based on the values of the synthetic measure, (4) verification of the classification using the random forests method. The classification procedure identified a progressive divergence of socio-economic development in Poland.
Source:: Metody Ilościowe w Badaniach Ekonomicznych; 2018, 19, 3; 263-273
2082-792X
Appears in:: Metody Ilościowe w Badaniach Ekonomicznych
Content provider:: Biblioteka Nauki

Article

Title:: Classifiers accuracy improvement based on missing data imputation
Authors:: Jordanov, I.
Petrov, N.
Petrozziello, A.
Links:: https://bibliotekanauki.pl/articles/91626.pdf
Publication date:: 2018
Publisher:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:: machine learning
missing data
model-based imputation
neural networks
random forests
support vector machine
radar signal classification
Description:: In this paper we investigate further and extend our previous work on radar signal identification and classification based on a data set which comprises continuous, discrete and categorical data that represent radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. Like most real-world datasets, it also contains a high percentage of missing values, and to deal with this problem we investigate three imputation techniques: Multiple Imputation (MI); K-Nearest Neighbour Imputation (KNNI); and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The imputation models’ performance is assessed with Wilcoxon’s test for statistical significance and Cohen’s effect size metrics. To solve the classification task, we employ three intelligent approaches: Neural Networks (NN); Support Vector Machines (SVM); and Random Forests (RF). Subsequently, we critically analyse which imputation method most influences the classifiers’ performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses (‘military’ and ‘civil’), each containing several ‘subclasses’, and propose two new metrics: inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as complements to the OCA when choosing the best classifier for the problem at hand.
Source:: Journal of Artificial Intelligence and Soft Computing Research; 2018, 8, 1; 31-48
2083-2567
2449-6499
Appears in:: Journal of Artificial Intelligence and Soft Computing Research
Content provider:: Biblioteka Nauki

Article

Title:: Identyfikacja i ocena zmienności cen drewna w nadleśnictwie Płock
Identification and evaluation of wood price variability in Płock Forest District
Authors:: Suchodolski, Przemysław
Idzik, Marcin
Links:: https://bibliotekanauki.pl/articles/543027.pdf
Publication date:: 2018-11-28
Publisher:: Główny Urząd Statystyczny
Subjects:: State Forests
trend
cyclicality
seasonal fluctuations
random fluctuations
multi-annual management plan of forests
Description:: The main aim of the research is to evaluate the variability of prices of selected wood assortments in the Płock Forest District. Prices of eight wood types were analysed according to the rules of time series decomposition using the CENSUS II X-11 method. The length of the price cycles was also evaluated by means of Fourier spectral analysis. Data were obtained from the Płock Forest District and covered the years 2004-2014 on a monthly basis. The study found that wood prices in the Płock Forest District are characterised by clear systematic variability, which means that a trend and cyclicality can be distinguished. The results also showed a considerable scale of seasonal and random fluctuations in wood prices. The analysis of wood prices showed a growing trend for all assortments, while the dynamics of seasonal fluctuations differed depending on the assortment. Random fluctuations were found to be of significant intensity and were characterised by a high amplitude of deviations.
Source:: Wiadomości Statystyczne. The Polish Statistician; 2018, 63, 11; 41-55
0043-518X
Appears in:: Wiadomości Statystyczne. The Polish Statistician
Content provider:: Biblioteka Nauki

Article

