Searching for the phrase "Random Forest" by criterion: Subject


Title:
Impacts of forest spatial structure on variation of the multipath phenomenon of navigation satellite signals
Authors:
Brach, Michał
Stereńczak, Krzysztof
Bolibok, Leszek
Kwaśny, Łukasz
Krok, Grzegorz
Laszkowski, Michał
Links:
https://bibliotekanauki.pl/articles/2044153.pdf
Publication date:
2019
Publisher:
Instytut Badawczy Leśnictwa
Subjects:
GNSS
multipath
random forest
Borut
forest structure
LiDAR
Abstract:
GNSS (Global Navigation Satellite System) receivers are commonly used in forest management to determine object coordinates, assess area or length, and perform many other tasks that need accurate positioning. Unfortunately, forest structure strongly limits access to satellite signals, which makes positioning accuracy much weaker than in open areas. The main reason is the multipath phenomenon of the satellite signal: radio waves are reflected from surrounding obstacles, so the signal does not reach the GNSS receiver’s antenna directly. Around 50% of the GNSS positioning error in the forest is due to the multipath effect. In this study, an attempt was made to quantify the forest stand features that may influence multipath variability. The ground truth data were collected in six Forest Districts located in different parts of Poland. In total, data were processed for over 2,700 inventory plots on which GNSS measurements were performed. On every plot, over 25 forest metrics were calculated and over 25 minutes of raw GNSS observations (1500 epochs) were captured. The main goal of this study was to find a way to quantify multipath and to examine the relationship between multipath variability and forest structure. Forest stand merchantable volume was found to be the most important factor influencing the multipath phenomenon. Even though similar geodetic-class GNSS receivers were used, significant differences in multipath values were observed under similar conditions.
Source:
Folia Forestalia Polonica. Series A. Forestry; 2019, 61, 1; 3-21
0071-6677
Appears in:
Folia Forestalia Polonica. Series A. Forestry
Content provider:
Biblioteka Nauki
Article
Title:
Assessing the efficiency of a random forest regression model for estimating water quality indicators
Authors:
Zavareh, Maryam
Maggioni, Viviana
Zhang, Xinxuan
Links:
https://bibliotekanauki.pl/articles/27810498.pdf
Publication date:
2023
Publisher:
Instytut Meteorologii i Gospodarki Wodnej - Państwowy Instytut Badawczy
Subjects:
Random Forest
water quality
hydrometeorological information
Abstract:
This work evaluates the efficiency of Random Forest (RF) regression for predicting water quality indicators and investigates factors affecting water quality in 11 watersheds in Virginia, the District of Columbia, and Maryland. Ten years of daily water quality data along with hydro-meteorological information (such as precipitation) and watershed physiology and characteristics (e.g., size, soil type, land use) are used to predict dissolved oxygen (DO), specific conductivity (K), and turbidity (Tu) across the selected watersheds. The RF regression model is developed for six scenarios, with an increasing number of predictors introduced in each scenario. The first scenario contains the smallest amount of information (water quality indicators DO, K and Tu), while scenario 6 contains all the available variables. The RF model is evaluated based on three statistical metrics: the relative root mean square error, the correlation coefficient, and the percentage of variance explained. In addition, the degree of importance of each predictor is used to rank the predictors within each scenario. The model shows excellent performance for DO as the predicted variable. The model predicting K slightly outperforms the one predicting Tu. Scenario 4 (built based on water quality indicators, hydro-meteorological data, watershed physiology and land cover information) provided the best tradeoff between performance and efficiency (quantified in terms of the amount of information needed to develop the model). In conclusion, based on the RF model, land cover plays a significant role in predicting water quality indicators. In addition, the developed RF regression model is adaptable to watersheds in this region over a range of climates.
Source:
Meteorology Hydrology and Water Management. Research and Operational Applications; 2023, 11, 2; 1-18
2299-3835
2353-5652
Appears in:
Meteorology Hydrology and Water Management. Research and Operational Applications
Content provider:
Biblioteka Nauki
Article
Title:
Predicting immunogenicity in murine hosts with use of Random Forest classifier
Przewidywanie immunogenności u myszy przy użyciu klasyfikatora Random Forest
Authors:
Marciniak, Anna
Tarczewska, Martyna
Kloska, Sylwester
Links:
https://bibliotekanauki.pl/articles/2016293.pdf
Publication date:
2020
Publisher:
Politechnika Bydgoska im. Jana i Jędrzeja Śniadeckich. Wydawnictwo PB
Subjects:
Random Forest Classifier
immunogenicity
machine learning
entropy
Gini index
klasyfikator Random Forest
immunogenność
uczenie maszynowe
entropia
Abstract:
Biomedical data are difficult to interpret because of their large volume. One way to cope with this problem is to use machine learning, which can capture previously unnoticed dependencies. The authors applied a random forest classifier with the entropy and Gini index criteria to immunogenicity data. The input data consisted of 3 columns: epitope (a peptide 8-11 amino acids long), major histocompatibility complex (MHC), and immune response. The presented model can predict the immune response based on the epitope-MHC complex. The results achieved an accuracy of 84% for entropy and 83% for the Gini index. The results are not fully satisfying but are a fair start for more complex experiments and could be used as an indicator for further research.
Source:
Zeszyty Naukowe. Telekomunikacja i Elektronika / Uniwersytet Technologiczno-Przyrodniczy w Bydgoszczy; 2020, 24; 31-43
1899-0088
Appears in:
Zeszyty Naukowe. Telekomunikacja i Elektronika / Uniwersytet Technologiczno-Przyrodniczy w Bydgoszczy
Content provider:
Biblioteka Nauki
Article
Title:
Classification of Seizure Types Using Random Forest Classifier
Authors:
Basri, Ashjan
Arif, Muhammad
Links:
https://bibliotekanauki.pl/articles/2123290.pdf
Publication date:
2021
Publisher:
Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:
EEG
fast Fourier transform
seizure
random forest
Abstract:
Epilepsy is one of the most common mental disorders in the world, affecting 65 million people. The prevalence of epilepsy in Arab countries is estimated at 174 per 100,000 individuals, and in Saudi Arabia at 6.54 per 1,000 individuals. Epileptic seizures are of different types, and each patient needs a treatment plan suited to the seizure type. Hence, accurate classification of seizure type is an essential part of diagnosing and treating epileptic patients. In this paper, features based on the fast Fourier transform of EEG montages are used to classify different types of seizures. Since the distribution of classes is not uniform and the dataset suffers from severe imbalance, various algorithms are used to under-sample the majority class and over-sample the minority classes. The random forest classifier achieved a classification accuracy of 96% in differentiating three types of seizures from healthy EEG readings.
Source:
Advances in Science and Technology. Research Journal; 2021, 15, 3; 167-178
2299-8624
Appears in:
Advances in Science and Technology. Research Journal
Content provider:
Biblioteka Nauki
Article
Title:
A novel drift detection algorithm based on features’ importance analysis in a data streams environment
Authors:
Duda, Piotr
Przybyszewski, Krzysztof
Wang, Lipo
Links:
https://bibliotekanauki.pl/articles/1837417.pdf
Publication date:
2020
Publisher:
Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:
data stream mining
random forest
features importance
Abstract:
A training set consists of many features that influence the classifier to different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operation of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier but can also be an important indicator of concept drift. In this work, we propose a new algorithm for data stream classification, called Random Forest with Features Importance (RFFI), which uses the measure of feature importance as a drift detector. The RFFI algorithm adapts solutions inspired by the Random Forest algorithm to data stream scenarios. The proposed algorithm combines the ability of ensemble methods to handle slow changes in a data stream with a new method for detecting the occurrence of concept drift. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data.
Source:
Journal of Artificial Intelligence and Soft Computing Research; 2020, 10, 4; 287-298
2083-2567
2449-6499
Appears in:
Journal of Artificial Intelligence and Soft Computing Research
Content provider:
Biblioteka Nauki
Article
Title:
Application of the Random Forest Model to Predict the Plasticity State of Vertisols
Authors:
Al Masmoudi, Yassine
Bouslihim, Yassine
Doumali, Kaoutar
El Aissaoui, Abdellah
Namr, Khalid Ibno
Links:
https://bibliotekanauki.pl/articles/1839081.pdf
Publication date:
2021
Publisher:
Polskie Towarzystwo Inżynierii Ekologicznej
Subjects:
soil plasticity
random forest
Moroccan Vertisol
soil degradation
Abstract:
Vertisol plasticity is related to moisture content, and it requires an in-depth physicochemical characterization. This information allows us to use the land under the most adequate conditions and avoid physical soil degradation, especially compaction. The objective of this study was to characterize the Vertisol in the Moroccan region of Doukkala-Abda and to predict soil plasticity based on the physicochemical parameters of soil, such as texture, electrical conductivity, Soil Organic Matter (SOM) and other chemical parameters, for 120 samples. Determining soil plasticity using Atterberg limits is challenging and time-consuming. Thus, this study aimed to develop a new model that can predict soil plasticity using the Random Forest algorithm. The soils were homogeneous in the majority of physicochemical parameters, except for a significant difference observed in the SOM and the electrical conductivity, which in turn influenced the soil plasticity state. The results showed significant and positive correlations between SOM, Soil Clay Content (SCC), Electrical Conductivity (EC), and plasticity in the Vertisol fields of the region. For the training phase, the model gave excellent results with a coefficient of determination of 0.995 and an RMSE of 0.164. Almost the same results were observed in the validation phase, with a coefficient of determination of 0.974 and an RMSE of 0.361, which shows that the model succeeded in predicting plasticity in both phases. On the basis of these results, the Random Forest model can be used to predict plasticity from other physicochemical parameters. Predicting soil plasticity is important for correctly timing the introduction of machines/tools into the fields and avoiding Vertisol degradation.
Source:
Journal of Ecological Engineering; 2021, 22, 2; 36-46
2299-8993
Appears in:
Journal of Ecological Engineering
Content provider:
Biblioteka Nauki
Article
Title:
EQUITY ISSUANCE AND CORPORATE DIVIDEND POLICY IN EMERGING ECONOMY CONTEXT
Authors:
Rohov, Heorhiy
Solesvik, Marina Z.
Links:
https://bibliotekanauki.pl/articles/453403.pdf
Publication date:
2016
Publisher:
Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Katedra Ekonometrii i Statystyki
Subjects:
dividend policy
emission policy
random forest algorithm
Ukraine
Abstract:
This article explores links between the size of a company, the industrial sector in which a company operates, concentration of capital, size of business, and emission and dividend policy in the Ukrainian corporate sector. Guided by insights from the bird-in-hand theory, clientele theory, signaling theory, and agency theory, we justify factors that determine the choice of share placement by Ukrainian public joint stock companies (PJSCs) and the formation of their dividend policy under the current operating conditions of the Ukrainian corporate sector. Using a tree-based classification approach in the form of the random forest algorithm, we found that maximizing the value of the share capital involved in share issuance by Ukrainian PJSCs is not a priority for owners of corporate rights. 86.1 per cent of companies selected private placements of shares. In the non-financial sector, 87.5 per cent of companies opted for private placements. The study also revealed that only a small share (3.5%) of Ukrainian joint stock companies paid dividends to shareholders. However, the dividend policy of Ukrainian joint stock companies changed when they listed their shares on foreign stock markets. In this case, two thirds of the firms studied paid dividends.
Source:
Metody Ilościowe w Badaniach Ekonomicznych; 2016, 17, 4; 114-137
2082-792X
Appears in:
Metody Ilościowe w Badaniach Ekonomicznych
Content provider:
Biblioteka Nauki
Article
Title:
Predictive Business Process Monitoring with Tree-based Classification Algorithms
Authors:
Owczarek, Tomasz
Janke, Piotr
Links:
https://bibliotekanauki.pl/articles/503954.pdf
Publication date:
2018
Publisher:
Międzynarodowa Wyższa Szkoła Logistyki i Transportu
Subjects:
business process
prediction
classification
random forest
gradient boosting
Abstract:
Predictive business process monitoring is a current research area whose purpose is to predict the outcome of a whole process (or of an element of a process, i.e. a single event or task) based on available data. In the article, we explore the possibility of using tree-based machine learning classification algorithms (CART, C5.0, random forest and extreme gradient boosting) to anticipate the result of a process. We test the application of these algorithms on real-world event-log data and compare it with known approaches. Our results show that.
Source:
Logistics and Transport; 2018, 40, 4; 73-82
1734-2015
Appears in:
Logistics and Transport
Content provider:
Biblioteka Nauki
Article
Title:
Interpretative machine learning as a key in recognizing the variability of lakes trophy patterns
Authors:
Jasiewicz, Jarosław
Zawiska, Izabela
Rzodkiewicz, Monika
Woszczyk, Michał
Links:
https://bibliotekanauki.pl/articles/2054583.pdf
Publication date:
2022-03-31
Publisher:
Uniwersytet im. Adama Mickiewicza w Poznaniu
Subjects:
total phosphorus
interpretative machine learning
random forest
Masurian lakes
Abstract:
The paper presents an application of interpretative machine learning to identify groups of lakes not with similar features but with similar potential factors influencing the content of total phosphorus (Ptot). The method was developed on a sample of 60 lakes from North-Eastern Poland and used 25 external explanatory variables. The selected variables are stable over a long time: the first group includes morphometric parameters of lakes, and the second group encompasses watershed geometry, geology, and land use. Our method involves building a regression model, creating an explainer, finding a set of mapping functions describing how each variable influences the outcome, and finally clustering objects by ’the influence’. The influence is a non-linear and non-parametric transformation of the explanatory variables into a form describing a given variable’s impact on the modeled feature. Such a transformation makes it possible to group data by the functional relations between the explanatory variables and the explained variable. The study reveals that there are five clusters where the concentration of Ptot is shaped similarly. We compared our method with other numerical analyses and showed that it provides new information on the relationship between the catchment area and lake trophy.
Source:
Quaestiones Geographicae; 2022, 41, 1; 127-146
0137-477X
2081-6383
Appears in:
Quaestiones Geographicae
Content provider:
Biblioteka Nauki
Article
Title:
Estimating parameters of empirical infiltration models from the global dataset using machine learning
Authors:
Kim, S.
Karahan, G.
Sharma, M.
Pachepsky, Y.
Links:
https://bibliotekanauki.pl/articles/2083049.pdf
Publication date:
2020
Publisher:
Polska Akademia Nauk. Instytut Agrofizyki PAN
Subjects:
infiltration modelling
random forest
Soil Water
Infiltration Global database
Abstract:
It is beneficial to develop pedotransfer relationships to estimate infiltration equation coefficients in site-specific conditions from readily available data. No systematic studies have been published concerning the relationships between the accuracy of the infiltration equation and the accuracy of the predicted coefficients in this equation. The objective of this work was to test the hypothesis that, for the same infiltration data, the accuracy of pedotransfer predictions for coefficients in an infiltration equation is greater for the infiltration equation that performs better. The hypothesis was tested using the commonly employed Horton and Mezencev (modified Kostiakov) infiltration equations with data from the Soil Water Infiltration Global database. The random forest machine learning algorithm was used to develop the pedotransfer model. The Horton and the Mezencev models performed better with 928 and 758 datasets, respectively. The accuracy of the estimates of the infiltration equation coefficients did not differ substantially between the estimates obtained from all data and from the data where the infiltration equation had lower root-mean-squared error values. The root-mean-squared error values of the pedotransfer estimates decreased by 2 to 25% when only datasets with the same infiltration measurement method were considered. The development of predictive pedotransfer equations with the data obtained from the same infiltration measurement method is recommended.
Source:
International Agrophysics; 2021, 35, 1; 73-81
0236-8722
Appears in:
International Agrophysics
Content provider:
Biblioteka Nauki
Article
Title:
Detection of DDoS Attacks in OpenStack-based Private Cloud Using Apache Spark
Authors:
Gumaste, Shweta
G., Narayan D.
Shinde, Sumedha
K., Amit
Links:
https://bibliotekanauki.pl/articles/1839316.pdf
Publication date:
2020
Publisher:
Instytut Łączności - Państwowy Instytut Badawczy
Subjects:
cloud
DDoS
distributed processing
OpenStack
Apache Spark
random forest
Abstract:
Security is a critical concern for cloud service providers. Distributed denial of service (DDoS) attacks are the most frequent of all cloud security threats, and the consequences of damage caused by DDoS are very serious. Thus, the design of an efficient DDoS detection system plays an important role in monitoring suspicious activity in the cloud. Real-time detection mechanisms operating in cloud environments and relying on machine learning algorithms and distributed processing are an important research issue. In this work, we propose a real-time detection of DDoS attacks using machine learning classifiers on a distributed processing platform. We evaluate the DDoS detection mechanism in an OpenStack-based cloud testbed using the Apache Spark framework. We compare the classification performance using benchmark and real-time cloud datasets. Results of the experiments reveal that the random forest method offers better classifier accuracy. Furthermore, we demonstrate the effectiveness of the proposed distributed approach in terms of training and detection time.
Source:
Journal of Telecommunications and Information Technology; 2020, 4; 62-71
1509-4553
1899-8852
Appears in:
Journal of Telecommunications and Information Technology
Content provider:
Biblioteka Nauki
Article
Title:
Development of Flood-Hazard-Mapping Model Using Random Forest and Frequency Ratio in Sumedang Regency, West Java, Indonesia
Authors:
Ismanto, Rido Dwi
Fitriana, Hana Listi
Manalu, Johanes
Purboyo, Alvian Aji
Prasasti, Indah
Links:
https://bibliotekanauki.pl/articles/27314279.pdf
Publication date:
2023
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:
flood-susceptibility assessment
random forest
frequency ratio
Sumedang
remote sensing
Abstract:
Flooding, often triggered by heavy rainfall, is a common natural disaster in Indonesia, and is the third most common type of disaster in Sumedang Regency. Hence, flood-susceptibility mapping is essential for flood management. The primary challenge lies in the complex, non-linear relationships between indices and risk levels. To address this, the application of random forest (RF) and frequency ratio (FR) methods has been explored. Ten flood-conditioning factors were determined from the literature: distance from a river, elevation, geology, geomorphology, lithology, land use/land cover, rainfall, slope, soil type, and topographic wetness index (TWI). Thirty-five flood locations from the flood-inventory map were selected, and the remaining 18 flood locations were used to validate the outcomes. The RF model classified 28.39% of the area as flooded and the remaining 71.61% as non-flooded. The FR method classified 8.02% of the area as flooded and 91.98% as non-flooded. The AUC was similar for both methods: 83.0%. This result is quite accurate and can be used by policymakers to prevent and manage future flooding in the Sumedang area. These results can also be used as material for updating existing flood-susceptibility maps.
Source:
Geomatics and Environmental Engineering; 2023, 17, 6; 129-157
1898-1135
Appears in:
Geomatics and Environmental Engineering
Content provider:
Biblioteka Nauki
Article
Title:
Prognozowanie przedziału czasowego z maksymalnym w ciągu doby z użyciem gazu przez kotłownię
Forecasting the time interval of the day with the maximum boilers gas consumption
Authors:
Nowak, Bogdan
Bartnicki, Grzegorz
Links:
https://bibliotekanauki.pl/articles/394678.pdf
Publication date:
2019
Publisher:
Polska Akademia Nauk. Instytut Gospodarki Surowcami Mineralnymi i Energią PAN
Subjects:
zużycie gazu
model prognostyczny
random forest
gas consumption
prognostic model
Abstract:
Improving the energy efficiency of heat supply systems requires increasingly complex methods. The basic ways of reducing heat consumption by using better thermal insulation offer increasingly limited possibilities and need relatively large financial outlays. Good results can be achieved by better matching the technical solutions, control methods, and operating rules of the heat source to the conditions of the specific facility supplied with heat. However, this requires both research that identifies the effectiveness of such methods and tools for describing selected elements of the system or the system as a whole. The article presents the results of tests carried out for a gas boiler room supplying heat to a group of residential buildings. The goal was to build a model that would forecast, for a given day, the time interval in which the maximum gas consumption occurs. Having measurements of gas consumption in consecutive hours of the day, it was decided to build a forecasting model determining the part of the day in which such a maximum would occur. The random forest procedure was used to build the model, along with the mlr (Kassambara) package, in which the model’s hyperparameters were also tuned on historical data. The results of the model’s quality assessment were presented based on separate data from another period of boiler room operation. An effectiveness of nearly 44% was achieved. Tuning the model improved its predictive ability.
Source:
Zeszyty Naukowe Instytutu Gospodarki Surowcami Mineralnymi i Energią PAN; 2019, 109; 93-109
2080-0819
Appears in:
Zeszyty Naukowe Instytutu Gospodarki Surowcami Mineralnymi i Energią PAN
Content provider:
Biblioteka Nauki
Article
Title:
Vibroacoustic Real Time Fuel Classification in Diesel Engine
Authors:
Bąkowski, A.
Kekez, M.
Radziszewski, L.
Sapietova, A.
Links:
https://bibliotekanauki.pl/articles/177686.pdf
Publication date:
2018
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
fuel recognition
classification trees
particle swarm optimization (PSO)
random forest
Abstract:
Five models and a methodology are discussed in this paper for constructing classifiers capable of recognizing, in real time, the type of fuel injected into a diesel engine cylinder with an accuracy acceptable in practical technical applications. Experimental research was carried out on a dynamic engine test facility. The in-cylinder and in-injection-line pressure signals in an internal combustion engine powered by mineral fuel, biodiesel or blends of these two fuel types were evaluated using the vibro-acoustic method. Computational intelligence methods such as classification trees, particle swarm optimization and random forest were applied.
Source:
Archives of Acoustics; 2018, 43, 3; 385-395
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
A System for Filling Store Displays: Pitting a Single Model against a Set of Demand Forecasting Models
System zapełnienia ekspozycji sklepowych: pojedynczy model a zespół modeli prognozowania popytu
Authors:
Myna, Artur
Myna, Jacek
Links:
https://bibliotekanauki.pl/articles/2206342.pdf
Publication date:
2023
Publisher:
Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Subjects:
Extreme Gradient Boosting
logistic regression
random forest
regresja logistyczna
las losowy
Abstract:
The aim of the paper was to develop the concept of retail display space allocation as a system and to assess the quality of demand forecasting models for very slow-moving products (models that have not yet been used by retail companies in Poland) as its key subsystem. Forecasts were made using the example of a clothing company. The quality of these models was assessed using the Weighted Mean Absolute Percentage Error at different levels of detail: for the entire sales network and a given month, and at the intersection of store, product, and product size. The first step was to build the individual models. Later, the authors built separate models for brick-and-mortar and online stores as well as brands, creating a set of six models. The model fit improved only for online stores. The findings show that the classification approach for very slow movers provides results as precise as the regression approach. No single model or set of models (built with a particular machine learning method) could be identified that made the best demand forecasts for brick-and-mortar stores, as statistical tests generally did not confirm the significance of the differences between the median forecasts.
Source:
Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu; 2023, 67, 2; 96-106
1899-3192
Appears in:
Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu
Content provider:
Biblioteka Nauki
Article
Title:
Artificial Intelligence Based Flood Forecasting for River Hunza at Danyor Station in Pakistan
Authors:
Yaseen, Muhammad Waseem
Awais, Muhammad
Riaz, Khuram
Rasheed, Muhammad Babar
Waqar, Muhammad
Rasheed, Sajid
Links:
https://bibliotekanauki.pl/articles/31340346.pdf
Publication date:
2022
Publisher:
Polska Akademia Nauk. Instytut Budownictwa Wodnego PAN
Subjects:
hydrometeorology
random forest
support vector
multilayer perceptron
machine learning
flood forecasting
Abstract:
Floods can cause significant problems for humans and can damage the economy. Implementing a reliable flood monitoring and warning system in risk areas can help to reduce the negative impacts of these natural disasters. Artificial intelligence algorithms and statistical approaches are employed by researchers to enhance flood forecasting. In this study, a dataset was created using unique features measured by sensors along the Hunza River in Pakistan over the past 31 years. The dataset was used for classification and regression problems. Two types of machine learning algorithms were tested for classification: classical algorithms (Random Forest, RF and Support Vector Classifier, SVC) and deep learning algorithms (Multi-Layer Perceptron, MLP). For the regression problem, the results of the MLP and Support Vector Regression (SVR) algorithms were compared based on their mean square, root mean square and mean absolute errors. The results show that the accuracy of the RF classifier is 0.99, while the accuracies of the SVC and MLP methods are 0.98; moreover, in the case of flood prediction, the SVR algorithm outperforms the MLP approach.
Źródło:
Archives of Hydro-Engineering and Environmental Mechanics; 2022, 69, 1; 59-77
1231-3726
Pojawia się w:
Archives of Hydro-Engineering and Environmental Mechanics
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Development of Data-mining Technique for Seismic Vulnerability Assessment
Autorzy:
Wojcik, Waldemar
Karmenova, Markhaba
Smailova, Saule
Tlebaldinova, Aizhan
Belbeubaev, Alisher
Powiązania:
https://bibliotekanauki.pl/articles/1844631.pdf
Data publikacji:
2021
Wydawca:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:
data analysis
seismic assessment
clustering
h-means
k-means
random forest
Opis:
Assessment of the seismic vulnerability of urban infrastructure is a pressing problem, since the damage caused by earthquakes is quite significant. Despite the complexity of such tasks, today’s machine learning methods allow the use of “fast” methods for assessing seismic vulnerability. The article proposes a methodology for assessing the characteristics of typical urban objects that affect their seismic resistance, using classification and clustering methods. For the analysis, we use the kmeans and hkmeans clustering methods, where the Euclidean distance is used as a measure of proximity. The optimal number of clusters is determined using the Elbow method. A decision-making model on the seismic resistance of an urban object is presented, and the variables that have the greatest impact on the seismic resistance of an urban object are identified. The study shows that the results of clustering coincide with expert estimates, and that the characteristics of typical urban objects can be determined as a result of data modeling using clustering algorithms.
Źródło:
International Journal of Electronics and Telecommunications; 2021, 67, 2; 261-266
2300-1933
Pojawia się w:
International Journal of Electronics and Telecommunications
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Integrating Vegetation Indices and Spectral Features for Vegetation Mapping from Multispectral Satellite Imagery Using AdaBoost and Random Forest Machine Learning Classifiers
Autorzy:
Saini, Rashmi
Powiązania:
https://bibliotekanauki.pl/articles/2174656.pdf
Data publikacji:
2023
Wydawca:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:
ensemble classifiers
Machine Learning
Random Forest
AdaBoost
vegetation mapping
vegetation indices
Opis:
Vegetation mapping is an active research area in the domain of remote sensing. This study proposes a methodology for the mapping of vegetation by integrating several vegetation indices along with original spectral bands. The Land Use Land Cover classification was performed by two powerful Machine Learning techniques, namely Random Forest and AdaBoost. The Random Forest algorithm works on the concept of building multiple decision trees for the final prediction. The other Machine Learning technique selected for the classification is AdaBoost (adaptive boosting), which converts a set of weak learners into strong learners. Here, multispectral satellite data of Dehradun, India, was utilised. The results demonstrate an increase of 3.87% and 4.32% after inclusion of selected vegetation indices by Random Forest and AdaBoost respectively. An Overall Accuracy (OA) of 91.23% (kappa value of 0.89) and 88.59% (kappa value of 0.86) was obtained by means of the Random Forest and AdaBoost classifiers respectively. Although Random Forest achieved a greater OA than AdaBoost, AdaBoost provided better class-specific accuracy for the Shrubland class. Furthermore, this study also evaluated the importance of each individual feature used in the classification. Results demonstrated that the NDRE, GNDVI, and RTVIcore vegetation indices, and the NIR and Red-Edge spectral bands, obtained higher importance scores.
Źródło:
Geomatics and Environmental Engineering; 2023, 17, 1; 57-74
1898-1135
Pojawia się w:
Geomatics and Environmental Engineering
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
A random forest model for the prediction of spudcan penetration resistance in stiff-over-soft clays
Autorzy:
Gao, Pan
Liu, Zhihui
Zeng, Ji
Zhan, Yiting
Wang, Fei
Powiązania:
https://bibliotekanauki.pl/articles/1573798.pdf
Data publikacji:
2020
Wydawca:
Politechnika Gdańska. Wydział Inżynierii Mechanicznej i Okrętownictwa
Tematy:
machine learning
random forest
jack-up
penetration resistance
stiff-over-soft clays
Opis:
Punch-through is a major threat to the jack-up unit, especially at well sites with layered stiff-over-soft clays. A model is proposed to predict the spudcan penetration resistance in stiff-over-soft clays, based on the random forest (RF) method. The RF model was trained and tested with numerical simulation results obtained through the Finite Element model, implemented with the Coupled Eulerian Lagrangian (CEL) approach. With the proposed CEL model, the effects of the stiff layer thickness, undrained shear strength ratio, and the undrained shear strength of the soft layer on the bearing characteristics, as well as the soil failure mechanism, were numerically studied. A simplified resistance profile model of penetration in stiff-over-soft clays is proposed, divided into three sections by the peak point and the transition point. The importance of soil parameters to the penetration resistance was analysed. Then, the trained RF model was tested against the test set, showing a good prediction of the numerical cases. Finally, the trained RF was validated against centrifuge tests. The RF model successfully captured the punch-through potential, and was verified using data recorded in the field, showing advantages over the SNAME guideline. It is expected that the trained RF model should give a good prediction of the spudcan penetration resistance profile, especially if trained with more field data.
Źródło:
Polish Maritime Research; 2020, 4; 130-138
1233-2585
Pojawia się w:
Polish Maritime Research
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Monitoring Vegetation Cover Changes by Sentinel-1 Radar Images Using Random Forest Classification Method
Autorzy:
Tran, Van Anh
Le, Thi Le
Nguyen, Nhu Hung
Le, Thanh Nghi
Tran, Hong Hanh
Powiązania:
https://bibliotekanauki.pl/articles/2020227.pdf
Data publikacji:
2021
Wydawca:
Polskie Towarzystwo Przeróbki Kopalin
Tematy:
vegetation cover change
Sentinel-1
Random Forest
Binh Duong
Vietnam
Wietnam
wegetacja
Opis:
Vietnam is an Asian country with a hot and humid tropical climate throughout the year. Forests account for more than 40% of the total land area and have a very rich and diverse vegetation. Monitoring changes in the vegetation cover is important yet challenging, considering such large, varying areas and climatic conditions. A traditional remote sensing technique for monitoring the vegetation cover involves the use of optical satellite images. However, in the presence of cloud cover, analyses based on optical satellite images are not reliable. In such a scenario, radar images are a useful alternative because radar pulses can penetrate clouds, regardless of day or night. In this study, we used multi-temporal C-band satellite images to monitor vegetation cover changes for an area in the Dau Tieng and Ben Cat districts of Binh Duong province, Mekong Delta, Vietnam. From a collection of 46 images acquired between March 2015 and February 2017, the changes of five land cover types, including vegetation loss and replanting in 2017, were analyzed in two cases: using 9 images from the dry seasons of 2015, 2016 and 2017, and using all 46 images. In each case a Random Forest classifier was run with 100, 200, 300 and 500 trees. The model with nine images and 300 trees gave the best result, with an overall accuracy of 98.4% and a Kappa of 0.97. The results demonstrated that, using VH polarization, Sentinel-1 gives quite good accuracy for vegetation cover change. Therefore, Sentinel-1 can also be used to generate reliable land cover maps suitable for different applications.
Źródło:
Inżynieria Mineralna; 2021, 2; 441-451
1640-4920
Pojawia się w:
Inżynieria Mineralna
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Application of machine learning algorithms to predict permeability in tight sandstone formations
Zastosowanie metod uczenia maszynowego do przewidywania przepuszczalności w formacjach zwięzłych piaskowców typu tight gas
Autorzy:
Topór, Tomasz
Powiązania:
https://bibliotekanauki.pl/articles/2143653.pdf
Data publikacji:
2021
Wydawca:
Instytut Nafty i Gazu - Państwowy Instytut Badawczy
Tematy:
machine learning
random forest
permeability
prediction
uczenie maszynowe
lasy losowe
predykcja
przepuszczalność
Opis:
The application of machine learning algorithms in petroleum geology has opened a new chapter in oil and gas exploration. Machine learning algorithms have been successfully used to predict crucial petrophysical properties when characterizing reservoirs. This study utilizes the concept of machine learning to predict permeability under confining stress conditions for samples from tight sandstone formations. The models were constructed using two machine learning algorithms of varying complexity (multiple linear regression [MLR] and random forests [RF]) and trained on a dataset that combined basic well information, basic petrophysical data, and rock type from a visual inspection of the core material. The RF algorithm underwent feature engineering to increase the number of predictors in the models. In order to check the training models’ robustness, 10-fold cross-validation was performed. The MLR and RF applications demonstrated that both algorithms can accurately predict permeability under constant confining pressure (R2 0.800 vs. 0.834). The RF accuracy was about 3% better than that of the MLR and about 6% better than the linear reference regression (LR) that utilized only porosity. Porosity was the most influential feature of the models’ performance. In the case of RF, the depth was also significant in the permeability predictions, which could be evidence of hidden interactions between the variables of porosity and depth. The local interpretation revealed the common features among outliers. Both the training and testing sets had moderate-low porosity (3–10%) and a lack of fractures. In the test set, calcite or quartz cementation also led to poor permeability predictions. The workflow that utilizes the tidymodels concept will be further applied in more complex examples to predict spatial petrophysical features from seismic attributes using various machine learning algorithms.
Zastosowanie algorytmów uczenia maszynowego w geologii naftowej otworzyło nowy rozdział w poszukiwaniu złóż ropy i gazu. Algorytmy uczenia maszynowego zostały z powodzeniem wykorzystane do przewidywania kluczowych właściwości petrofizycznych charakteryzujących złoże. W pracy zastosowano metody uczenia maszynowego do przewidywania przepuszczalności w warunkach ustalonego ciśnienia złożowego dla formacji zwięzłych piaskowców typu tight gas. Modele zostały skonstruowane przy użyciu algorytmów o różnym stopniu komplikacji (wielowymiarowa regresja liniowa – MLR i lasy losowe – RF), a następnie poddano je procesowi uczenia na danych zawierających podstawowe informacje o otworze, podstawowe parametry petrofizyczne oraz typ skał pochodzący z makroskopowego i mikroskopowego opisu próbek rdzeni. Typ skał został rozkodowany i poddany procesowi inżynierii cech, aby wydobyć dodatkowe zmienne do modelu. Proces uczenia na zbiorze treningowym został przeprowadzony z wykorzystaniem 10-krotnej kroswalidacji. Uzyskane wyniki pokazują, że oba algorytmy mogą przewidywać przepuszczalność z dużą dokładnością (R2 = 0,800 dla MLR vs R2 = 0,834 dla RF). Dokładność modelu RF jest około 3% lepsza niż MLR i około 6% lepsza w porównaniu do modelu referencyjnego (model regresji liniowej z jedną zmienną – porowatością). W przypadku obu modeli porowatość była najistotniejszym parametrem przy przewidywaniu przepuszczalności. Dodatkowo w modelu wykorzystującym lasy losowe istotną cechą okazała się głębokość próbki, co może świadczyć o dodatkowych interakcjach pomiędzy zmiennymi. Cechą wspólną próbek w zbiorze treningowym i testowym, dla których modele zadziałały ze słabą skutecznością, były porowatość od 3% do 10% i brak spękań. Dodatkowo w zbiorze testowym niska dokładność przewidywań przepuszczalności była związana z obecnością cementacji kalcytem i kwarcem. 
Workflow wykorzystujący stan wiedzy dotyczącej modelowania, którego trzon stanowi pakiet tidymodels, będzie dalej stosowany do prognozowania przestrzennych właściwości petrofizycznych na podstawie atrybutów sejsmicznych.
Źródło:
Nafta-Gaz; 2021, 77, 5; 283-292
0867-8871
Pojawia się w:
Nafta-Gaz
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
An Approach to License Plate Recognition in Real Time Using Multi-stage Computational Intelligence Classifier
Autorzy:
Kekez, Michał
Powiązania:
https://bibliotekanauki.pl/articles/27311914.pdf
Data publikacji:
2023
Wydawca:
Polska Akademia Nauk. Czasopisma i Monografie PAN
Tematy:
car license plates
LPR
ANPR
OCR
image processing
neural network
Random Forest
Opis:
Automatic car license plate recognition (LPR) is widely used nowadays. It involves plate localization in the image, character segmentation and optical character recognition. In this paper, a set of descriptors of image segments (characters) is proposed, as well as a technique for multi-stage classification of letters and digits using a cascade of a neural network and several parallel Random Forest, classification tree, or rule list classifiers. The proposed solution was applied to automated recognition of number plates composed of capital Latin letters and Arabic numerals. The paper presents an analysis of the accuracy of the obtained classifiers. The time needed to build the classifier and the time needed to classify characters using it are also presented.
Źródło:
International Journal of Electronics and Telecommunications; 2023, 69, 2; 275-280
2300-1933
Pojawia się w:
International Journal of Electronics and Telecommunications
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Developing a data-driven soft sensor to predict silicate impurity in iron ore flotation concentrate
Autorzy:
Pural, Yusuf Enes
Powiązania:
https://bibliotekanauki.pl/articles/24148677.pdf
Data publikacji:
2023
Wydawca:
Politechnika Wrocławska. Oficyna Wydawnicza Politechniki Wrocławskiej
Tematy:
soft sensor
machine learning
random forest
multi-layer perceptron
flotation
grade estimation
Opis:
Soft sensors are mathematical models that estimate the value of a process variable that is difficult or expensive to measure directly. They can be based on first principle models, data-based models, or a combination of both. These models are increasingly used in mineral processing to estimate and optimize important performance parameters such as mill load, mineral grades, and particle size. This study investigates the development of a data-driven soft sensor to predict the silicate content in iron ore reverse flotation concentrate, a crucial indicator of plant performance. The proposed soft sensor model employs a dataset obtained from Kaggle, which includes measurements of iron and silicate content in the feed to the plant, reagent dosages, weight and pH of pulp, as well as the amount of air and froth levels in the flotation units. To reduce the dimensionality of the dataset, Principal Component Analysis, an unsupervised machine learning method, was applied. The soft sensor model was developed using three machine learning algorithms, namely, Ridge Regression, Multi-Layer Perceptron, and Random Forest. The Random Forest model, created with non-reduced data, demonstrated superior performance, with an R-squared value of 96.5% and a mean absolute error of 0.089. The results suggest that the proposed soft sensor model can accurately predict the silicate content in the iron ore flotation concentrate using machine learning algorithms. Moreover, the study highlights the importance of selecting appropriate algorithms for soft sensor developments in mineral processing plants.
Źródło:
Physicochemical Problems of Mineral Processing; 2023, 59, 5; art. no. 169823
1643-1049
2084-4735
Pojawia się w:
Physicochemical Problems of Mineral Processing
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
The use of data mining models in solving the problem of imbalanced classes based on the example of an online marketing campaign
Wykorzystanie modeli data mining w rozwiązywaniu problemu niezrównoważonych klas na przykładzie kampanii marketingowych w Internecie
Autorzy:
Łapczyński, Mariusz
Surma, Jerzy
Powiązania:
https://bibliotekanauki.pl/articles/424980.pdf
Data publikacji:
2015
Wydawca:
Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Tematy:
C&RT
Random Forest
imbalanced class problem
online social network
banner ad campaign
Opis:
While building predictive models in analytical CRM, researchers often encounter the problem of imbalanced classes (skewed distributions of dependent variables), in which the number of observations belonging to one category of the dependent variable is much lower than the number of observations belonging to the second category of that variable. This is related to such areas as churn analysis, customer acquisition models and cross- and up-selling models. The purpose of the paper is to present a predictive model that was built to predict the response of Internet users to banner advertising. The dataset used in the study came from an online social network which offers advertisers banner campaigns targeting its users. The advertising campaign of a cosmetics company was carried out in the autumn of 2010 and was mainly targeted at young women. A user of this service was described by 115 independent variables, 3 of which were demographic variables (sex, age, education), while the remaining 112 referred to the user’s online activity. While building the model, the problem of imbalanced classes arose due to the low number of users who clicked on the banner ad. The number of cases amounted to 81,000, while the number of positive reactions to the banner was 207, which constitutes approximately 0.25% of the dependent variable. During the study, two popular data mining tools were utilized: the decision trees C&RT and Random Forest. The second goal of this paper is to compare the performance of the predictive models based on both these analytical tools.
Źródło:
Econometrics. Ekonometria. Advances in Applied Data Analytics; 2015, 3 (49); 9-19
1507-3866
Pojawia się w:
Econometrics. Ekonometria. Advances in Applied Data Analytics
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
A Study on the Optimization of Metalloid Contents of Fe-Si-B-C Based Amorphous Soft Magnetic Materials Using Artificial Intelligence Method
Autorzy:
Choi, Young-Sin
Kwon, Do-Hun
Lee, Min_Woo
Cha, Eun-Ji
Jeon, Junhyub
Lee, Seok-Jae
Kim, Jongryoul
Kim, Hwi-Jun
Powiązania:
https://bibliotekanauki.pl/articles/2174571.pdf
Data publikacji:
2022
Wydawca:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:
Fe-based amorphous
soft magnetic properties
artificial intelligence
machine learning
random forest regression
Opis:
The soft magnetic properties of Fe-based amorphous alloys can be controlled by their compositions through alloy design. Experimental data on these alloys, however, show some discrepancy with predicted values. To further improve the soft magnetic properties, machine learning methods such as random forest regression, k-nearest neighbors regression and support vector regression can help to optimize the composition. In this study, the random forest regression method was used to find the optimum compositions of Fe-Si-B-C alloys. As a result, the lowest coercivity was observed in Fe80.5Si3.63B13.54C2.33 at.% and the highest saturation magnetization was obtained in Fe81.83Si3.63B12.63C1.91 at.%, with R2 values of 0.74 and 0.878, respectively.
Źródło:
Archives of Metallurgy and Materials; 2022, 67, 4; 1459-1463
1733-3490
Pojawia się w:
Archives of Metallurgy and Materials
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Mining Data of Noisy Signal Patterns in Recognition of Gasoline Bio-Based Additives using Electronic Nose
Autorzy:
Osowski, S.
Siwek, K.
Powiązania:
https://bibliotekanauki.pl/articles/220792.pdf
Data publikacji:
2017
Wydawca:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:
data mining
electronic nose
gasoline blends
random forest
support vector machine
wavelet denoising
Opis:
The paper analyses the distorted data of an electronic nose in recognizing gasoline bio-based additives. Different tools of data mining, such as data clustering methods, principal component analysis, wavelet transformation, support vector machine and random forest of decision trees, are applied. Special emphasis is placed on the robustness of signal processing systems to the noise distorting the registered sensor signals. A special denoising procedure based on the discrete wavelet transformation is proposed. This procedure makes it possible to reduce the recognition error rate significantly. The numerical results of experiments devoted to the recognition of different blends of gasoline have shown the superiority of the support vector machine in a noisy measurement environment.
Źródło:
Metrology and Measurement Systems; 2017, 24, 1; 27-44
0860-8229
Pojawia się w:
Metrology and Measurement Systems
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Ensemble-based Method of Fraud Detection at Self-checkouts in Retail
Autorzy:
Vitynskyi, P.
Tkachenko, R.
Izonin, I.
Powiązania:
https://bibliotekanauki.pl/articles/410756.pdf
Data publikacji:
2019
Wydawca:
Polska Akademia Nauk. Oddział w Lublinie PAN
Tematy:
classification
Ensemble-based method
Random Forest
fraud detection
retail
Ito decomposition
imbalanced dataset
Opis:
The authors consider the problem of fraud detection at self-checkouts in retail under the condition of an imbalanced data set. A new ensemble-based method is proposed for its effective solution. The developed method involves two main steps: application of the preprocessing procedures and the Random Forest algorithm. The preprocessing stage consists of the sequential execution of the following procedures over the input data: scaling by the maximal element in a column with row-wise scaling by the Euclidean norm, weighting by correlation, and applying polynomial extension. For the polynomial extension, Ito decomposition of the second degree is used. The simulation of the method was carried out on real data. Performance was evaluated using a cost matrix. The experimental comparison of the developed ensemble-based method with a number of existing methods (single and ensemble) demonstrates the best performance of the developed method. Experimental studies of changing the parameters of the Random Forest, both for the basic algorithm and for the developed method, demonstrate a significant improvement of the investigated efficiency measures of the latter. This improvement results from applying all steps of the preprocessing stage of the developed method.
Źródło:
ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes; 2019, 8, 2; 3-8
2084-5715
Pojawia się w:
ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Maximising accuracy and efficiency of traffic accident prediction combining information mining with computational intelligence approaches and decision trees
Autorzy:
Tambouratzis, T.
Souliou, D.
Chalikias, M.
Gregoriades, A.
Powiązania:
https://bibliotekanauki.pl/articles/91652.pdf
Data publikacji:
2014
Wydawca:
Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:
traffic accident
location
prediction
probabilistic neural networks
random forest
accuracy
efficiency
decision tree
Opis:
The development of universal methodologies for the accurate, efficient, and timely prediction of traffic accident location and severity constitutes a crucial endeavour. In this piece of research, the best combinations of salient accident-related parameters and accurate accident severity prediction models are determined for the 2005 accident dataset brought together by the Republic of Cyprus Police. The optimal methodology involves: (a) information mining in the form of feature selection of the accident parameters that maximise prediction accuracy (implemented via scatter search), followed by feature extraction (implemented via principal component analysis) and selection of the minimal number of components that contain the salient information of the original parameters, which combined bring about an overall 74.42% reduction in the dataset dimensionality; (b) accident severity prediction via probabilistic neural networks and random forests, both of which independently accomplish over 96% correct prediction and a balanced proportion of under- and over-estimations of accident severity. An explanation of the superiority of the optimal combinations of parameters and models is given, as is a comparison with existing accident classification/prediction approaches.
Źródło:
Journal of Artificial Intelligence and Soft Computing Research; 2014, 4, 1; 31-42
2083-2567
2449-6499
Pojawia się w:
Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Assessment of Approaches for the Extraction of Building Footprints from Pléiades Images
Autorzy:
Taha, Lamyaa Gamal El-deen
Ibrahim, Rania Elsayed
Powiązania:
https://bibliotekanauki.pl/articles/1837996.pdf
Data publikacji:
2021
Wydawca:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:
ensemble classifiers
machine learning
random forest
maximum likelihood
support vector machines
backpropagation
image classification
Opis:
The Marina area represents an official new gateway of entry to Egypt and the development of infrastructure is proceeding rapidly in this region. The objective of this research is to obtain building data by means of automated extraction from Pléiades satellite images. This is due to the need for efficient mapping and updating of geodatabases for urban planning and touristic development. It compares the performance of random forest algorithm to other classifiers like maximum likelihood, support vector machines, and backpropagation neural networks over the well-organized buildings which appeared in the satellite images. Images were subsequently classified into two classes: buildings and non-buildings. In addition, basic morphological operations such as opening and closing were used to enhance the smoothness and connectedness of the classified imagery. The overall accuracy for random forest, maximum likelihood, support vector machines, and backpropagation were 97%, 95%, 93% and 92% respectively. It was found that random forest was the best option, followed by maximum likelihood, while the least effective was the backpropagation neural network. The completeness and correctness of the detected buildings were evaluated. Experiments confirmed that the four classification methods can effectively and accurately detect 100% of buildings from very high-resolution images. It is encouraged to use machine learning algorithms for object detection and extraction from very high-resolution images.
Źródło:
Geomatics and Environmental Engineering; 2021, 15, 4; 101-116
1898-1135
Pojawia się w:
Geomatics and Environmental Engineering
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Semantic Segmentation of Diseases in Mushrooms using Enhanced Random Forest
Autorzy:
Yacharam, Rakesh Kumar
Sekhar, Dr. V. Chandra
Powiązania:
https://bibliotekanauki.pl/articles/31339414.pdf
Data publikacji:
2023
Wydawca:
Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Instytut Informatyki Technicznej
Tematy:
mushroom diseases
semantic segmentation
computer aided
Machine Learning
significant feature extraction
Random Forest classifier
Opis:
Mushrooms are a rich source of antioxidants and nutritional values. Edible mushrooms, however, are susceptible to various diseases such as dry bubble, wet bubble, cobweb, bacterial blotches, and mites. Farmers face significant production losses due to these diseases affecting mushrooms. The manual detection of these diseases relies on expertise, knowledge of diseases, and human effort. Therefore, there is a need for computer-aided methods, which serve as optimal substitutes for detecting and segmenting diseases. In this paper, we propose a semantic segmentation approach based on the Random Forest machine learning technique for the detection and segmentation of mushroom diseases. Our focus lies in extracting a combination of different features, including Gabor, Bouda, Kayyali, Gaussian, Canny edge, Roberts, Sobel, Scharr, Prewitt, Median, and Variance. We employ constant mean-variance thresholding and the Pearson correlation coefficient to extract significant features, aiming to enhance computational speed and reduce complexity in training the Random Forest classifier. Our results indicate that semantic segmentation based on Random Forest outperforms other methods such as Support Vector Machine (SVM), Naïve Bayes, K-means, and Region of Interest in terms of accuracy. Additionally, it exhibits superior precision, recall, and F1 score compared to SVM. It is worth noting that deep learning-based semantic segmentation methods were not considered due to the limited availability of diseased mushroom images.
Źródło:
Machine Graphics & Vision; 2023, 32, 2; 129-146
1230-0535
2720-250X
Pojawia się w:
Machine Graphics & Vision
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
A comparative study on performance of basic and ensemble classifiers with various datasets
Autorzy:
Gunakala, Archana
Shahid, Afzal Hussain
Powiązania:
https://bibliotekanauki.pl/articles/30148255.pdf
Data publikacji:
2023
Wydawca:
Polskie Towarzystwo Promocji Wiedzy
Tematy:
classification
Naïve Bayes
neural network
Support Vector Machine
Decision Tree
ensemble learning
Random Forest
Opis:
Classification plays a critical role in machine learning (ML) systems for processing images, text and high-dimensional data. Predicting class labels from training data is the primary goal of classification. An optimal model for a particular classification problem is chosen based on the model's performance and execution time. This paper compares and analyzes the performance of basic as well as ensemble classifiers utilizing 10-fold cross validation and also discusses their essential concepts, advantages, and disadvantages. In this study, five basic classifiers, namely Naïve Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), the ensemble of all five classifiers, and a few more combinations are compared on five University of California Irvine (UCI) ML Repository datasets and a Diabetes Health Indicators dataset from the Kaggle repository. To analyze and compare the performance of the classifiers, evaluation metrics such as Accuracy, Recall, Precision, Area Under Curve (AUC) and F-Score are used. Experimental results showed that SVM performs best on two out of the six datasets (Diabetes Health Indicators and waveform), RF performs best for the Arrhythmia, Sonar, and Tic-tac-toe datasets, and the best ensemble combination is found to be DT+SVM+RF on the Ionosphere dataset, with respective accuracies of 72.58%, 90.38%, 81.63%, 73.59%, 94.78% and 94.01%. The proposed ensemble combinations outperformed the conventional models for a few datasets.
Źródło:
Applied Computer Science; 2023, 19, 1; 107-132
1895-3735
2353-6977
Pojawia się w:
Applied Computer Science
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Impact of the COVID-19 pandemic on the expression of emotions in social media
Autorzy:
Ghosh, Debabrata
Powiązania:
https://bibliotekanauki.pl/articles/2027766.pdf
Data publikacji:
2020
Wydawca:
Uniwersytet Ekonomiczny w Katowicach
Tematy:
Classification
COVID-19
Emotion
Emotion analysis
Naïve Bayes
Pandemic
Random Forest
Support Vector Machine
Opis:
In the age of social media, thousands of messages are exchanged every second. Analyzing this unstructured data to identify specific emotions is a challenging task. Emotion analysis involves evaluating and classifying text into emotion classes such as Happy, Sad, Anger, Disgust, Fear and Surprise, as defined by the dimensional models of emotion described in psychological theory (www 1; Russell, 2005). The main goal of this paper is to cover the COVID-19 pandemic situation in India and its impact on human emotions. As people very often express their state of mind through social media, analyzing and tracking their emotions can help government and local authorities take the required measures. We analyzed different machine learning classification models, namely Naïve Bayes, Support Vector Machine, Random Forest Classifier, Decision Tree and Logistic Regression, with 10-fold cross-validation to find the top ML models for emotion classification. After hyperparameter tuning, Logistic Regression proved to be the best-suited model, with an accuracy of 77% on the given datasets. We used an algorithm-based supervised ML technique to obtain the expected result. Although multiple studies have been conducted along the same lines, none of them performed a comparative study of different ML techniques or hyperparameter tuning to optimize the results. In addition, this study was carried out on a dataset from the most recent COVID-19 pandemic situation, which is itself unique. We captured Twitter data over 45 days with the hashtags #COVID19India OR #COVID19 and analyzed the data using Logistic Regression to find out how emotions changed over time based on certain social factors.
Źródło:
Multiple Criteria Decision Making; 2020, 15; 23-35
2084-1531
Pojawia się w:
Multiple Criteria Decision Making
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Classification and modelling of sound emission signals in selected tribosystems
Klasyfikacja i modelowanie sygnałów dźwięków w wybranych systemach tribologicznych
Autorzy:
Kekez, Michał
Jurczak, Wojciech
Ozimina, Dariusz
Powiązania:
https://bibliotekanauki.pl/articles/2055620.pdf
Data publikacji:
2021
Wydawca:
Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:
tribosystem
sound level
regression trees
random forest
system tribologiczny
poziom dźwięku
drzewa regresji
las losowy
Opis:
The paper presents an analysis of the sound level recorded under dry sliding friction conditions. Balls with a diameter of 6 mm placed on pins were made of 100Cr6 steel, silicon carbide (SiC), and corundum (Al2O3), while the rotating discs, with a height of 6 mm and a diameter of 42 mm, were made of 100Cr6 steel. Each pin-and-disc system was tested at two values of relative air humidity (50 ± 5% and 90 ± 5%). Models of the A-sound level were developed using regression trees and random forest. The paper presents an analysis of the accuracy of the models obtained. The six tests performed were also classified on the basis of sound level descriptors.
W pracy przedstawiono analizę poziomu dźwięku zarejestrowanego podczas tarcia technicznie suchego w ruchu ślizgowym. Podczas sześciu testów tribologicznych stosowano próbkę wykonaną ze stali 100Cr6 oraz trzy przeciwpróbki, wykonane ze stali 100Cr6, węglika krzemu (SiC) i korundu (Al2O3), przy czym każdy układ próbka – przeciwpróbka był testowany dla dwóch wartości wilgotności względnej powietrza (50 ± 5% i 90 ± 5%). Opracowano modele poziomu dźwięku A z użyciem drzew regresji i lasu losowego. W pracy zamieszczono analizę dokładności otrzymanych modeli. Została również przeprowadzona klasyfikacja sześciu wykonanych testów w oparciu o deskryptory poziomu dźwięku.
Źródło:
Tribologia; 2021, 297, 3; 19-26
0208-7774
Pojawia się w:
Tribologia
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Application of selected data mining techniques in unintentional accounting error detection
Autorzy:
Papík, Mário
Papíková, Lenka
Powiązania:
https://bibliotekanauki.pl/articles/22444352.pdf
Data publikacji:
2021
Wydawca:
Instytut Badań Gospodarczych
Tematy:
financial fraud
unintentional accounting errors
financial restatements
decision tree
classification and regression tree
random forest
Opis:
Research background: Even though unintentional accounting errors leading to financial restatements look like a less serious distortion of publicly available information, it has been shown that the impacts of financial restatements on financial markets are similar to those of intentional fraudulent activities. Unintentional accounting errors leading to financial restatements affect the value of company shares in the short run, which negatively impacts all shareholders. Purpose of the article: The aim of this manuscript is to predict unintentional accounting errors leading to financial restatements based on information from companies' financial statements. The manuscript analyses whether financial statements include sufficient information to allow the detection of unintentional accounting errors. Methods: Classification and regression trees (decision tree) and random forest were used to fulfil this aim. The data sample consisted of 400 items from the financial statements of 80 selected international companies. The results of the developed prediction models were compared and explained based on their accuracy, sensitivity, specificity, precision and F1 score. Statistical relationships among variables were tested by correlation analysis. Differences between the groups of companies with and without unintentional accounting errors were tested by means of the Kruskal-Wallis test. Differences among the models were tested by Levene's test and t-tests. Findings & value added: The results of the analysis provide evidence that unintentional accounting errors can be detected with high accuracy based on financial ratios (rather than the Beneish variables) and by applying the random forest method (rather than the classification and regression tree method).
Źródło:
Equilibrium. Quarterly Journal of Economics and Economic Policy; 2021, 16, 1; 185-201
1689-765X
2353-3293
Pojawia się w:
Equilibrium. Quarterly Journal of Economics and Economic Policy
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Lasy losowe - ocena jakości prognostycznej cech
Random forests - evaluation of predictive accuracy
Autorzy:
Krętowska, M.
Powiązania:
https://bibliotekanauki.pl/articles/341027.pdf
Data publikacji:
2007
Wydawca:
Politechnika Białostocka. Oficyna Wydawnicza Politechniki Białostockiej
Tematy:
lasy losowe
analiza przeżywalności
bezwzględny błąd predykcji
random forest
survival analysis
predictive accuracy
explained variation
Opis:
W pracy bezwzględny błąd predykcji jest wykorzystywany do oceny jakości prognostycznej poszczególnych cech. Narzędzie prognostyczne - lasy losowe - jest konstruowane w celu uzyskania estymatora funkcji przeżycia. Jest on następnie porównywany z estymatorem funkcji przeżycia Kaplana-Meiera, utworzonym przy założeniu jednorodności populacji. Elementem składowym lasów są dipolowe drzewa przeżycia. Zastosowanie dipolowej funkcji kryterialnej pozwala wykorzystać niepełną informację o czasie zajścia porażki, pochodzącą z obserwacji obciętych.
In the paper, predictive accuracy, measured as the absolute predictive error, is used to evaluate the quality of covariates. The prognostic tool, random forests, is built to obtain the aggregated survival function. This function is compared to the Kaplan-Meier estimator of the survival function, which assumes that the population is homogeneous. The induction of an individual dipolar survival tree is based on the minimization of a piecewise-linear function, the dipolar criterion. The algorithm makes it possible to use information from censored observations, for which the exact survival time is unknown.
Źródło:
Zeszyty Naukowe Politechniki Białostockiej. Informatyka; 2007, 2; 67-77
1644-0331
Pojawia się w:
Zeszyty Naukowe Politechniki Białostockiej. Informatyka
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Prediction model of public houses’ heating systems: a comparison of support vector machine method and random forest method
Model prognozowania systemów grzewczych budynków użyteczności publicznej: porównanie metody support vector machine i random forest
Autorzy:
Perekrest, Andrii
Chenchevoi, Vladimir
Chencheva, Olga
Kovalenko, Alexandr
Kushch-Zhyrko, Mykhailo
Kalizhanova, Aliya
Amirgaliyev, Yedilkhan
Powiązania:
https://bibliotekanauki.pl/articles/2174707.pdf
Data publikacji:
2022
Wydawca:
Politechnika Lubelska. Wydawnictwo Politechniki Lubelskiej
Tematy:
building heat supply
random forest
support vector machine
zaopatrzenie w ciepło budynku
metoda wektorów wspierających
Opis:
Data analysis and prediction play an important role in managing heat supply systems. Models that predict system parameters can be used for quality management, for making appropriate control decisions aimed at increasing energy efficiency and decreasing the amount of the consumed power source, and for diagnosing and detecting atypical processes in the functioning of the systems. The article compares two machine learning methods, random forest (RF) and support vector machine (SVM), for predicting the temperature of the heat-carrying agent in the heating system based on data from an electronic weather-dependent controller. The authors use the following parameters to compare the models: accuracy, source cost, and the ability to interpret the results and non-obvious interrelations. The time spent on finding the optimal hyperparameters and training the SVM model is found to significantly exceed that of the RF model, despite the close values of the root mean square error (RMSE). The data were changed from 15-minute to one-minute intervals to improve the RF model accuracy. The RMSE of the RF model on the test data equals 0.41°C. The article studies the importance of the contribution of variables to the prediction accuracy.
Analiza danych i prognozowanie odgrywają ważną rolę w zarządzaniu systemami zaopatrzenia w ciepło. Wykorzystanie modeli do przewidywania parametrów systemu jest możliwe do zarządzania jakością, podejmowania odpowiednich decyzji sterujących, które będą miały na celu poprawę efektywności energetycznej i zmniejszenie ilości zużywanego źródła energii elektrycznej, diagnozowania i wykrywania nietypowych procesów w funkcjonowaniu systemu. W artykule porównano dwie metody uczenia maszynowego: Random Forest (RF) i Support Vector Machine (SVM) do przewidywania temperatury czynnika grzewczego w systemie grzewczym na podstawie danych elektronicznego regulatora pogodowego. Do porównania modeli autorzy wykorzystują następujące parametry: dokładność, koszt początkowy oraz możliwość interpretacji wyników i nieoczywistych zależności. Ustalono, że czas poświęcony na wyznaczenie optymalnych hiperparametrów i wytrenowanie modelu SVM znacznie przekracza dane parametru RF, pomimo zbliżonych wartości błędu średniokwadratowego (RMSE). Zmiana z danych 15-minutowych na dane raz na minutę została dokonana w celu poprawy dokładności modelu RF. RMSE modelu RF z danych testowych wynosi 0,41°C. W pracy zbadano znaczenie wkładu zmiennych w dokładność prognozy.
Źródło:
Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska; 2022, 12, 3; 34-39
2083-0157
2391-6761
Pojawia się w:
Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Using a GEOBIA framework for integrating different data sources and classification methods in context of land use/land cover mapping
Autorzy:
Osmólska, A.
Hawryło, P.
Powiązania:
https://bibliotekanauki.pl/articles/145304.pdf
Data publikacji:
2018
Wydawca:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:
mapa użytkowanych gruntów
mapa pokrycia terenu
mapa leśna
data fusion
random forest
supervised classification
Sentinel-2
Opis:
Land use/land cover (LULC) maps are important datasets in various environmental projects. Our aim was to demonstrate how a GEOBIA framework can be used for integrating different data sources and classification methods in the context of LULC mapping. We present a multi-stage, semi-automated GEOBIA classification workflow created for LULC mapping of the Tuszyma Forestry Management area based on multi-source, multi-temporal and multi-resolution input data, such as a 4-band aerial orthophoto, LiDAR-derived nDSM, Sentinel-2 multispectral satellite images and ancillary vector data. Various classification methods were applied, i.e. rule-based and Random Forest supervised classification. This approach allowed us to focus on the classification of each class ‘individually’ by taking advantage of all useful information from the various input data, expert knowledge, and advanced machine-learning tools. In the first step, twelve classes were assigned in a two-step rule-based classification approach that was either vector-based, ortho- and vector-based, or ortho- and LiDAR-based. Then, supervised classification was performed using the Random Forest algorithm. Three agriculture-related LULC classes with alternating vegetation conditions were assigned based on aerial orthophoto and Sentinel-2 information. For the classification of 15 LULC classes we obtained 81.3% overall accuracy and a kappa coefficient of 0.78. The visual evaluation and class coverage comparison showed that the generated LULC layer differs from the existing land cover maps, especially in the relative cover of agriculture-related classes. Generally, the created map can be considered superior to the existing data in terms of the level of detail and correspondence to the actual environmental and vegetation conditions that can be observed in RS images.
Źródło:
Geodesy and Cartography; 2018, 67, 1; 99-116
2080-6736
2300-2581
Pojawia się w:
Geodesy and Cartography
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Performance comparison of machine learning algorithms for predictive maintenance
Porównanie skuteczności algorytmów uczenia maszynowego dla konserwacji predykcyjnej
Autorzy:
Gęca, Jakub
Powiązania:
https://bibliotekanauki.pl/articles/1841332.pdf
Data publikacji:
2020
Wydawca:
Politechnika Lubelska. Wydawnictwo Politechniki Lubelskiej
Tematy:
machine learning
random forest
predictive maintenance
neural networks
uczenie maszynowe
las losowy
konserwacja predykcyjna
sieci neuronowe
Opis:
The consequences of failures and unscheduled maintenance are the reasons why engineers have been trying for years to increase the reliability of industrial equipment. In modern solutions, predictive maintenance is a frequently used method. It makes it possible to forecast failures and to raise alerts about their likelihood. This paper presents a summary of the machine learning algorithms that can be used in predictive maintenance and a comparison of their performance. The analysis was based on a data set from the Microsoft Azure AI Gallery. The paper presents a comprehensive approach to the issue, including feature engineering, preprocessing, dimensionality reduction techniques, and tuning of model parameters in order to obtain the highest possible performance. The research showed that, in the analysed case, the best algorithm achieved 99.92% accuracy on over 122 thousand test data records. In conclusion, predictive maintenance based on machine learning represents the future of machine reliability in industry.
Skutki związane z awariami oraz niezaplanowaną konserwacją to powody, dla których od lat inżynierowie próbują zwiększyć niezawodność osprzętu przemysłowego. W nowoczesnych rozwiązaniach obok tradycyjnych metod stosowana jest również tzw. konserwacja predykcyjna, która pozwala przewidywać awarie i alarmować o możliwości ich powstawania. W niniejszej pracy przedstawiono zestawienie algorytmów uczenia maszynowego, które można zastosować w konserwacji predykcyjnej oraz porównanie ich skuteczności. Analizy dokonano na podstawie zbioru danych Azure AI Gallery udostępnionych przez firmę Microsoft. Praca przedstawia kompleksowe podejście do analizowanego zagadnienia uwzględniające wydobywanie cech charakterystycznych, wstępne przygotowanie danych, zastosowanie technik redukcji wymiarowości, a także dostrajanie parametrów poszczególnych modeli w celu uzyskania najwyższej możliwej skuteczności. Przeprowadzone badania pozwoliły wskazać najlepszy algorytm, który uzyskał dokładność na poziomie 99,92%, spośród ponad 122 tys. rekordów danych testowych. Na podstawie tego można stwierdzić, że konserwacja predykcyjna prowadzona w oparciu o uczenie maszynowe stanowi przyszłość w zakresie podniesienia niezawodności maszyn w przemyśle.
Źródło:
Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska; 2020, 10, 3; 32-35
2083-0157
2391-6761
Pojawia się w:
Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Extracting relevant predictors of the severity of mental illnesses from clinical information using regularisation regression models
Autorzy:
Kaushik, Sakshi
Sabharwal, Alka
Grover, Gurprit
Powiązania:
https://bibliotekanauki.pl/articles/2107145.pdf
Data publikacji:
2022-06-14
Wydawca:
Główny Urząd Statystyczny
Tematy:
adaptive LASSO
group LASSO
mental disorder
multicollinearity
random forest imputation
ridge regression
severity of an illness
Opis:
Mental disorders are common non-communicable diseases whose occurrence is rising at epidemic rates globally. Determining the severity of a mental illness has important clinical implications and serves as a prognostic factor for effective intervention planning and management. This paper aims to identify the relevant predictors of the severity of mental illnesses (measured by psychiatric rating scales) from a wide range of clinical variables comprising both laboratory test results and psychiatric factors. The laboratory test results collectively comprise measurements of 23 components derived from vital signs and from blood test results for the evaluation of the complete blood count. Eight psychiatric factors known to affect the severity of mental illnesses are considered, viz. the family history, course and onset of an illness, etc. Retrospective data on 78 patients diagnosed with mental and behavioural disorders were collected from the Lady Hardinge Medical College & Smt. S.K. Hospital in New Delhi, India. Missing observations in the data are imputed using the non-parametric random forest algorithm. Multicollinearity is detected based on the variance inflation factor. Owing to the presence of multicollinearity, regularisation techniques such as ridge regression and extensions of the least absolute shrinkage and selection operator (LASSO), viz. adaptive and group LASSO, are used to fit the regression model. The optimal tuning parameter λ is obtained through 13-fold cross-validation. It was observed that the coefficients of the quantitative predictors extracted by the adaptive LASSO and the group of predictors extracted by the group LASSO were comparable to the coefficients obtained through ridge regression.
Źródło:
Statistics in Transition new series; 2022, 23, 2; 129-152
1234-7655
Pojawia się w:
Statistics in Transition new series
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Sparse data classifier based on first-past-the-post voting system
Autorzy:
Cudak, Magdalena
Piech, Mateusz
Marcjan, Robert
Powiązania:
https://bibliotekanauki.pl/articles/27312911.pdf
Data publikacji:
2022
Wydawca:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:
POI
machine learning
geospatial data
data science
first-past-the-post
random forest
point of interest
Opis:
A point of interest (POI) is a general term for objects that describe places from the real world. POI matching (i.e., determining whether two sets of attributes represent the same location) is not a trivial challenge due to the large variety of data sources. The representations of POIs may vary depending on how they are stored. A manual comparison of objects is not achievable in real time; therefore, there are multiple solutions for automatic merging. However, there is not yet an efficient solution that handles missing attributes. In this paper, we propose a multi-layered hybrid classifier that is composed of machine-learning and deep-learning techniques and supported by a first-past-the-post voting system. We examined different weights for the constituencies that were taken into consideration during a majority (or supermajority) decision. As a result, we achieved slightly higher accuracy than the best current model (random forest), which is also based on voting.
Źródło:
Computer Science; 2022, 23 (2); 277-296
1508-2806
2300-7036
Pojawia się w:
Computer Science
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
A Machine Learning Model for Improving Building Detection in Informal Areas: A Case Study of Greater Cairo
Autorzy:
Taha, Lamyaa Gamal El-deen
Ibrahim, Rania Elsayed
Powiązania:
https://bibliotekanauki.pl/articles/2055780.pdf
Data publikacji:
2022
Wydawca:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:
multi-source image fusion
random forest
support vector machine
DEM extraction
unplanned unsafe areas
remote sensing
Opis:
Building detection in Ashwa’iyyat is a fundamental yet challenging problem, mainly because it requires the correct recovery of building footprints from images with high object density and scene complexity. A classification model was proposed to integrate spectral, height and textural features. It was developed for the automatic detection of buildings that are rectangular, irregularly structured, or quite small, as well as buildings that are close to each other but not adjoined. It is intended to improve the precision with which buildings are classified using scikit-learn Python libraries and QGIS. WorldView-2 and Spot-5 imagery were combined using three image fusion techniques. The Grey-Level Co-occurrence Matrix was applied to determine which attributes are important in detecting and extracting buildings. The Normalized Digital Surface Model was also generated with 0.5-m resolution. The results demonstrated that when textural features of colour images were introduced as classifier input, the overall accuracy improved in most cases. The results show that the proposed model was more accurate and efficient than the state-of-the-art methods and can be used effectively to extract the boundaries of small buildings. The use of a classifier ensemble is recommended for the extraction of buildings.
Źródło:
Geomatics and Environmental Engineering; 2022, 16, 2; 39-58
1898-1135
Pojawia się w:
Geomatics and Environmental Engineering
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Określanie lesistości Polesia Ukraińskiego na podstawie wyników klasyfikacji sezonowych obrazów kompozytowych Landsat 8 OLI
Estimation of forest cover in Ukrainian Polissia using classification of seasonal composite Landsat 8 OLI images
Autorzy:
Lakyda, P.
Myroniuk, V.
Bilous, A.
Boiko, S.
Powiązania:
https://bibliotekanauki.pl/articles/979663.pdf
Data publikacji:
2019
Wydawca:
Polskie Towarzystwo Leśne
Tematy:
lesnictwo
Ukraina
Polesie
lesistosc
teledetekcja
zdjecia satelitarne
satelita Landsat 8 OLI
forest cover
remote sensing
random forest
ikonos-2
ndvi
Opis:
A training dataset for modelling forest cover was created after classification of multispectral IKONOS-2 satellite imagery with a spatial resolution of 3.2 m (acquisition date: 12.08.2011). As a result, we created a binary forest cover map with 2 categories: ‘forest’ and ‘not-forest’. That allowed us to compute the tree canopy cover for each Landsat 8 OLI pixel using a vector grid with a cell size of 30×30 m. The classification model was developed using a training dataset of 17,000 observations, 10,000 of which represented results of the IKONOS-2 classification. To avoid erroneously including agricultural land in the forest mask because of a lack of data, we additionally collected about 7,000 random observations with 0% canopy cover, evenly distributed within the unforested area. The Random Forest (RF) model we developed allowed us to create a continuous forest map of the study area in which each pixel value represents tree canopy closeness (0-100%). To convert it into a discrete map, we recoded all values below 30% as ‘no data’ and values from 30 to 100% as 1. The forest mask for two selected administrative districts of the Chernihiv region (NE Ukraine) was created after removing from the map small pixel groups that covered less than 0.5 ha. The results were compared with the Global Forest Change (GFC) map and showed that GFC data can be used for forest mapping with a tree canopy closeness threshold of 40%. On considerable areas of abandoned agricultural land in the analysed regions of Ukraine, forest stands are formed by Scots pine, silver birch, black alder and aspen. The existence of such forests substantially increases (by 6-8%) the forested area of the Gorodnya and Snovsk districts of the Chernihiv region compared to official forest inventory data. However, such stands are not protected and are at high risk of being damaged by wildfires, illegal cutting aimed at renewing agricultural production, diseases, insects and other natural disturbances.
Źródło:
Sylwan; 2019, 163, 09; 754-764
0039-7660
Pojawia się w:
Sylwan
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Comparative study for deriving stage-discharge – sediment concentration relationships using soft computing techniques
Autorzy:
Sihag, P.
Sadikhani, M. R.
Vambol, V.
Vambol, S.
Prabhakar, A. K.
Sharma, N.
Powiązania:
https://bibliotekanauki.pl/articles/1818806.pdf
Data publikacji:
2021
Wydawca:
Stowarzyszenie Komputerowej Nauki o Materiałach i Inżynierii Powierzchni w Gliwicach
Tematy:
sediment load concentration
Baitarani river
M5P
random forest
ładunek osadu
stężenie
rzeka Baitarani
las losowy
Opis:
Purpose: Knowledge of the sediment load carried by a river is essential for the design and planning of hydropower and irrigation projects. The aim of this study is to develop and evaluate the best soft-computing-based model, using M5P and Random Forest regression-based techniques, for the computation of sediment from datasets of daily discharge, daily gauge and sediment load at the Champua gauging site of the Upper Baitarani river basin, India. Design/methodology/approach: In the last few decades, models based on soft computing techniques have been successfully used in water resources modelling and estimation. In this study, the potential of tree-based models is examined by developing and comparing sediment load prediction models based on the M5P tree and Random Forest regression (RF). Several M5P- and RF-based models were applied to a gauging site of the Baitarani River in Odisha, India. To evaluate the performance of the selected M5P- and RF-based models, three of the most popular statistical parameters were selected: the coefficient of correlation, root mean square error and mean absolute error. Findings: A comparison of the results suggested that the RF-based model could be applied successfully for the prediction of sediment load concentration with relatively higher prediction accuracy. Among the RF-based models, model M10, based on the combination Qt, Q(t-1), Q(t-2), S(t-1), S(t-2), Ht and H(t-1), performs better than models based on other combinations. Another major outcome of this investigation is that model M4, based on Qt, Q(t-1) and S(t-1), works better than models based on other input combinations when the M5P technique is used. The optimum input combination is Qt, Q(t-1) and S(t-1) for the prediction of sediment load concentration of the Baitarani River in Odisha, India. Research limitations/implications: The developed models were tested for the Baitarani River in Odisha, India.
Źródło:
Journal of Achievements in Materials and Manufacturing Engineering; 2021, 104, 2; 57-76
1734-8412
Pojawia się w:
Journal of Achievements in Materials and Manufacturing Engineering
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Predicting the stability of open stopes using Machine Learning
Autorzy:
Szmigiel, Alicja
Apel, Derek B.
Powiązania:
https://bibliotekanauki.pl/articles/2201415.pdf
Data publikacji:
2022
Wydawca:
Główny Instytut Górnictwa
Tematy:
open stope
machine learning
logistic regression
random forest
system otwartych komór
uczenie maszynowe
regresja logistyczna
las losowy
Opis:
The Mathews stability graph method was first presented in 1980. It was developed to assess the stability of open stopes in different underground conditions, and it has an impact on evaluating the safety of underground excavations. With the development of technology and growing experience in applying computer science in various research disciplines, mining engineering could benefit significantly from Machine Learning. Applying ML algorithms to predict the stability of open stopes in underground excavations is a new approach that could replace the original graph method and should be investigated. In this research, a Potvin database consisting of 176 historical case studies was passed to the two most popular Machine Learning algorithms, Logistic Regression and Random Forest, to compare their predictive capabilities. The results showed that these algorithms can indicate the stability of underground openings, especially Random Forest, which performed slightly better than Logistic Regression on the examined data.
Źródło:
Journal of Sustainable Mining; 2022, 21, 3; 241-248
2300-1364
2300-3960
Pojawia się w:
Journal of Sustainable Mining
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Space-Time-Frequency Machine Learning for Improved 4G/5G Energy Detection
Autorzy:
Wasilewska, Małgorzata
Bogucka, Hanna
Powiązania:
https://bibliotekanauki.pl/articles/226216.pdf
Data publikacji:
2020
Wydawca:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:
spectrum sensing
cognitive radio
machine learning
energy detection
4G
LTE
5G
k-nearest neighbors
random forest
Opis:
In this paper, the future Fifth Generation (5G New Radio) radio communication system is considered, coexisting and sharing the spectrum with the incumbent Fourth Generation (4G) Long-Term Evolution (LTE) system. The presence of the 4G signal is detected in order to allow opportunistic and dynamic spectrum access for 5G users. This detection is based on known sensing methods, such as energy detection; however, it uses machine learning in the domains of space, time and frequency to improve sensing quality. Simulation results for the considered methods, k-Nearest Neighbors and Random Forest, show that these methods significantly improve the detection probability.
Źródło:
International Journal of Electronics and Telecommunications; 2020, 66, 1; 217-223
2300-1933
Pojawia się w:
International Journal of Electronics and Telecommunications
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
The increase of the performance of ultrafine coal flotation by using emulsified kerosene and the prediction of the flotation parameters by random forest and genetic algorithm
Poprawa efektywności flotacji węgla drobnoziarnistego przy wykorzystaniu emulsji naftowej oraz prognozowanie parametrów procesu flotacji przy użyciu metody lasów losowych oraz algorytmu genetycznego
Autorzy:
Oney, Ozcan
Powiązania:
https://bibliotekanauki.pl/articles/219716.pdf
Data publikacji:
2019
Wydawca:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:
ultrafine coal flotation
emulsified kerosene
random forest
genetic algorithm
Description:
In this study, emulsified kerosene was investigated as a means of improving the flotation performance of ultrafine coal. For this purpose, NP-10 surfactant was used to form the emulsified kerosene. The results showed that emulsified kerosene increased the recovery of ultrafine coal compared with kerosene. The study also examined the effect of the independent variables (emulsified collector dosage (ECD), frother dosage (FD) and impeller speed (IS)) on the responses (concentrate yield (γC %), concentrate ash content (%) and combustible matter recovery (ε %)) using a Random Forest (RF) model and a Genetic Algorithm (GA). The proposed models for concentrate yield, concentrate ash content and combustible matter recovery showed satisfactory results in terms of R². The optimal values of the three test variables, computed using GA, were ECD = 330.39 g/t, FD = 75.50 g/t and IS = 1644 rpm. The responses at these experimentally determined optimal conditions were γC % = 58.51%, concentrate ash content = 21.7% and ε % = 82.83%. The results indicated that GA was a useful method for finding the best values of the operating parameters. Under the optimal flotation conditions, using emulsified kerosene reduced kerosene consumption by about 20%.
This work examined the possibility of using emulsified kerosene to improve the flotation performance of ultrafine coal. For this purpose, the surfactant NP-10 was used to form the emulsified kerosene. The tests showed that using kerosene in emulsified form improved coal recovery compared with processes using kerosene. The work also examined the effect of the independent variables (emulsified collector dosage, ECD; frother dosage, FD; impeller speed, IS) on the process responses (concentrate yield (γC %), ash content (%) and combustible matter recovery (ε %)), based on the random forest method and a genetic algorithm. The proposed models gave satisfactory results for concentrate yield, ash content and combustible matter recovery in terms of the R² coefficient. The optimal values of the studied variables, ECD = 330.39 g/t, FD = 75.50 g/t and IS = 1644 rpm, were computed using the genetic algorithm. The responses of the process run under the experimentally determined optimal conditions were γC % = 58.81%, ash content = 21.7% and ε % = 82.83%. The results indicate that the genetic algorithm is a method that makes it possible to obtain the most favourable values of the operating parameters. Based on the flotation results obtained under the most favourable conditions, kerosene consumption was found to be reduced by about 20% thanks to the use of kerosene in emulsified form.
Source:
Archives of Mining Sciences; 2019, 64, 1; 119-130
0860-7001
Appears in:
Archives of Mining Sciences
Content provider:
Biblioteka Nauki
Article
Title:
Dimensionality Reduction for Probabilistic Neural Network in Medical Data Classification Problems
Authors:
Kusy, M.
Links:
https://bibliotekanauki.pl/articles/226697.pdf
Publication date:
2015
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
probabilistic neural network
dimensionality reduction
feature selection
feature extraction
single decision tree
random forest
principal component analysis
prediction ability
Description:
This article studies the problem of dimensionality reduction in training data sets used for classification tasks performed by the probabilistic neural network (PNN). Two methods are proposed for this purpose. The first is based on feature selection, where a single decision tree and a random forest algorithm are used to select data features. The second relies on feature extraction using the principal component analysis algorithm. Depending on the form of the smoothing parameter, different types of PNN models are explored. The prediction ability of PNNs trained on the original and reduced data sets is determined using a 10-fold cross-validation procedure.
Source:
International Journal of Electronics and Telecommunications; 2015, 61, 3; 289-300
2300-1933
Appears in:
International Journal of Electronics and Telecommunications
Content provider:
Biblioteka Nauki
Article
Title:
An assessment of machine learning and data balancing techniques for evaluating downgrade truck crash severity prediction in Wyoming
Authors:
Ampadu, Vincent-Michael Kwesi
Haq, Muhammad Tahmidul
Ksaibati, Khaled
Links:
https://bibliotekanauki.pl/articles/2176018.pdf
Publication date:
2022
Publisher:
Fundacja Centrum Badań Socjologicznych
Subjects:
crash severity
performance
extreme gradient boosting tree
adaptive boosting tree
random forest
gradient boost decision tree
adaptive synthetic algorithm
Description:
This study investigated machine learning methods for predicting the severity of large truck crashes on Wyoming road networks: four classification tree-based ML models (Adaptive Boosting tree, Random Forest, Gradient Boost Decision Tree and Extreme Gradient Boosting tree) and three non-tree-based ML models (Support Vector Machines, Multi-layer Perceptron and k-Nearest Neighbors). The accuracy of these seven methods was then compared. The final ROC AUC score for the optimized random forest model was 95.296%. The next highest performing models were k-NN with 92.780%, MLP with 87.817%, XGBoost with 86.542%, GradBoost with 74.824%, SVM with 72.648% and AdaBoost with 67.232%. Based on the analysis, the top 10 predictors of severity were obtained from the feature importance plot. These can be grouped into whether safety equipment was used, whether airbags were deployed, the gender of the driver and whether alcohol was involved.
Source:
Journal of Sustainable Development of Transport and Logistics; 2022, 7, 2; 6-24
2520-2979
Appears in:
Journal of Sustainable Development of Transport and Logistics
Content provider:
Biblioteka Nauki
Article
Title:
Explicit and implicit description of the factors impact on the NO2 concentration in the traffic corridor
Jawny i niejawny opis wpływu czynników na stężenie NO2 w kanionie komunikacyjnym
Authors:
Kamińska, Joanna Amelia
Turek, Tomasz
Links:
https://bibliotekanauki.pl/articles/204830.pdf
Publication date:
2020
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
nitrogen dioxide
traffic flow
meteorological conditions
random forest
linear regression
Description:
High concentrations of nitrogen dioxide in the air, particularly in heavily urbanized areas, have an adverse effect on many aspects of residents' health. A method is proposed for modelling daily average, minimum and maximum atmospheric NO2 concentrations in a conurbation, using two types of modelling: multiple linear regression (LR) and an advanced data mining technique, Random Forest (RF). It was shown that the Random Forest technique can be successfully applied to predict daily NO2 concentration based on data from 2015–2017, and that it gives a better fit than linear models. The best results were obtained for predicting daily average NO2 values, with R²=0.69 and RMSE=7.47 μg/m³. The cost of obtaining an explicit, interpretable function is a much worse fit (R² from 0.32 to 0.57). Verification of the models on independent material from the first half of 2018 showed their correctness, with a mean absolute percentage error for modelling daily average concentration of 16.5% for RF and 28% for LR. The most important factors were wind conditions and traffic flow. In predicting maximum daily concentration, air temperature and air humidity take on greater importance. Prevailing westerly and south-westerly winds in Wrocław effectively implement the idea of ventilating the city within the studied intersection. In summary, when modelling natural phenomena, a compromise should be sought between the accuracy of the model and its interpretability.
The aim of this work is to examine the feasibility of forecasting daily NO2 concentration using the random forest (RF) method and to compare the results with multiple linear regression (LR) on the same data set. The effect of increasing the model's interpretability on its accuracy was also examined. The paper presents two methods for modelling daily minimum, average and maximum NO2 concentrations in an urban agglomeration: multiple linear regression (LR) and random forest (RF). It was shown that the Random Forest method can be used effectively to predict daily NO2 concentrations. The highest accuracy was obtained for predicting daily average concentrations, with R²=0.69 and RMSE=7.47 μg/m³. The cost of obtaining an explicit functional form in the linear model (LR) is a significantly lower accuracy in predicting concentrations (R² from 0.32 to 0.57). Verification of the models on independent material from the first half of 2018 confirmed their correctness, with a mean relative error for daily average concentrations of 16.5% for RF and 28% for LR. Wind and traffic volume have the greatest influence on NO2 concentrations in the traffic corridor. When modelling daily maximum values, air temperature and relative air humidity gain importance. The prevailing westerly and north-westerly winds in Wrocław effectively implement the concept of ventilating the city at the intersection considered.
Source:
Archives of Environmental Protection; 2020, 46, 1; 93-99
2083-4772
2083-4810
Appears in:
Archives of Environmental Protection
Content provider:
Biblioteka Nauki
Article
Title:
Data mining methods for prediction of air pollution
Authors:
Siwek, K.
Osowski, S.
Links:
https://bibliotekanauki.pl/articles/330775.pdf
Publication date:
2016
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
computational intelligence
feature selection
neural network
random forest
air pollution forecasting
air pollution
Description:
The paper discusses data mining methods for predicting air pollution. Two tasks are important in this problem: generating and selecting the prognostic features, and building the final system for forecasting pollution for the next day. An advanced set of features, created on the basis of atmospheric parameters, is proposed. This set is analysed to select the features that are most important from the prediction point of view. Two methods of feature selection are compared. One applies a genetic algorithm (a global approach), and the other a linear method of stepwise fit (a locally optimized approach). On the basis of this analysis, two sets of the most predictive features are selected. These sets are used to predict the atmospheric pollutants PM10, SO2, NO2 and O3. Two approaches to prediction are compared. In the first, the selected features are applied directly to the random forest (RF), which forms an ensemble of decision trees. In the second, intermediate predictors built on the basis of neural networks (the multilayer perceptron, the radial basis function and the support vector machine) are used. They create an ensemble integrated into the final prognosis. The paper shows that preselecting the most important features, combined with an ensemble of predictors, significantly increases the accuracy of atmospheric pollution forecasts.
Source:
International Journal of Applied Mathematics and Computer Science; 2016, 26, 2; 467-478
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
