Subject: Random Forest - OPAC collection catalogue


Title: A comparative study on performance of basic and ensemble classifiers with various datasets
Authors: Gunakala, Archana
Shahid, Afzal Hussain
Links: https://bibliotekanauki.pl/articles/30148255.pdf
Publication date: 2023
Publisher: Polskie Towarzystwo Promocji Wiedzy
Subjects: classification
Naïve Bayes
neural network
Support Vector Machine
Decision Tree
ensemble learning
Random Forest
Description: Classification plays a critical role in machine learning (ML) systems for processing images, text and high-dimensional data. Predicting class labels from training data is the primary goal of classification. An optimal model for a particular classification problem is chosen based on the model's performance and execution time. This paper compares and analyzes the performance of basic as well as ensemble classifiers using 10-fold cross-validation and also discusses their essential concepts, advantages, and disadvantages. In this study, five basic classifiers, namely Naïve Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), and the ensemble of all five classifiers along with a few more combinations are compared on five University of California Irvine (UCI) ML Repository datasets and a Diabetes Health Indicators dataset from the Kaggle repository. To analyze and compare the performance of the classifiers, evaluation metrics such as Accuracy, Recall, Precision, Area Under Curve (AUC) and F-Score are used. Experimental results showed that SVM performs best on two of the six datasets (Diabetes Health Indicators and waveform), RF performs best on the Arrhythmia, Sonar and Tic-tac-toe datasets, and the best ensemble combination is DT+SVM+RF on the Ionosphere dataset, with respective accuracies of 72.58%, 90.38%, 81.63%, 73.59%, 94.78% and 94.01%. The proposed ensemble combinations outperformed the conventional models on a few datasets.
Source: Applied Computer Science; 2023, 19, 1; 107-132
1895-3735
2353-6977
Appears in: Applied Computer Science
Content provider: Biblioteka Nauki

Article
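
The 10-fold cross-validation named in the description above can be illustrated with a minimal sketch. It is not the authors' code: the helper names `k_fold_indices` and `cross_val_accuracy` and the `fit_predict` callback are hypothetical, and the folds are contiguous and unshuffled, whereas a real study would normally shuffle or stratify the data first.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds and return (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def cross_val_accuracy(X: Sequence, y: Sequence, fit_predict: Callable, k: int = 10) -> float:
    """Mean accuracy over k folds; fit_predict(X_train, y_train, X_test) returns predicted labels."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Any classifier can be plugged in through `fit_predict`, so the same loop can score several models on identical folds, which is what makes their accuracies comparable.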


Title: A System for Filling Store Displays: Pitting a Single Model against a Set of Demand Forecasting Models
System zapełnienia ekspozycji sklepowych: pojedynczy model a zespół modeli prognozowania popytu
Authors: Myna, Artur
Myna, Jacek
Links: https://bibliotekanauki.pl/articles/2206342.pdf
Publication date: 2023
Publisher: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Subjects: Extreme Gradient Boosting
logistic regression
random forest
regresja logistyczna
las losowy
Description: The aim of the paper was to develop the concept of retail display space allocation as a system and to assess the quality of demand forecasting models for very slow-moving products (which have not yet been used by retail companies in Poland) as its key subsystem. Forecasts were made using the example of a clothing company. The quality of these models was assessed using the Weighted Mean Absolute Percentage Error. The first step was to build the individual models. Later, the authors built separate models for brick-and-mortar and online stores as well as brands, creating a set of six models. The findings show that the classification approach for very slow movers provides results as precise as the regression approach. No single model or set of models (built with a particular machine learning method) could be identified that made the best demand forecasts for brick-and-mortar stores, as statistical tests generally did not confirm the significance of the differences between the median forecasts.
The aim of the article is to develop the concept of filling store displays as a system and to assess the quality of demand forecasting models for very slow-moving products (which are not yet used by retail chains in Poland) as its key subsystem. The quality of the models was assessed using the Weighted Mean Absolute Percentage Error at different levels of detail: for the entire sales network and a given month, and at the intersection of store, product, and product size. Single models were built first, followed by separate models for brick-and-mortar and online stores as well as brands, creating a set of six models. An improvement in model fit was achieved only for online stores. The results show that the classification approach for very slow-moving products yields forecasts as precise as the regression approach. No single model or set of models (built with a particular machine learning method) can be identified as having made the best demand forecasts for brick-and-mortar stores, as statistical tests generally did not confirm the significance of the differences between the median forecasts.
Source: Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu; 2023, 67, 2; 96-106
1899-3192
Appears in: Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu
Content provider: Biblioteka Nauki

Article
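
The Weighted Mean Absolute Percentage Error used in the description above can be sketched as follows. This assumes the common definition, the sum of absolute errors divided by the sum of absolute actual values; the paper may use a different variant, and the function name `wmape` is illustrative only.

```python
from typing import Sequence


def wmape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted MAPE: sum(|actual - forecast|) / sum(|actual|)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / total_actual


# Toy monthly sales of a very slow-moving product: most periods sell nothing.
sales = [0, 1, 0, 2]
predicted = [0, 0, 1, 2]
error = wmape(sales, predicted)  # (0 + 1 + 1 + 0) / 3
```

Because the errors are pooled before dividing, individual periods with zero sales do not cause a division by zero, which is one reason this measure suits slow-moving products better than the plain MAPE.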


Title: An Approach to License Plate Recognition in Real Time Using Multi-stage Computational Intelligence Classifier
Authors: Kekez, Michał
Links: https://bibliotekanauki.pl/articles/27311914.pdf
Publication date: 2023
Publisher: Polska Akademia Nauk. Czasopisma i Monografie PAN
Subjects: car license plates
LPR
ANPR
OCR
image processing
neural network
Random Forest
Description: Automatic car license plate recognition (LPR) is widely used nowadays. It involves plate localization in the image, character segmentation and optical character recognition. In this paper, a set of descriptors of image segments (characters) is proposed, as well as a technique for multi-stage classification of letters and digits using a cascade of a neural network and several parallel Random Forest, classification tree, or rule list classifiers. The proposed solution was applied to automated recognition of number plates composed of capital Latin letters and Arabic numerals. The paper presents an analysis of the accuracy of the obtained classifiers. The time needed to build the classifier and the time needed to classify characters using it are also presented.
Source: International Journal of Electronics and Telecommunications; 2023, 69, 2; 275-280
2300-1933
Appears in: International Journal of Electronics and Telecommunications
Content provider: Biblioteka Nauki

Article


Title: Assessing the efficiency of a random forest regression model for estimating water quality indicators
Authors: Zavareh, Maryam
Maggioni, Viviana
Zhang, Xinxuan
Links: https://bibliotekanauki.pl/articles/27810498.pdf
Publication date: 2023
Publisher: Instytut Meteorologii i Gospodarki Wodnej - Państwowy Instytut Badawczy
Subjects: Random Forest
water quality
hydrometeorological information
Description: This work evaluates the efficiency of Random Forest (RF) regression for predicting water quality indicators and investigates factors affecting water quality in 11 watersheds in Virginia, District of Columbia, and Maryland. Ten years of daily water quality data along with hydro-meteorological information (such as precipitation) and watershed physiology and characteristics (e.g., size, soil type, land use) are used to predict dissolved oxygen (DO), specific conductivity (K), and turbidity (Tu) across the selected watersheds. The RF regression model is developed for six scenarios, with an increasing number of predictors introduced in each scenario. The first scenario contains the smallest amount of information (water quality indicators DO, K and Tu), while scenario 6 contains all the available variables. The RF model is evaluated based on three statistical metrics: the relative root mean square error, the correlation coefficient, and the percentage of variance explained. In addition, the degree of importance of each predictor is used to rank the predictors within each scenario. The model shows excellent performance for DO as the predicted variable. The model predicting K slightly outperforms the one predicting Tu. Scenario 4 (built on water quality indicators, hydro-meteorological data, watershed physiology and land cover information) provided the best tradeoff between performance and efficiency (quantified in terms of the amount of information needed to develop the model). In conclusion, based on the RF model, land cover plays a significant role in predicting water quality indicators. In addition, the developed RF regression model is adaptable to watersheds in this region over a range of climates.
Source: Meteorology Hydrology and Water Management. Research and Operational Applications; 2023, 11, 2; 1-18
2299-3835
2353-5652
Appears in: Meteorology Hydrology and Water Management. Research and Operational Applications
Content provider: Biblioteka Nauki

Article


Title: Developing a data-driven soft sensor to predict silicate impurity in iron ore flotation concentrate
Authors: Pural, Yusuf Enes
Links: https://bibliotekanauki.pl/articles/24148677.pdf
Publication date: 2023
Publisher: Politechnika Wrocławska. Oficyna Wydawnicza Politechniki Wrocławskiej
Subjects: soft sensor
machine learning
random forest
multi-layer perceptron
flotation
grade estimation
Description: Soft sensors are mathematical models that estimate the value of a process variable that is difficult or expensive to measure directly. They can be based on first principle models, data-based models, or a combination of both. These models are increasingly used in mineral processing to estimate and optimize important performance parameters such as mill load, mineral grades, and particle size. This study investigates the development of a data-driven soft sensor to predict the silicate content in iron ore reverse flotation concentrate, a crucial indicator of plant performance. The proposed soft sensor model employs a dataset obtained from Kaggle, which includes measurements of iron and silicate content in the feed to the plant, reagent dosages, weight and pH of pulp, as well as the amount of air and froth levels in the flotation units. To reduce the dimensionality of the dataset, Principal Component Analysis, an unsupervised machine learning method, was applied. The soft sensor model was developed using three machine learning algorithms, namely Ridge Regression, Multi-Layer Perceptron, and Random Forest. The Random Forest model, created with non-reduced data, demonstrated superior performance, with an R-squared value of 96.5% and a mean absolute error of 0.089. The results suggest that the proposed soft sensor model can accurately predict the silicate content in the iron ore flotation concentrate using machine learning algorithms. Moreover, the study highlights the importance of selecting appropriate algorithms for soft sensor development in mineral processing plants.
Source: Physicochemical Problems of Mineral Processing; 2023, 59, 5; art. no. 169823
1643-1049
2084-4735
Appears in: Physicochemical Problems of Mineral Processing
Content provider: Biblioteka Nauki

Article


Title: Development of Flood-Hazard-Mapping Model Using Random Forest and Frequency Ratio in Sumedang Regency, West Java, Indonesia
Authors: Ismanto, Rido Dwi
Fitriana, Hana Listi
Manalu, Johanes
Purboyo, Alvian Aji
Prasasti, Indah
Links: https://bibliotekanauki.pl/articles/27314279.pdf
Publication date: 2023
Publisher: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects: flood-susceptibility assessment
random forest
frequency ratio
Sumedang
remote sensing
Description: Flooding, often triggered by heavy rainfall, is a common natural disaster in Indonesia and is the third most common type of disaster in Sumedang Regency. Hence, flood-susceptibility mapping is essential for flood management. The primary challenge lies in the complex, non-linear relationships between indices and risk levels. To address this, the application of random forest (RF) and frequency ratio (FR) methods was explored. Ten flood-conditioning factors were determined from the references: distance from a river, elevation, geology, geomorphology, lithology, land use/land cover, rainfall, slope, soil type, and topographic wetness index (TWI). Thirty-five flood locations were selected from the flood-inventory map, and the remaining 18 flood locations were used to validate the outcomes. The RF model classified 28.39% of the area as flooded and the rest (71.61%) as non-flooded. The FR method classified 8.02% of the area as flooded and 91.98% as non-flooded. The AUC of both methods was similar, at 83.0%. This result is quite accurate and can be used by policymakers to prevent and manage future flooding in the Sumedang area. These results can also be used as material for updating existing flood-susceptibility maps.
Source: Geomatics and Environmental Engineering; 2023, 17, 6; 129-157
1898-1135
Appears in: Geomatics and Environmental Engineering
Content provider: Biblioteka Nauki

Article
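
The frequency ratio (FR) method named in the description above is not defined in the record itself. The sketch below assumes the definition commonly used in susceptibility mapping: for each class of a conditioning factor, the share of flood locations falling in that class divided by the share of the study area the class covers. The function name, class labels, and counts are illustrative assumptions, not data from the paper.

```python
from typing import Dict


def frequency_ratio(class_cells: Dict[str, int], flood_cells: Dict[str, int]) -> Dict[str, float]:
    """FR per class = (floods in class / all floods) / (cells in class / all cells)."""
    total_cells = sum(class_cells.values())
    total_floods = sum(flood_cells.get(name, 0) for name in class_cells)
    if total_cells == 0 or total_floods == 0:
        raise ValueError("need at least one cell and one flood location")
    ratios = {}
    for name, cells in class_cells.items():
        if cells == 0:
            continue  # a class with no area has no defined ratio
        flood_share = flood_cells.get(name, 0) / total_floods
        area_share = cells / total_cells
        ratios[name] = flood_share / area_share
    return ratios


# Hypothetical "distance from a river" factor with two classes.
cells = {"near": 200, "far": 800}
floods = {"near": 6, "far": 4}
fr = frequency_ratio(cells, floods)  # near: 0.6 / 0.2, far: 0.4 / 0.8
```

A ratio above 1 means floods are over-represented in that class relative to its area, and a ratio below 1 means they are under-represented. Susceptibility maps are typically built by summing the ratios of all factors for each cell.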


Title: Integrating Vegetation Indices and Spectral Features for Vegetation Mapping from Multispectral Satellite Imagery Using AdaBoost and Random Forest Machine Learning Classifiers
Authors: Saini, Rashmi
Links: https://bibliotekanauki.pl/articles/2174656.pdf
Publication date: 2023
Publisher: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects: ensemble classifiers
Machine Learning
Random Forest
AdaBoost
vegetation mapping
vegetation indices
Description: Vegetation mapping is an active research area in the domain of remote sensing. This study proposes a methodology for the mapping of vegetation by integrating several vegetation indices along with original spectral bands. The Land Use Land Cover classification was performed by two powerful Machine Learning techniques, namely Random Forest and AdaBoost. The Random Forest algorithm works on the concept of building multiple decision trees for the final prediction. The other Machine Learning technique selected for the classification is AdaBoost (adaptive boosting), which converts a set of weak learners into a strong learner. Here, multispectral satellite data of Dehradun, India, was utilised. The results demonstrate an increase of 3.87% and 4.32% after inclusion of selected vegetation indices by Random Forest and AdaBoost respectively. An Overall Accuracy (OA) of 91.23% (kappa value of 0.89) and 88.59% (kappa value of 0.86) was obtained by means of the Random Forest and AdaBoost classifiers respectively. Although Random Forest achieved greater OA than AdaBoost, AdaBoost provided better class-specific accuracy for the Shrubland class. Furthermore, this study also evaluated the importance of each individual feature used in the classification. Results demonstrated that the NDRE, GNDVI, and RTVIcore vegetation indices, and the NIR and Red-Edge spectral bands, obtained higher importance scores.
Source: Geomatics and Environmental Engineering; 2023, 17, 1; 57-74
1898-1135
Appears in: Geomatics and Environmental Engineering
Content provider: Biblioteka Nauki

Article
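
The Overall Accuracy and kappa values quoted in the description above are both derived from a confusion matrix. The sketch below uses the standard formulas (OA as the diagonal share, Cohen's kappa as observed agreement corrected for chance agreement). The function names and the 2x2 example matrix are illustrative assumptions, not figures from the paper.

```python
from typing import List


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Kappa = (p_o - p_e) / (1 - p_e), with p_e from the row and column totals."""
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(row[j] for row in matrix) for j in range(len(matrix))]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Rows are reference classes, columns are predicted classes (toy values).
example = [[45, 5], [10, 40]]
oa = overall_accuracy(example)  # 85 / 100
kappa = cohen_kappa(example)    # (0.85 - 0.50) / (1 - 0.50)
```

Kappa is lower than OA whenever part of the agreement could arise by chance, which is why land cover studies usually report the two together.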


Title: Semantic Segmentation of Diseases in Mushrooms using Enhanced Random Forest
Authors: Yacharam, Rakesh Kumar
Sekhar, Dr. V. Chandra
Links: https://bibliotekanauki.pl/articles/31339414.pdf
Publication date: 2023
Publisher: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Instytut Informatyki Technicznej
Subjects: mushroom diseases
semantic segmentation
computer aided
Machine Learning
significant feature extraction
Random Forest classifier
Description: Mushrooms are a rich source of antioxidants and nutritional values. Edible mushrooms, however, are susceptible to various diseases such as dry bubble, wet bubble, cobweb, bacterial blotches, and mites. Farmers face significant production losses due to these diseases. The manual detection of these diseases relies on expertise, knowledge of diseases, and human effort. Therefore, there is a need for computer-aided methods, which serve as optimal substitutes for detecting and segmenting diseases. In this paper, we propose a semantic segmentation approach based on the Random Forest machine learning technique for the detection and segmentation of mushroom diseases. Our focus lies in extracting a combination of different features, including Gabor, Bouda, Kayyali, Gaussian, Canny edge, Roberts, Sobel, Scharr, Prewitt, Median, and Variance. We employ constant mean-variance thresholding and the Pearson correlation coefficient to extract significant features, aiming to enhance computational speed and reduce complexity in training the Random Forest classifier. Our results indicate that semantic segmentation based on Random Forest outperforms other methods such as Support Vector Machine (SVM), Naïve Bayes, K-means, and Region of Interest in terms of accuracy. Additionally, it exhibits superior precision, recall, and F1 score compared to SVM. It is worth noting that deep learning-based semantic segmentation methods were not considered due to the limited availability of diseased mushroom images.
Source: Machine Graphics & Vision; 2023, 32, 2; 129-146
1230-0535
2720-250X
Appears in: Machine Graphics & Vision
Content provider: Biblioteka Nauki

Article


Title: A Machine Learning Model for Improving Building Detection in Informal Areas: A Case Study of Greater Cairo
Authors: Taha, Lamyaa Gamal El-deen
Ibrahim, Rania Elsayed
Links: https://bibliotekanauki.pl/articles/2055780.pdf
Publication date: 2022
Publisher: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects: multi-source image fusion
random forest
support vector machine
DEM extraction
unplanned unsafe areas
remote sensing
Description: Building detection in Ashwa’iyyat is a fundamental yet challenging problem, mainly because it requires the correct recovery of building footprints from images with high object density and scene complexity. A classification model was proposed to integrate spectral, height and textural features. It was developed for the automatic detection of rectangular, irregularly structured and quite small buildings, or buildings which are close to each other but not adjoined. It is intended to improve the precision with which buildings are classified using scikit-learn Python libraries and QGIS. WorldView-2 and Spot-5 imagery were combined using three image fusion techniques. The Grey-Level Co-occurrence Matrix was applied to determine which attributes are important in detecting and extracting buildings. The Normalized Digital Surface Model was also generated with 0.5-m resolution. The results demonstrated that when textural features of colour images were introduced as classifier input, the overall accuracy was improved in most cases. The results show that the proposed model was more accurate and efficient than the state-of-the-art methods and can be used effectively to extract the boundaries of small buildings. The use of a classifier ensemble is recommended for the extraction of buildings.
Source: Geomatics and Environmental Engineering; 2022, 16, 2; 39-58
1898-1135
Appears in: Geomatics and Environmental Engineering
Content provider: Biblioteka Nauki

Article


Title: A Small Wind Turbine Output Model for Spatially Constrained Remote Island Micro-Grids
Authors: Žigman, D.
Meštrović, K.
Tomiša, T.
Links: https://bibliotekanauki.pl/articles/2172468.pdf
Publication date: 2022
Publisher: Uniwersytet Morski w Gdyni. Wydział Nawigacyjny
Subjects: wind turbine
small wind turbine
decision tree model
artificial neural network model
random forest model
micro-grids
spatially constrained remote Island micro-grids
remote Island micro-grid
Description: Modelling the power supply system of remote island communities is essential for its operation, as well as for the survival of a modern society settled in challenging conditions. The micro-grid emerges as a proper solution for the sustainable development of a spatially constrained remote island community, while at the same time reflecting the power requirements of similar maritime subjects, such as large vessels and fleets. Here we present research results on predictive modelling of the output of a small wind turbine as a component of a remote island micro-grid. Based on month-long experimental data and a machine learning-based approach to predictive model development, three candidate models of small wind turbine output were developed and their performance was assessed on an independent set of experimental data. The Random Forest Model outperformed its competitors (Decision Tree Model and Artificial Neural Network Model), emerging as a candidate methodology for all-year predictive model development, as a later component of the overall remote island micro-grid model.
Source: TransNav : International Journal on Marine Navigation and Safety of Sea Transportation; 2022, 16, 1; 143-146
2083-6473
2083-6481
Appears in: TransNav : International Journal on Marine Navigation and Safety of Sea Transportation
Content provider: Biblioteka Nauki

Article


Title: A Study on the Optimization of Metalloid Contents of Fe-Si-B-C Based Amorphous Soft Magnetic Materials Using Artificial Intelligence Method
Authors: Choi, Young-Sin
Kwon, Do-Hun
Lee, Min-Woo
Cha, Eun-Ji
Jeon, Junhyub
Lee, Seok-Jae
Kim, Jongryoul
Kim, Hwi-Jun
Links: https://bibliotekanauki.pl/articles/2174571.pdf
Publication date: 2022
Publisher: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects: Fe-based amorphous
soft magnetic properties
artificial intelligence
machine learning
random forest regression
Description: The soft magnetic properties of Fe-based amorphous alloys can be controlled by their compositions through alloy design. Experimental data on these alloys, however, show some discrepancy with predicted values. To further improve the soft magnetic properties, machine learning methods such as random forest regression, k-nearest neighbors regression and support vector regression can help optimize the composition. In this study, the random forest regression method was used to find the optimum compositions of Fe-Si-B-C alloys. As a result, the lowest coercivity was observed in Fe80.5Si3.63B13.54C2.33 at.% and the highest saturation magnetization was obtained in Fe81.83Si3.63B12.63C1.91 at.%, with R² values of 0.74 and 0.878, respectively.
Source: Archives of Metallurgy and Materials; 2022, 67, 4; 1459-1463
1733-3490
Appears in: Archives of Metallurgy and Materials
Content provider: Biblioteka Nauki

Article


Title: An anomaly detection method based on random convolutional kernel and isolation forest for equipment state monitoring
Authors: Shu, Xinhao
Zhang, Shigang
Li, Yue
Chen, Mengqiao
Links: https://bibliotekanauki.pl/articles/2200934.pdf
Publication date: 2022
Publisher: Polska Akademia Nauk. Polskie Naukowo-Techniczne Towarzystwo Eksploatacyjne PAN
Subjects: anomaly detection
random convolutional kernel
isolation forest
multi-dimensional time series
equipment state monitoring
Description: Anomaly detection plays an essential role in health monitoring and reliability assurance of complex systems. However, previous research suffers from distraction by outliers in training and relies extensively on empirical feature engineering, leading to many limitations in the practical application of detection methods. In this paper, we propose an unsupervised anomaly detection method that combines random convolution kernels with isolation forest to tackle the above problems in equipment state monitoring. The random convolution kernels are applied to generate cross-dimensional and multi-scale features for multi-dimensional time series, combined with a time series decomposition method that selects features sensitive to anomalies, for automatic feature extraction. Then, anomaly detection is performed on the obtained features using isolation forests, which place low requirements on the purity of the training sample. Verification and comparison on different types of datasets show that the proposed method surpasses traditional methods in accuracy and applicability.
Source: Eksploatacja i Niezawodność; 2022, 24, 4; 758-770
1507-2711
Appears in: Eksploatacja i Niezawodność
Content provider: Biblioteka Nauki

Article


Title: An assessment of machine learning and data balancing techniques for evaluating downgrade truck crash severity prediction in Wyoming
Authors: Ampadu, Vincent-Michael Kwesi
Haq, Muhammad Tahmidul
Ksaibati, Khaled
Links: https://bibliotekanauki.pl/articles/2176018.pdf
Publication date: 2022
Publisher: Fundacja Centrum Badań Socjologicznych
Subjects: crash severity
performance
extreme gradient boosting tree
adaptive boosting tree
random forest
gradient boost decision tree
adaptive synthetic algorithm
Description: This study investigated various machine learning methods for predicting the severity of large truck crashes on Wyoming road networks: four classification tree-based ML models (Adaptive Boosting tree, Random Forest, Gradient Boost Decision Tree, and Extreme Gradient Boosting tree) and three non-tree-based ML models (Support Vector Machines, Multi-layer Perceptron, and k-Nearest Neighbors). The accuracy of these seven methods was then compared. The final ROC AUC score for the optimized random forest model was 95.296%, followed by k-NN with 92.780%, MLP with 87.817%, XGBoost with 86.542%, GradBoost with 74.824%, SVM with 72.648%, and AdaBoost with 67.232%. Based on the analysis, the top 10 predictors of severity were obtained from the feature importance plot. These may be classified into whether safety equipment was used, whether airbags were deployed, the gender of the driver, and whether alcohol was involved.
Source: Journal of Sustainable Development of Transport and Logistics; 2022, 7, 2; 6-24
2520-2979
Appears in: Journal of Sustainable Development of Transport and Logistics
Content provider: Biblioteka Nauki

Article


Title: Application of machine learning tools for seismic reservoir characterization study of porosity and saturation type
Zastosowanie metod uczenia maszynowego do charakterystyki porowatości i typu nasycenia przy użyciu atrybutów sejsmicznych
Authors: Topór, Tomasz
Sowiżdżał, Krzysztof
Links: https://bibliotekanauki.pl/articles/2143329.pdf
Publication date: 2022
Publisher: Instytut Nafty i Gazu - Państwowy Instytut Badawczy
Subjects: machine learning
random forest
XGBoost
seismic attributes
reservoir properties prediction
uczenie maszynowe
lasy losowe
drzewa wzmocnione gradientowo
atrybuty sejsmiczne
predykcja własności zbiornikowych
Description: The application of machine learning (ML) tools and data-driven modeling has become a standard approach for solving many problems in exploration geology and has contributed to the discovery of new reservoirs. This study explores an application of two machine learning ensemble methods, random forest (RF) and extreme gradient boosting (XGBoost), to derive porosity and saturation type (gas/water) in multihorizon sandstone formations from Miocene deposits of the Carpathian Foredeep. The training of the ML algorithms was divided into two stages. First, the RF algorithm was used to compute porosity based on seismic attributes and well location coordinates. The obtained results were used as an extra feature for saturation type modeling using the XGBoost algorithm. XGBoost was run with and without well location coordinates to evaluate the influence of spatial information on modeling performance. The hyperparameters for each model were tuned using the Bayesian optimization algorithm. To check the robustness of the trained models, 10-fold cross-validation was performed. The results were evaluated using standard metrics for regression and classification on training and testing sets. The root mean square error (RMSE) for porosity prediction with RF for training and testing was close to 0.053, providing no evidence of overfitting. Feature importance analysis revealed that the most influential variables for porosity prediction were the spatial coordinates and the seismic attribute sweetness. The results of XGBoost modeling (variant 1) demonstrated that the algorithm could accurately predict saturation type despite the class imbalance issue. The sensitivity for XGBoost on training and testing data was high and equaled 0.862 and 0.920, respectively. The XGBoost model relied on computed porosity and spatial coordinates. The sensitivity for both training and testing sets dropped significantly, by about 10%, when well location coordinates were removed (variant 2). In this case, the three most influential features were computed porosity, seismic amplitude contrast, and the iso-frequency component (15 Hz) attribute. The obtained results were imported to Petrel software to present the spatial distribution of porosity and saturation type. The latter parameter was given with a probability distribution, which allows for identifying potential target zones enriched in gas.
Machine learning methods are now a routine tool for solving many problems in exploration geology and contribute to the discovery of new reservoirs. This work shows the application of two machine learning algorithms, random forests (RF) and gradient-boosted trees (XGBoost), to determine porosity and saturation type (gas/water) in sandstone formations that are potential gas-bearing horizons in the Miocene deposits of the Carpathian Foredeep. The machine learning process was divided into two stages. In the first stage, RF was used to compute porosity from seismic attribute data and well location coordinates. The results were used as an additional feature in modeling saturation type with the XGBoost algorithm. XGBoost modeling was carried out in two variants, with and without well locations, to assess the influence of spatial information on modeling performance. Hyperparameters for each model were tuned using Bayesian optimization. The modeling results were evaluated on the training and testing sets using standard metrics for regression and classification problems. In addition, 10-fold cross-validation was performed to strengthen the credibility of the trained models. The root mean square error (RMSE) for the modeled porosity on the training and testing sets was close to 0.053, which indicates no overfitting. Feature importance analysis revealed that the variables with the greatest influence on porosity prediction were the well location coordinates and the seismic attribute sweetness. The results of XGBoost modeling (variant 1) showed that the algorithm can accurately predict saturation type despite the class imbalance problem. The sensitivity of detecting potential gas zones with the XGBoost model was high for both the training and testing sets (0.862 and 0.920). In its predictions, the model relied mainly on the computed porosity and the well coordinates. Sensitivity on the training and testing sets dropped by about 10% when the well location coordinates were removed (XGBoost variant 2). In this case, the three most important features were the computed porosity, the seismic attribute amplitude contrast, and the iso-frequency component (15 Hz) attribute. The results were imported into Petrel to present the spatial distribution of porosity and saturation type. The latter parameter was presented together with a probability distribution, which gave insight into the zones with the highest gas potential.
Source: Nafta-Gaz; 2022, 78, 3; 165-175
0867-8871
Appears in: Nafta-Gaz
Content provider: Biblioteka Nauki

Article


Title: Artificial Intelligence Based Flood Forecasting for River Hunza at Danyor Station in Pakistan
Authors: Yaseen, Muhammad Waseem
Awais, Muhammad
Riaz, Khuram
Rasheed, Muhammad Babar
Waqar, Muhammad
Rasheed, Sajid
Links: https://bibliotekanauki.pl/articles/31340346.pdf
Publication date: 2022
Publisher: Polska Akademia Nauk. Instytut Budownictwa Wodnego PAN
Subjects: hydrometeorology
random forest
support vector
multilayer perceptron
machine learning
flood forecasting
Description: Floods can cause significant problems for humans and can damage the economy. Implementing a reliable flood monitoring and warning system in risk areas can help to reduce the negative impacts of these natural disasters. Artificial intelligence algorithms and statistical approaches are employed by researchers to enhance flood forecasting. In this study, a dataset was created using unique features measured by sensors along the Hunza River in Pakistan over the past 31 years. The dataset was used for classification and regression problems. Two types of machine learning algorithms were tested for classification: classical algorithms (Random Forest, RF, and Support Vector Classifier, SVC) and deep learning algorithms (Multi-Layer Perceptron, MLP). For the regression problem, the results of the MLP and Support Vector Regression (SVR) algorithms were compared based on their mean square, root mean square and mean absolute errors. The results show that the accuracy of the RF classifier is 0.99, while the accuracies of the SVC and MLP methods are 0.98; moreover, in the case of flood prediction, the SVR algorithm outperforms the MLP approach.
Source: Archives of Hydro-Engineering and Environmental Mechanics; 2022, 69, 1; 59-77
1231-3726
Appears in: Archives of Hydro-Engineering and Environmental Mechanics
Content provider: Biblioteka Nauki

Article


Information

You are searching for the phrase "Random Forest" by criterion: Subject
