Subject: missing data - OPAC collection catalogue

Title: Comparative analysis of methods for hourly electricity demand forecasting in the absence of data - a case study
Analiza porównawcza metod prognozowania godzinnego zapotrzebowania na energię elektryczną przy brakach w danych - studium przypadku
Authors: Zawadzki, Jan
Links: https://bibliotekanauki.pl/articles/2194900.pdf
Publication date: 2023
Publisher: Akademia Bialska Nauk Stosowanych im. Jana Pawła II w Białej Podlaskiej
Subjects: forecasting; missing data; time series; high frequency
Description: Scope and purpose of work: This paper examines the impact of the number of gaps in data, the analytical form, and the model type selection criterion on the accuracy of interpolation and extrapolation forecasts for hourly data. Materials and methods: Forecasts were developed using predictors based on classical and regression time series forecasting models, hybrid time series and hybrid regression forecasting models for uncleared series, and exponential smoothing models for cleared series with two or three types of seasonal fluctuations, selected for minimum estimated errors of interpolation or extrapolation forecasts. Results: Adaptive and hybrid regression models proved to have the most favorable predictive properties. Most hybrid time series models, for systematic and non-systematic gaps and for both analytical forms, are single models that generally describe fluctuations within a 24-hour cycle. Conclusions: The lowest estimated prediction errors for interpolation were obtained for exponential smoothing models, followed by hybrid regression models. The reverse order was obtained for extrapolative forecasting.
Source: Economic and Regional Studies; 2023, 16, 1; 34-50
ISSN: 2083-3725; 2451-182X
Appears in: Economic and Regional Studies
Content provider: Biblioteka Nauki


Title: Hybrid multiple imputation in a large scale complex survey
Authors: Razzak, Humera; Heumann, Christian
Links: https://bibliotekanauki.pl/articles/1186925.pdf
Publication date: 2019-12-10
Publisher: Główny Urząd Statystyczny
Subjects: complex surveys; high-dimensional data; missing data; multiple imputation
Description: Large-scale complex surveys typically contain a large number of variables measured on an even larger number of respondents. Missing data is a common problem in such surveys. Since most of the variables in a survey are usually categorical, multiple imputation requires robust methods for modelling high-dimensional categorical data distributions. This paper introduces the 3-stage Hybrid Multiple Imputation (HMI) approach, computationally efficient and easy to implement, to impute complex survey data sets that contain both continuous and categorical variables. The proposed HMI approach applies sequential regression MI techniques to impute the continuous variables, using information from the categorical variables, which have already been imputed by a non-parametric Bayesian MI approach. The proposed approach seems to be a good alternative to the existing approaches, frequently yielding lower root mean square errors, empirical standard errors and standard errors than the others. The HMI method has proven to be markedly superior to the existing MI methods in terms of computational efficiency. The authors illustrate repeated sampling properties of the hybrid approach using simulated data. The results are also illustrated with child data from the Multiple Indicator Cluster Survey (MICS) in Punjab 2014.
Source: Statistics in Transition new series; 2019, 20, 4; 33-58
ISSN: 1234-7655
Appears in: Statistics in Transition new series
Content provider: Biblioteka Nauki


Title: Application of descriptive models to forecasting seasonal time series with gaps
Authors: Oesterreich, Maciej
Links: https://bibliotekanauki.pl/articles/424807.pdf
Publication date: 2015
Publisher: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Subjects: forecasting of missing data; descriptive models; systematic gap
Description: This paper presents the results of applying quasi-simulation methods to analyse the impact of systematic gaps on the accuracy of inter- and extrapolative forecasts for time series with seasonal fluctuations. Forecasts were built using predictors based on descriptive models with seasonally changing parameters. The theoretical considerations are illustrated by an empirical example. Model estimation and the construction of inter- and extrapolative forecasts were carried out in R and Statistica 10.
Source: Econometrics. Ekonometria. Advances in Applied Data Analytics; 2015, 1 (47); 68-77
ISSN: 1507-3866
Appears in: Econometrics. Ekonometria. Advances in Applied Data Analytics
Content provider: Biblioteka Nauki


Title: Computational intensive methods for prediction and imputation in time series analysis
Authors: Neves, Maria; Cordeiro, Clara
Links: https://bibliotekanauki.pl/articles/729950.pdf
Publication date: 2011
Publisher: Uniwersytet Zielonogórski. Wydział Matematyki, Informatyki i Ekonometrii
Subjects: bootstrap; forecast intervals; missing data; time series analysis
Description: One of the main goals in time series analysis is to forecast future values. Many forecasting methods have been developed, and the most successful are based on exponential smoothing, that is, on obtaining forecasts as weighted combinations of past observations. Classical procedures for obtaining forecast intervals assume a known distribution for the error process, which is not true in many situations. A bootstrap methodology can be used to compute distribution-free forecast intervals. First, an adequately chosen model is fitted to the data series. Afterwards, inspired by the sieve bootstrap, an AR(p) model is used to filter the series of the random component, under the stationarity hypothesis. The centered residuals are then resampled and the initial series is reconstructed. This methodology is used to obtain forecast intervals and to treat missing data, which often appear in real time series. An automatic procedure was developed in the R language and is applied in simulation studies as well as in real examples.
Source: Discussiones Mathematicae Probability and Statistics; 2011, 31, 1-2; 121-139
ISSN: 1509-9423
Appears in: Discussiones Mathematicae Probability and Statistics
Content provider: Biblioteka Nauki
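
Note (illustrative sketch, not taken from the article): the description above characterises exponential smoothing as forecasting with weighted combinations of past observations and mentions its use for treating missing data. The Python sketch below shows only simple exponential smoothing with gaps marked as None; it does not implement the sieve bootstrap or the forecast intervals. The function name, the smoothing constant alpha and the choice to initialise the level with the first observed value are assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def ses_fill_and_forecast(
    series: Sequence[Optional[float]], alpha: float = 0.5
) -> Tuple[List[float], float]:
    """Simple exponential smoothing that tolerates gaps (None).

    Returns (filled, forecast): the series with each gap replaced by the
    one-step-ahead forecast available at that point, and the forecast for
    the first period after the sample.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    observed = [y for y in series if y is not None]
    if not observed:
        raise ValueError("series has no observed values")

    level = float(observed[0])  # assumed initialisation: first observed value
    filled: List[float] = []
    for y in series:
        if y is None:
            # Gap: impute the current forecast and leave the level unchanged.
            filled.append(level)
        else:
            filled.append(float(y))
            # Level update: weighted combination of new value and old level.
            level = alpha * y + (1.0 - alpha) * level
    return filled, level
```

For example, ses_fill_and_forecast([10, 12, None, 14], alpha=0.5) fills the gap with the level reached after the first two observations.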


Title: Review of methods for data sets with missing values and practical applications
Authors: Korczyński, Adam
Links: https://bibliotekanauki.pl/articles/433946.pdf
Publication date: 2014
Publisher: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Subjects: missing data pattern; missing data mechanism; complete-case analysis; available-case analysis; single imputation; likelihood-based methods; multiple imputation; weighting methods
Description: The aim of this paper is to review the traditional methods (complete-case analysis, available-case analysis, single imputation) and current methods (likelihood-based methods, multiple imputation, weighting methods) for handling missing data and to assess their usefulness in statistical research. The paper provides the terminology and a description of the traditional and current methods and algorithms used in the analysis of incomplete data sets. The methods are assessed in terms of the statistical properties of their estimators. An example is provided for the multiple imputation method. The review indicates that current methods outperform traditional ones in terms of bias reduction, precision and efficiency of estimation.
Source: Śląski Przegląd Statystyczny; 2014, 12(18); 83-104
ISSN: 1644-6739
Appears in: Śląski Przegląd Statystyczny
Content provider: Biblioteka Nauki
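
Note (illustrative sketch, not taken from the article): the record mentions an example of multiple imputation but does not reproduce it. A common final step of multiple imputation is to pool the estimates from the m completed data sets using the combining rules usually attributed to Rubin. The Python sketch below assumes that each completed data set has already yielded a point estimate and its squared standard error; the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Pool m per-imputation results into one estimate and its standard error.

    estimates: point estimate from each completed data set
    variances: squared standard error of each estimate
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, sqrt(total)
```

The between-imputation term is what single imputation ignores, which is one reason it tends to understate uncertainty.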


Title: The handling of missing binary data in language research
Authors: Pichette, Francois; Beland, Sebastien; Jolani, Shahab; Leśniewska, Justyna
Links: https://bibliotekanauki.pl/articles/780779.pdf
Publication date: 2015-03-01
Publisher: Uniwersytet im. Adama Mickiewicza w Poznaniu
Subjects: missing data; Cronbach's alpha; participant exclusion; second language testing
Description: Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data replacement and deletion methods are common in research. Language researchers declared that when faced with missing answers of the yes/no type (which translate into zero or one in data tables), the three most common solutions they adopt are to exclude the participant's data from the analyses, to leave the cell empty, or to fill it in with zero, as for an incorrect answer. This study then examines the impact on Cronbach's α of five types of data insertion, using simulated and actual data with various numbers of participants and missing percentages. Our analyses indicate that the three most common methods we identified among language researchers are the ones with the greatest impact on Cronbach's α coefficients; in other words, they are the least desirable solutions to the missing data problem. On the basis of our results, we make recommendations for language researchers concerning the best way to deal with missing data. Given that none of the most common simple methods works properly, we suggest that the missing data be replaced either by the item's mean or by the participant's overall mean to provide a better, more accurate image of the instrument's internal consistency.
Source: Studies in Second Language Learning and Teaching; 2015, 5, 1; 153-169
ISSN: 2083-5205; 2084-1965
Appears in: Studies in Second Language Learning and Teaching
Content provider: Biblioteka Nauki
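
Note (illustrative sketch, not taken from the article): the description above recommends replacing missing binary answers with the item's mean before computing Cronbach's α. The Python sketch below shows one way this could look, assuming a table with one row per participant, item scores of 0 or 1, and None for a missing answer. The function names are hypothetical, and the α formula used is the standard one, k/(k-1) × (1 - sum of item variances / variance of total scores).

```python
from statistics import mean, pvariance
from typing import List, Optional, Sequence


def impute_item_mean(rows: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Replace each missing answer (None) with the mean of that item's observed answers."""
    k = len(rows[0])
    item_means = []
    for j in range(k):
        seen = [r[j] for r in rows if r[j] is not None]
        if not seen:
            raise ValueError(f"item {j} has no observed answers")
        item_means.append(mean(seen))
    return [[item_means[j] if r[j] is None else r[j] for j in range(k)] for r in rows]


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a complete participants-by-items score table."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # The same variance definition is used for items and totals, so the ratio
    # does not depend on the population/sample divisor.
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)
```

A typical call would be cronbach_alpha(impute_item_mean(table)); the participant's overall mean mentioned in the description could be substituted in the same way.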


Title: Symulacyjna analiza wpływu liczby i rozmieszczenia luk niesystematycznych na dokładność prognoz
Simulation analysis of influence of number and distribution of unsystematic gaps on the accuracy of forecasts
Authors: Oesterreich, Maciej
Links: https://bibliotekanauki.pl/articles/425016.pdf
Publication date: 2015
Publisher: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Subjects: forecasting of missing data; simulation methods; analysis of gaps distribution
Description: This paper presents a statistical analysis of the impact of the distribution of unsystematic gaps on the accuracy of inter- and extrapolative forecasts for seasonal time series. The variable analysed was the average length of stay of tourists in accommodation establishments in the West Pomeranian Voivodeship in the years 2008-2013. Simulation methods were used to generate ten thousand sets of gaps for three variants that differed in the number of gaps. For all sets and variants of gaps, time series models with an exponential trend and relatively constant seasonality were estimated. Inter- and extrapolative forecasts were then built and their relative errors (MAPE) calculated. The analysis was carried out in R and Statistica 10.
Source: Econometrics. Ekonometria. Advances in Applied Data Analytics; 2015, 2 (48); 78-88
ISSN: 1507-3866
Appears in: Econometrics. Ekonometria. Advances in Applied Data Analytics
Content provider: Biblioteka Nauki
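
Note (illustrative sketch, not taken from the article): the description above outlines a simulation in which sets of unsystematic gaps are drawn at random, the gaps are forecast, and the relative error (MAPE) is computed. The Python sketch below mirrors that loop in miniature. It swaps in plain linear interpolation for the article's trend-and-seasonality models, and the function names, the seed handling and the restriction of gaps to interior positions are assumptions made for this example.

```python
import random
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    pairs = list(zip(actual, predicted))
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def draw_gaps(n: int, n_gaps: int, seed: int) -> List[int]:
    """Draw one random set of gap positions, keeping both end points observed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n - 1), n_gaps))


def interpolate(series: Sequence[float], gaps: Sequence[int]) -> List[float]:
    """Fill each gap linearly between its nearest observed neighbours."""
    gap_set = set(gaps)
    filled = list(series)
    for i in gaps:
        lo = i - 1
        while lo in gap_set:
            lo -= 1
        hi = i + 1
        while hi in gap_set:
            hi += 1
        weight = (i - lo) / (hi - lo)
        filled[i] = series[lo] + weight * (series[hi] - series[lo])
    return filled


def gap_error(series: Sequence[float], n_gaps: int, seed: int) -> float:
    """MAPE of the interpolated values over one simulated set of gaps."""
    gaps = draw_gaps(len(series), n_gaps, seed)
    filled = interpolate(series, gaps)
    return mape([series[i] for i in gaps], [filled[i] for i in gaps])
```

Repeating gap_error over many seeds and several values of n_gaps gives an error distribution for each variant, which is the kind of comparison the description refers to.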


Title: Predicting in multivariate incomplete time series. Application of the expectation-maximisation algorithm supplemented by the Newton-Raphson method
Authors: Korczyński, Adam
Links: https://bibliotekanauki.pl/articles/1806793.pdf
Publication date: 2021-08-24
Publisher: Główny Urząd Statystyczny
Subjects: missing data; multivariate time series; expectation-maximisation algorithm; Newton-Raphson algorithm
Description: Statistical practice requires various imperfections resulting from the nature of data to be addressed. Data containing different types of measurement errors and irregularities, such as missing observations, have to be modelled. The study presented in the paper concerns the application of the expectation-maximisation (EM) algorithm to calculate maximum likelihood estimates, using an autoregressive model as an example. The model describes a process observed only through measurements with a certain level of precision and through more than one data series. The studied series are affected by measurement error and interrupted in some time periods, which makes the information available for parameter estimation, and later for prediction, less precise. The presented technique aims to compensate for missing data in time series. The missing data appear in the form of breaks in the source of the signal. The EM algorithm has been adjusted to a hybrid version, supplemented by the Newton-Raphson method. This technique allows the estimation of more complex models. The formulation of the substantive model of an autoregressive process affected by noise is outlined, as well as the adjustment introduced to overcome the issue of missing data. The extended version of the algorithm has been verified using data sampled from a model serving as an example for the examined process. The verification demonstrated that the joint EM and Newton-Raphson algorithms converged in a relatively small number of iterations and restored the information lost due to missing data, providing more accurate predictions than the original algorithm. The study also features an example of the application of the supplemented algorithm to empirical data (the calculation of forecasted demand for newspapers).
Source: Przegląd Statystyczny; 2021, 68, 1; 17-46
ISSN: 0033-2372
Appears in: Przegląd Statystyczny
Content provider: Biblioteka Nauki


Title: MODELE HARMONICZNE ZE ZŁOŻONĄ SEZONOWOŚCIĄ W PROGNOZOWANIU SZEREGÓW CZASOWYCH Z LUKAMI SYSTEMATYCZNYMI
HARMONICAL MODELS WITH COMPLEX SEASONALITY IN FORECASTING TIME SERIES WITH SYSTEMATIC GAPS
Authors: Szmuksta-Zawadzka, Maria; Zawadzki, Jan
Links: https://bibliotekanauki.pl/articles/453180.pdf
Publication date: 2013
Publisher: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Katedra Ekonometrii i Statystyki
Subjects: harmonic models; complex seasonality; missing data
Description: In modelling variables with complex seasonality, both models with dummy (zero-one) variables and harmonic models can be used for complete data and for data with unsystematic gaps. When systematic gaps occur, however, only parsimonious harmonic models can be used. In these models, each type of fluctuation is described by a separate set of sine and cosine components. The theoretical considerations are illustrated by an empirical example.
Source: Metody Ilościowe w Badaniach Ekonomicznych; 2013, 14, 3; 81-90
ISSN: 2082-792X
Appears in: Metody Ilościowe w Badaniach Ekonomicznych
Content provider: Biblioteka Nauki
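
Note (illustrative sketch, not taken from the article): the description above states that harmonic models remain usable when gaps are systematic, that is, when one sub-period is never observed. The Python sketch below illustrates why: a sine and cosine pair is estimated from the observed sub-periods and can still be evaluated at the missing one, whereas a dummy variable for a never-observed sub-period has no data to be estimated from. The sketch uses a single harmonic and a single seasonal period, which is a simplification of the complex seasonality discussed in the article, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: parameters are not identified")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(t: float, period: float) -> List[float]:
    """Regressors for one time point: constant, sine and cosine of the seasonal angle."""
    angle = 2.0 * math.pi * t / period
    return [1.0, math.sin(angle), math.cos(angle)]


def fit_harmonic(times: Sequence[float], values: Sequence[float], period: float) -> List[float]:
    """Least-squares fit of y = a + b*sin + c*cos using only the observed time points."""
    rows = [design_row(t, period) for t in times]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve(xtx, xty)  # normal equations


def predict(coef: Sequence[float], t: float, period: float) -> float:
    """Evaluate the fitted harmonic model at any time point, observed or not."""
    return sum(c * x for c, x in zip(coef, design_row(t, period)))
```

With monthly data in which, say, every fourth month of the year is missing, fit_harmonic can be run on the remaining observations and predict then supplies values for the missing month.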


Title: Tests for profile analysis based on two-step monotone missing data
Authors: Onozawa, Mizuki; Takahashi, Sho; Seo, Takashi
Links: https://bibliotekanauki.pl/articles/729840.pdf
Publication date: 2013
Publisher: Uniwersytet Zielonogórski. Wydział Matematyki, Informatyki i Ekonometrii
Subjects: Hotelling's T²-type statistic; likelihood ratio; profile analysis; two-step monotone missing data
Description: In this paper, we consider profile analysis for observations with two-step monotone missing data. There exist three interesting hypotheses - the parallelism hypothesis, level hypothesis, and flatness hypothesis - when comparing the profiles of some groups. The T²-type statistics and their asymptotic null distributions for the three hypotheses are given for two-sample profile analysis. We propose approximate upper percentiles of these test statistics. When the data do not have missing observations, the test statistics perform lower than the usual test statistics, for example, as in [8]. Further, we consider a parallel profile model for several groups when the data have two-step monotone missing observations. Under the assumption of non-missing data, the likelihood ratio test procedure is derived by [16]. We derive the test statistic based on the likelihood ratio. Finally, in order to investigate the accuracy of the null distributions of the proposed statistics, we perform a Monte Carlo simulation for some selected parameter values.
Source: Discussiones Mathematicae Probability and Statistics; 2013, 33, 1-2; 171-190
ISSN: 1509-9423
Appears in: Discussiones Mathematicae Probability and Statistics
Content provider: Biblioteka Nauki


Title: The occurrence of missing data in surveys
Występowanie braków danych w badaniach ankietowych
Authors: Zdebska, W.
Links: https://bibliotekanauki.pl/articles/2116965.pdf
Publication date: 2021
Publisher: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Wydawnictwo Szkoły Głównej Gospodarstwa Wiejskiego w Warszawie
Subjects: survey; analysis of missing data; MCAR; MAR; NMAR
Description: The purpose of this article is to discuss issues related to the analysis of missing data. Why do missing data occur in a data set? What percentage of the collected data constitutes missing data? What is the nature of the missing data that emerge during data collection? What factors may contribute to the occurrence of missing data? These questions are extremely important in assessing conducted surveys or in evaluating the quality of the collected data. A lack of reflection on these aspects may lead to false conclusions and recommendations. The analysis of missing data is therefore an extremely important, yet still often neglected, stage of survey data analysis. This article presents not only an overview of the literature on missing data, but also shows in a practical way how an analysis of the randomness of missing data can be performed. The analysis presented in the article is based on data collected as part of the Polish General Social Survey carried out in 2008. The author's main recommendation is to analyse the randomness of missing data before analysing the collected data.
Source: Acta Scientiarum Polonorum. Oeconomia; 2021, 20, 2; 95-103
ISSN: 1644-0757
Appears in: Acta Scientiarum Polonorum. Oeconomia
Content provider: Biblioteka Nauki


Title: Empirical Evaluation of Methods of Filling the Missing Data in Learning Probabilistic Models
Porównanie metod uzupełniania danych brakujących w uczeniu modeli probabilistycznych
Authors: Falkowski, A. A.; Łupińska-Dubicka, A.
Links: https://bibliotekanauki.pl/articles/88374.pdf
Publication date: 2018
Publisher: Politechnika Białostocka. Oficyna Wydawnicza Politechniki Białostockiej
Subjects: missing data; probabilistic models; Bayesian networks; classification
Description: Missing data is a common problem in statistical analysis, and most practical databases contain missing values for some of their attributes. Missing data can appear for many reasons. However, regardless of the reason, even a small percentage of missing data can cause serious problems with analysis, reducing the statistical power of a study and leading to wrong conclusions. This paper presents the results of handling missing observations in learning probabilistic models. Two data sets taken from the UCI Machine Learning Repository were used to learn the quantitative part of the Bayesian networks. To allow comparison, the selected data sets did not contain any missing values. For each model, data sets with various levels of missing values were artificially generated. The main goal of this paper was to examine whether missing observations affect the model's reliability. Accuracy was defined as the percentage of correctly classified records and was compared with the results obtained on the data set without missing values.
Source: Advances in Computer Science Research; 2018, 14; 55-67
ISSN: 2300-715X
Appears in: Advances in Computer Science Research
Content provider: Biblioteka Nauki


Title: Energy associated tuning method for short-term series forecasting by complete and incomplete datasets
Authors: Rodríguez-Rivero, C.; Pucheta, J.; Laboret, S.; Sauchelli, V.; Patiño, D.
Links: https://bibliotekanauki.pl/articles/91842.pdf
Publication date: 2017
Publisher: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects: short time series; forecasting; missing data; energy associated to series; complete datasets; incomplete datasets
Description: This article presents short-term predictions using neural networks tuned by a predictor filter based on the energy associated to series, for complete and incomplete datasets. A benchmark of high-roughness time series from Mackey-Glass (MG), Logistic (LOG) and Henon (HEN), together with some univariate series chosen from the NN3 Forecasting Competition, is used. An average smoothing technique is used to complete the missing data in the dataset. The Hurst parameter estimated through wavelets is used to estimate the roughness of the real and forecasted series. Validation uses a forecast horizon of 15 values ahead. The performance of the proposed filter shows that even when a short dataset is incomplete and only a linear smoothing technique is employed, the prediction is almost fair as measured by the SMAPE index. The main result shows that the predictor system based on energy associated to series has an optimal performance on several chaotic time series; in particular, this method, among others, provides a good estimation when the short-term series are taken from one-point observations.
Source: Journal of Artificial Intelligence and Soft Computing Research; 2017, 7, 1; 5-16
ISSN: 2083-2567; 2449-6499
Appears in: Journal of Artificial Intelligence and Soft Computing Research
Content provider: Biblioteka Nauki


Title: Z badań nad metodami prognozowania na podstawie niekompletnych szeregów czasowych z wahaniami okresowymi (sezonowymi)
Studies of methods applied to forecasting incomplete data in seasonal time series
Authors: Szmuksta-Zawadzka, Maria; Zawadzki, Jan
Links: https://bibliotekanauki.pl/articles/422819.pdf
Publication date: 2012
Publisher: Główny Urząd Statystyczny
Subjects: time series; seasonal fluctuations; missing data; forecasting
Description: This paper gives a synthetic overview of the results of the authors' long-term research on the application of forecasting methods under incomplete information in time series with seasonal fluctuations. Two types of gaps in the data are considered: systematic and unsystematic. Systematic gaps occur when no numerical information is available for at least one sub-period throughout the whole sample interval. Forecasting methods are considered both for the original data (with seasonality) and for data from which seasonal fluctuations have been eliminated. The theoretical considerations are illustrated by an empirical example.
Source: Przegląd Statystyczny; 2012, 59, special issue 1; 140-154
ISSN: 0033-2372
Appears in: Przegląd Statystyczny
Content provider: Biblioteka Nauki


Title: Influence of missing data imputation method on the classification accuracy of the medical data
Authors: Orczyk, T.; Porwik, P.
Links: https://bibliotekanauki.pl/articles/334037.pdf
Publication date: 2013
Publisher: Uniwersytet Śląski. Wydział Informatyki i Nauki o Materiałach. Instytut Informatyki. Zakład Systemów Komputerowych
Subjects: medical data analysis; missing data; data imputation; classification efficiency
Description: The aim of this study is to show the dangers of filling in missing data, particularly medical data. Because there are many dedicated medical expert systems and medical decision support systems, special attention must be paid to the construction of classifiers. Medical data are almost never complete, and completing the missing data requires special care. The safest approach to dealing with missing data would be to remove records with missing parameters and/or to remove parameters that are missing in the records. Unfortunately, reducing a data set that is already very small is not always an option. The article shows the dangers arising from data imputation by presenting the influence of selected missing data filling algorithms on classification accuracy.
Source: Journal of Medical Informatics & Technologies; 2013, 22; 111-116
ISSN: 1642-6037
Appears in: Journal of Medical Informatics & Technologies
Content provider: Biblioteka Nauki


Title: The problem of imputation of the missing data from the continuous counts of road traffic
Authors: Spławińska, M.
Links: https://bibliotekanauki.pl/articles/231354.pdf
Publication date: 2015
Publisher: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects: road traffic; data collection; imputation; missing data; SARIMA model
Description: Missing traffic data are an important issue for road administration. Although numerous methods for imputing them can be found in the foreign literature (among them Box-Jenkins models, the most effective method), in Poland only proven, simplified methods are still applied. The article presents analyses that include an assessment of the completeness of the existing traffic data and work on the construction of a SARIMA model. The study was based on hourly traffic volumes from continuous traffic count stations on the national road network in Poland (Golden River stations) for the years 2005-2010. As a result, a model of the form SARIMA(1,1,1)(0,1,1) with seasonal period 168 was proposed for imputing the missing data. The newly developed model can be used effectively to fill in the missing days of measurement required for estimating AADT by the AASHTO method. In other cases, due to its accuracy and the laboriousness of the process, it is not recommended.
Source: Archives of Civil Engineering; 2015, 61, 1; 131-145
ISSN: 1230-2945
Appears in: Archives of Civil Engineering
Content provider: Biblioteka Nauki
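
Note (illustrative, not quoted from the article): under one common sign convention, a SARIMA model of order (1,1,1)(0,1,1) with seasonal period 168 can be written with the backshift operator B as shown below. The symbols for the coefficients and the error term are notation assumed for this note, and 168 = 24 × 7 corresponds to a weekly cycle in hourly counts.

```latex
% Non-seasonal part: AR(1), one regular difference, MA(1).
% Seasonal part (period 168): one seasonal difference, seasonal MA(1).
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{168})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{168})\, \varepsilon_t ,
\qquad B^{k} y_t = y_{t-k}
```

Here y_t is the hourly traffic volume and ε_t is a white-noise error; some texts write the moving-average factors with minus signs instead.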


Title: Drzewa klasyfikacyjne w medycynie
Classification trees in medicine
Authors: Owczarek, Aleksander J.
Links: https://bibliotekanauki.pl/articles/1035042.pdf
Publication date: 2014
Publisher: Śląski Uniwersytet Medyczny w Katowicach
Subjects: classification trees; decision process; multicollinearity; missing data
Description: The paper presents the use of computerized diagnostic decision support systems in medicine. The structure of a classical decision tree and the advantages and disadvantages of using classification trees are discussed. The paper also discusses how classification trees perform in comparison with other classical statistical methods, such as discriminant analysis and logistic regression, taking into account the problem of variable multicollinearity and the problem of so-called missing data. Selected examples of the application of classification trees in medicine are given.
Source: Annales Academiae Medicae Silesiensis; 2014, 68, 6; 449-456
ISSN: 1734-025X
Appears in: Annales Academiae Medicae Silesiensis
Content provider: Biblioteka Nauki


Tytuł:: Rekonstrukcja brakujących danych temperatury gruntu w Polskiej Stacji Polarnej w Hornsundzie (SW Spitsbergen) w latach 1990-2013
Reconstruction of the missing data of the ground temperature in the Polish Polar Station in Hornsund (SW Spitsbergen) in the period of 1990-2013
Autorzy:: Leszkiewicz, J.
Caputa, Z.
Powiązania:: https://bibliotekanauki.pl/articles/260750.pdf
Data publikacji:: 2015
Wydawca:: Stowarzyszenie Klimatologów Polskich
Tematy:: temperatura gruntu
Spitsbergen
metoda statystyczna
rekonstrukcja brakujących danych
ground temperature
statistical method
reconstruction of missing data
Opis:: Temperatura gruntu jest ważnym wskaźnikiem stanu wieloletniej zmarzliny oraz warstwy czynnej szczególnie w okresie współczesnego ocieplenia klimatu. Oddziałuje na zjawiska geomorfologiczne, hydrologiczne i inne, które zachodzą głównie w warstwie czynnej, natomiast całkowite zamarznięcie gruntu wyraźnie hamuje ich przebieg. Stąd też duże zainteresowanie danymi temperatury gruntu. Jednak historyczne dane często cechują się brakami pomiarowymi lub krótkimi seriami a nawet błędami. Dlatego dająca pozytywne wyniki, metoda rekonstrukcji danych temperatury gruntu na różnych głębokościach może ułatwić badania nad termiką gruntu. Metoda warunków meteorologicznych poprzedzających (MWMP) pozwala z wysoką wiarygodnością statystyczną odtworzyć brakujące serie danych na podstawie temperatury powietrza lub innych. Użyteczność metody przedstawiono na podstawie brakujących pomiarów temperatury gruntu na Polskiej Stacji Polarnej. Stwierdzono wysoką korelację (r>0,9) oraz istotność statystyczną dla relacji temperatura powietrza poprzedzająca – temperatura gruntu. Długość czasu reakcji (połowa czasu poprzedzającego) wyniosła: 1-4 dni dla przypowierzchniowej temperatury gruntu (głębokości 5, 10 i 20 cm) oraz 8-26,5 dni dla temperatury gruntu z głębokości 100 cm. Analiza długich serii czasowych pozwoliła na określenie tendencji współczesnego ocieplenia gruntu, np. zanik temperatury gruntu -10°C na głębokości 100 cm od roku 2005.
Ground temperature is an important indicator of the state of permafrost and the active layer, especially during the contemporary climate warming. It affects geomorphological, hydrological and other phenomena, which occur mainly in the active layer, whereas complete freezing of the ground markedly inhibits their course. Hence the great interest in ground temperature data. However, historical data often suffer from measurement gaps, short series, or even errors. An effective method for reconstructing ground temperature data at different depths can therefore facilitate research on the ground thermal regime. The method of preceding weather conditions allows missing data series to be reconstructed with high statistical reliability from air temperature or other variables. The usefulness of the method is illustrated using the missing ground temperature measurements at the Polish Polar Station. A high correlation (r > 0.9) and statistical significance were found for the relationship between the preceding air temperature and the ground temperature. The response time (half of the preceding time) was 1-4 days for the near-surface ground temperature (depths of 5, 10 and 20 cm) and 8-26.5 days for the ground temperature at a depth of 100 cm. The analysis of long time series made it possible to identify trends in the contemporary warming of the ground, for example the disappearance of ground temperatures of -10°C at a depth of 100 cm since 2005.
Źródło:: Problemy Klimatologii Polarnej; 2015, 25; 201-210
1234-0715
Pojawia się w:: Problemy Klimatologii Polarnej
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 19.

Tytuł:: Bias Reduction of Finite Population Imputation by Kernel Methods
Autorzy:: Pettersson, Nicklas
Powiązania:: https://bibliotekanauki.pl/articles/465881.pdf
Data publikacji:: 2013
Wydawca:: Główny Urząd Statystyczny
Tematy:: bayesian bootstrap
boundary and nonresponse bias missing data
multiple imputation
Pólya urn models
real donor imputation
Opis:: Missing data is a nuisance in statistics. Real donor imputation can be used with item nonresponse. A pool of donor units with similar values on auxiliary variables is matched to each unit with missing values. The missing value is then replaced by a copy of the corresponding observed value from a randomly drawn donor. Such methods can to some extent protect against nonresponse bias. But bias also depends on the estimator and the nature of the data. We adopt techniques from kernel estimation to combat this bias. Motivated by Pólya urn sampling, we sequentially update the set of potential donors with units already imputed, and use multiple imputations via Bayesian bootstrap to account for imputation uncertainty. Simulations with a single auxiliary variable show that our imputation method performs almost as well as competing methods with linear data, but better when data is nonlinear, especially with large samples.
Źródło:: Statistics in Transition new series; 2013, 14, 1; 139-160
1234-7655
Pojawia się w:: Statistics in Transition new series
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 20.

Tytuł:: On classification with missing data using rough-neuro-fuzzy systems
Autorzy:: Nowicki, R. K.
Powiązania:: https://bibliotekanauki.pl/articles/907774.pdf
Data publikacji:: 2010
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: zbiór rozmyty
struktura neuronowo-rozmyta
klasyfikacja
brakujące dane
fuzzy sets
neuro-fuzzy architectures
classification
missing data
Opis:: The paper presents a new approach to fuzzy classification in the case of missing data. Rough-fuzzy sets are incorporated into logical-type neuro-fuzzy structures and a rough-neuro-fuzzy classifier is derived. Theorems that allow the structure of the rough-neuro-fuzzy classifier to be determined are given. Several experiments illustrating the performance of the rough-neuro-fuzzy classifier in the case of missing features are described.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2010, 20, 1; 55-67
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 21.

Tytuł:: Missing data estimation based on the chaining technique in survey sampling
Autorzy:: Singh Thakur, Narendra
Shukla, Diwakar
Powiązania:: https://bibliotekanauki.pl/articles/2156986.pdf
Data publikacji:: 2022-12-15
Wydawca:: Główny Urząd Statystyczny
Tematy:: estimation
missing data
chaining
imputation
bias
mean squared error (MSE)
factor type (F-T)
chain type estimator
double sampling
Opis:: Sample surveys are often affected by missing observations and non-response caused by the respondents' refusal or unwillingness to provide the requested information or due to their memory failure. In order to substitute the missing data, a procedure called imputation is applied, which uses the available data as a tool for the replacement of the missing values. Two auxiliary variables create a chain which is used to substitute the missing part of the sample. The aim of the paper is to present the application of the Chain-type factor estimator as a means of source imputation for the non-response units in an incomplete sample. The proposed strategies were found to be more efficient and bias-controllable than similar estimation procedures described in the relevant literature. These techniques could also be made nearly unbiased in relation to other selected parametric values. The findings are supported by a numerical study involving the use of a dataset, proving that the proposed techniques outperform other similar ones.
Źródło:: Statistics in Transition new series; 2022, 23, 4; 91-111
1234-7655
Pojawia się w:: Statistics in Transition new series
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 22.

Tytuł:: Klasyfikacja luk pomiarowych w danych rejestrowanych na stacjach monitoringu powietrza
Classification of air monitoring data gaps
Autorzy:: Hoffman, S.
Jasiński, R.
Powiązania:: https://bibliotekanauki.pl/articles/297005.pdf
Data publikacji:: 2009
Wydawca:: Politechnika Częstochowska. Wydawnictwo Politechniki Częstochowskiej
Tematy:: zanieczyszczenia powietrza
monitoring automatyczny
dane
stężenia chwilowe
brakujące dane
luki pomiarowe
klasyfikacja
air monitoring
hourly concentrations
monitoring data
air pollution
missing data
measure gaps
classification
Opis:: Rejestrowane na stacjach monitoringu powietrza zbiory danych nigdy nie są kompletne. W skali roku liczba odnotowywanych braków jest zmienna. Ocena jakości powietrza na podstawie niepełnych pomiarów jest utrudniona. Obowiązujące przepisy prawne dopuszczają możliwość wykorzystania modelowania w celu uzupełnienia brakujących danych. Rozpoznanie typowych struktur obszarów z brakującymi danymi umożliwia ich klasyfikację, a następnie rekomendację odpowiednich metod modelowania dla wyszczególnionych klas. Celem badań było wytypowanie charakterystycznych struktur luk pomiarowych w zbiorach danych i określenie częstości ich występowania. Klasyfikację przypadków z brakującymi danymi zaproponowano na podstawie przeglądu wieloletnich danych, pochodzących z kilku różnych stacji pomiarowych automatycznego monitoringu powietrza. Analizowano serie czasowe chwilowych stężeń podstawowych zanieczyszczeń powietrza (O3, NO2, NO, PM10, SO2, CO), zarejestrowanych w latach 2004-2008 na stacjach monitoringu powietrza Warszawa-Ursynów, Radom, Łódź-Widzew, Piotrków Trybunalski. Na podstawie wyników przeprowadzonej analizy można stwierdzić, że brakujące dane występują powszechnie w zbiorach danych pochodzących z monitoringu powietrza. Częstość ich występowania w rocznych seriach pomiarowych może wynosić od kilku do nawet kilkudziesięciu procent. Większość luk pomiarowych jest krótka - stanowią je głównie pojedyncze przypadki. Zdecydowanie rzadziej występują bloki brakujących danych, przekraczające 3-4 przypadki (dłuższe od 3-4 godzin). Największą częstość występowania przypadków z niezarejestrowanymi wynikami odnotowano dla luk najdłuższych, obejmujących więcej niż 24 przypadki (>24 godziny).
The data gathered continuously in air monitoring systems are never complete. The number of missing records varies from year to year. Missing data can make the statistical assessment required by air quality standards uncertain and can render monitoring measurements useless. Air quality standards permit the use of modelling to recreate the missing data when the completeness of the monitoring set is insufficient. The modelling methods applied should offer the best possible precision, so that the air quality assessment is as close to reality as possible. No single method ensures maximum accuracy, because the missing data in a data matrix may form gaps of various shapes and ranges. Recognition of typical structures of missing data fields should be the basis for their classification. For the specified classes of gaps, optimal modelling methods may then be recommended. The main objective of the analysis was to select typical patterns of gaps in air monitoring data matrices and to assess how often they occur. The missing data classification was proposed after a survey of long-term data. The analyzed data sets came from 4 air monitoring sites in Central Poland (Warsaw-Ursynów, Radom, Lodz-Widzew, Piotrków Trybunalski). The data were gathered in the period 2004-2008. The examined time series comprised hourly concentrations of the main air pollutants: O3, NO2, NO, PM10, SO2, CO. The results allow some general conclusions. Missing data commonly occur in sets of air monitoring records. Gaps may account for from a few to as much as several tens of per cent of all expected data in yearly measuring series. For all air pollutants, most of the gaps in monitoring time series are very short. Single (1-hour) missing values dominate among gaps of different lengths. Gaps longer than 3-4 hours occur far less often. However, the greatest number of individual missing values falls within the longest gaps (>24 hours), because of their length.
Źródło:: Inżynieria i Ochrona Środowiska; 2009, 12, 2; 101-117
1505-3695
2391-7253
Pojawia się w:: Inżynieria i Ochrona Środowiska
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 23.

Tytuł:: Censored random variable as a form of coping with missing data in studying the leachability of heavy metals from hardening slurries
Zmienna mieszana losowa jako forma radzenia sobie z brakami danych w badaniu wymywalności metali ciężkich z zawiesiny twardniejącej
Autorzy:: Szarek, Łukasz
Kledyński, Zbigniew
Powiązania:: https://bibliotekanauki.pl/articles/1852666.pdf
Data publikacji:: 2021
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: brak danych
zmienna losowa cenzurowana
rozkład cenzurowany
metal ciężki
wymywalność
zawiesina twardniejąca
missing data
censored random variable
censored distribution
heavy metal
leachability
hardening slurry
Opis:: Missing data in test result tables can significantly affect the quality of analysis, especially in the technical sciences, where the mechanism generating missing data is often non-random and their presence depends on the unobserved part of the studied variables. In such cases, applying an inappropriate method for dealing with missing data leads to bias in the estimated distribution parameters. The article presents a method, relatively simple to implement, for dealing with missing data generated by the MNAR (missing not at random) mechanism, which uses a censored random variable. This procedure does not modify the form of the variable's distribution, which is why it ensures objective and efficient estimation of distribution parameters in studies affected by certain restrictions of a technical or physical nature (censored distribution), with a relatively low workload. Furthermore, it does not require specialized software. A prerequisite for using this method is knowledge of the frequency and cause of the missing data. The method for estimating the parameters of the censored distribution of a random variable is shown using the example of a study of the leachability of selected heavy metals from a hardening slurry. The results of the analysis were compared with classical methods for dealing with missing data, such as ignoring observations with missing data (listwise or pairwise deletion), single imputation and stochastic regression imputation.
Braki danych w tablicach wyników badań mogą w znaczący sposób wpływać na jakość analizy, szczególnie w naukach technicznych, gdzie mechanizm generujący braki danych często jest nielosowy, a ich występowanie zależy od części nieobserwowanej badanych zmiennych. W takich przypadkach zastosowanie nieodpowiedniej metody radzenia sobie z brakami danych prowadzi do obciążenia estymowanych parametrów rozkładu. W artykule przedstawiono stosunkowo prostą w implementacji metodę radzenia sobie z brakami danych powstałymi w wyniku mechanizmu MNAR wykorzystującą rozkład cenzurowany. Procedura ta nie modyfikuje postaci rozkładu zmiennej, przez co zapewnia obiektywne i skuteczne estymowanie parametrów rozkładu w badaniach dotkniętych pewnymi ograniczeniami natury technicznej lub fizycznej, przy stosunkowo niskim nakładzie pracy. Ponadto nie wymaga zastosowania specjalistycznego oprogramowania. Warunkiem koniecznym zastosowania metody jest znajomość częstości występowania braków danych oraz ich przyczyny. Sposób estymacji parametrów rozkładu cenzurowanego zmiennej losowej przedstawiono na przykładzie badania wymywalności wybranych metali ciężkich z zawiesiny twardniejącej. Wyniki analizy porównano z klasycznymi sposobami radzenia sobie z brakami danych: pominięciem obserwacji z brakami danych, imputacją oraz stochastyczną imputacją regresyjną.
Źródło:: Archives of Civil Engineering; 2021, 67, 1; 233-247
1230-2945
Pojawia się w:: Archives of Civil Engineering
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 24.

Tytuł:: Aproksymacja stężeń zanieczyszczeń powietrza za pomocą neuronowych modeli szeregów czasowych
Aproximation of air monitoring data gaps by means of time-series neural models
Autorzy:: Hoffman, S.
Powiązania:: https://bibliotekanauki.pl/articles/297640.pdf
Data publikacji:: 2009
Wydawca:: Politechnika Częstochowska. Wydawnictwo Politechniki Częstochowskiej
Tematy:: szereg czasowy
modele neuronowe
stężenia chwilowe
dane monitoringu
brakujące dane
luki pomiarowe
aproksymacja
time series
neural models
air pollution
air monitoring
hourly concentrations
monitoring data
missing data
measure gaps
approximation
Opis:: W pracy oceniono możliwości aproksymacji stężeń zanieczyszczeń mierzonych na stacjach monitoringu powietrza. Do predykcji stężeń wykorzystano neuronowe modele szeregów czasowych. Jakość modelowania testowano na rzeczywistych danych pochodzących ze stacji monitoringu powietrza Łódź-Widzew, zarejestrowanych w latach 2004-2008. Analizie poddano względnie kompletny zbiór danych, obejmujący stężenia 6 podstawowych zanieczyszczeń powietrza: O3, NO2, NO, PM10, SO2, CO. Celem badawczym było określenie i porównanie dokładności predykcji stężeń różnych zanieczyszczeń powietrza. Modelowanie przeprowadzono, stosując sztuczne sieci neuronowe. Trening sieci odbywał się przy użyciu liniowego algorytmu pseudoinwersji. Wyjściem modelu było stężenie wybranego zanieczyszczenia w określonym czasie. Wejściami były wartości stężeń zarejestrowane w godzinach wcześniejszych. Każdy model charakteryzowały dwie wielkości: horyzont prognozy i liczba wartości opóźnionych. W analizie określono dokładność predykcji stężeń wybranych zanieczyszczeń dla stałej liczby wartości opóźnionych równej 24 przy zmieniającym się horyzoncie prognozy od 1 do 240 godz. Jako kryterium jakości modelowania przyjęto wartość błędu aproksymacji.
The main research purpose was to assess the quality of modelling air pollutant concentrations. The examination was made by means of artificial neural networks, which were employed to create time-series models. The quality of approximation was tested on an actual set of air monitoring data, gathered over a 5-year period at the measurement site in Lodz-Widzew (Central Poland). The examined time series comprised hourly concentrations of the main air pollutants: O3, NO2, NO, PM10, SO2, CO. The research aim was to estimate and compare the prediction accuracy for different air pollutants. Time-series models were characterized by two parameters that might influence the prediction quality: lookahead (the forecast horizon) and steps (the number of lagged values). For all models a constant number of steps, equal to 24, was assumed. The effect of changing the lookahead in the range of 1-240 hours was analyzed. The precision of the time-series models was found to decrease as the lookahead increased. The drop in accuracy depends on the pollutant. The longest reasonable forecast can be made for ozone concentration. Approximation accuracy decreases in the order: O3, CO, SO2, PM10, NO2, NO.
Źródło:: Inżynieria i Ochrona Środowiska; 2009, 12, 3; 231-239
1505-3695
2391-7253
Pojawia się w:: Inżynieria i Ochrona Środowiska
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 25.

Tytuł:: Classifiers accuracy improvement based on missing data imputation
Autorzy:: Jordanov, I.
Petrov, N.
Petrozziello, A.
Powiązania:: https://bibliotekanauki.pl/articles/91626.pdf
Data publikacji:: 2018
Wydawca:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:: machine learning
missing data
model-based imputation
neural networks
random forests
support vector machine
radar signal classification
nauczanie maszynowe
brakujące dane
sieci neuronowe
maszyna wektorów nośnych
klasyfikacja sygnałów radarowych
Opis:: In this paper we investigate further and extend our previous work on radar signal identification and classification based on a data set that comprises continuous, discrete and categorical data representing radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. Like most real-world datasets, it also contains a high percentage of missing values, and to deal with this problem we investigate three imputation techniques: Multiple Imputation (MI); K-Nearest Neighbour Imputation (KNNI); and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The performance of the imputation models is assessed with Wilcoxon’s test for statistical significance and Cohen’s effect size metrics. To solve the classification task, we employ three intelligent approaches: Neural Networks (NN); Support Vector Machines (SVM); and Random Forests (RF). Subsequently, we critically analyse which imputation method most influences the classifiers’ performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses (‘military’ and ‘civil’), each containing several ‘subclasses’, and propose two new metrics: inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as complementary to the OCA when choosing the best classifier for the problem at hand.
Źródło:: Journal of Artificial Intelligence and Soft Computing Research; 2018, 8, 1; 31-48
2083-2567
2449-6499
Pojawia się w:: Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:: Biblioteka Nauki

Artykuł
