You are searching for the phrase "feature selection" by criterion: Topic


Showing 1-14 of 14
Title:
Rough Sets Methods in Feature Reduction and Classification
Authors:
Świniarski, R. W.
Links:
https://bibliotekanauki.pl/articles/908366.pdf
Publication date:
2001
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
pattern recognition
data reduction
rough sets
feature selection
classification
Description:
The paper presents an application of rough sets and statistical methods to feature reduction and pattern recognition. The presented description of rough sets theory emphasizes the role of rough sets reducts in feature selection and data reduction in pattern recognition. The overview of methods of feature selection emphasizes feature selection criteria, including rough set-based methods. The paper also contains a description of the algorithm for feature selection and reduction based on the rough sets method proposed jointly with Principal Component Analysis. Finally, the paper presents numerical results of face recognition experiments using the learning vector quantization neural network, with feature selection based on the proposed principal components analysis and rough sets methods.
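The abstract above builds on rough-set reducts for data reduction. As a rough illustration of the reduct idea only, and not of the authors' combined rough-sets/PCA algorithm, the Python sketch below greedily drops attributes from a small, invented decision table as long as the remaining attributes still discern every pair of objects with different decisions.

# Illustrative greedy reduct search on a tiny, invented decision table (not the paper's algorithm).
import pandas as pd

def is_consistent(df, attrs, decision="class"):
    # A subset of attributes is consistent if objects identical on it never differ in decision.
    return (df.groupby(list(attrs))[decision].nunique() <= 1).all()

def greedy_reduct(df, decision="class"):
    attrs = [c for c in df.columns if c != decision]
    for a in list(attrs):
        remaining = [c for c in attrs if c != a]
        if remaining and is_consistent(df, remaining, decision):
            attrs = remaining  # attribute a is dispensable, drop it
    return attrs

table = pd.DataFrame({
    "headache": [1, 1, 0, 0, 1],
    "temp": ["high", "normal", "high", "normal", "high"],
    "cough": [0, 1, 1, 0, 0],
    "class": ["flu", "cold", "flu", "none", "flu"],
})
print(greedy_reduct(table))  # e.g. ['temp', 'cough'] for this toy table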
Source:
International Journal of Applied Mathematics and Computer Science; 2001, 11, 3; 565-582
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
On the predictive power of meta-features in OpenML
Authors:
Bilalli, B.
Abelló, A.
Aluja-Banet, T.
Links:
https://bibliotekanauki.pl/articles/331086.pdf
Publication date:
2017
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
feature extraction
feature selection
meta learning
data extraction
data selection
machine learning
Description:
The demand for performing data analysis is steadily rising. As a consequence, people of different profiles (i.e., non-experienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (model/algorithm selection problem). Meta-learning is a technique used for assisting non-expert users in this step. The effectiveness of meta-learning is, however, largely dependent on the description/characterization of datasets (i.e., meta-features used for meta-learning). There is a need for improving the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, which is a collaborative machine learning platform that is designed to store and organize meta-data about datasets, data mining algorithms, models and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and we show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML as the biggest source of data in this domain.
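As a minimal, generic illustration of the latent-feature step described above (not the authors' exploratory factor analysis procedure or their OpenML retrieval application), the sketch below extracts latent factors from an invented dataset-by-meta-feature matrix with scikit-learn's FactorAnalysis.

# Minimal sketch: latent factors from a dataset-by-meta-feature matrix (illustrative data only).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Hypothetical meta-features for 200 datasets (e.g. size, dimensionality, class entropy, skewness).
meta = rng.normal(size=(200, 8))

fa = FactorAnalysis(n_components=3, random_state=0)
latent = fa.fit_transform(meta)  # 200 x 3 matrix of latent features
loadings = fa.components_        # 3 x 8: which meta-features each latent factor groups together
print(latent.shape, loadings.shape)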
Source:
International Journal of Applied Mathematics and Computer Science; 2017, 27, 4; 697-712
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Correlation-based feature selection strategy in classification problems
Authors:
Michalak, K.
Kwaśnicka, H.
Links:
https://bibliotekanauki.pl/articles/908379.pdf
Publication date:
2006
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
feature selection
feature interdependence
classification
pairwise feature evaluation
feature correlation
pattern classification
Description:
In classification problems, the issue of high dimensionality of data is often considered important. To lower data dimensionality, feature selection methods are often employed. To select a set of features that will span a representation space that is as good as possible for the classification task, one must take into consideration possible interdependencies between the features. As a trade-off between the complexity of the selection process and the quality of the selected feature set, a pairwise selection strategy has been recently suggested. In this paper, a modified pairwise selection strategy is proposed. Our research suggests that computation time can be significantly lowered while maintaining the quality of the selected feature sets by using mixed univariate and bivariate feature evaluation based on the correlation between the features. This paper presents the comparison of the performance of our method with that of the unmodified pairwise selection strategy based on several well-known benchmark sets. Experimental results show that, in most cases, it is possible to lower computation time and that with high statistical significance the quality of the selected feature sets is not lower compared with those selected using the unmodified pairwise selection process.
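A much-simplified sketch of the mixed univariate/bivariate idea follows: features are first ranked by absolute correlation with the class, and only pairs drawn from the top-ranked candidates are then evaluated. The pair score and all thresholds are invented for illustration and do not reproduce the authors' strategy.

# Illustrative sketch of mixed univariate + pairwise (bivariate) feature evaluation.
import numpy as np
from itertools import combinations

def select_features(X, y, n_univariate=10, n_pairs_keep=5):
    # Univariate step: absolute Pearson correlation of each feature with the class labels.
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    top = np.argsort(corr)[::-1][:n_univariate]
    # Bivariate step: score pairs of top-ranked features, penalizing their mutual correlation.
    pair_scores = {}
    for i, j in combinations(top, 2):
        redundancy = abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
        pair_scores[(i, j)] = corr[i] + corr[j] - redundancy
    best_pairs = sorted(pair_scores, key=pair_scores.get, reverse=True)[:n_pairs_keep]
    return sorted({int(f) for pair in best_pairs for f in pair})

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
y = (X[:, 0] + X[:, 3] + 0.5 * rng.normal(size=100) > 0).astype(int)
print(select_features(X, y))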
Source:
International Journal of Applied Mathematics and Computer Science; 2006, 16, 4; 503-511
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
On the order equivalence relation of binary association measures
Authors:
Paradowski, M.
Links:
https://bibliotekanauki.pl/articles/330881.pdf
Publication date:
2015
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
association coefficient
result ranking
linear combination
zeroed variance determinant
feature selection
Description:
Over a century of research has resulted in a set of more than a hundred binary association measures. Many of them share similar properties. An overview of binary association measures is presented, focused on their order equivalences. Association measures are grouped according to their relations. Transformations between these measures are shown, both formally and visually. A generalization coefficient is proposed, based on joint probability and marginal probabilities. Combining association measures is one of recent trends in computer science. Measures are combined in linear and nonlinear discrimination models, automated feature selection or construction. Knowledge about their relations is particularly important to avoid problems of meaningless results, zeroed generalized variances, the curse of dimensionality, or simply to save time.
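A small worked example of order equivalence, using one pair of measures known to be order-equivalent: Yule's Q is a monotone transformation of the odds ratio, (OR - 1)/(OR + 1), so the two coefficients always rank a collection of 2x2 contingency tables identically. The tables below are invented.

# Order equivalence of two binary association measures on invented 2x2 tables.
# Each table is (a, b, c, d): the four joint counts of a 2x2 contingency table.
tables = [(30, 10, 5, 55), (20, 20, 20, 40), (50, 2, 3, 45), (10, 30, 25, 35)]

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def yule_q(a, b, c, d):
    # Yule's Q = (ad - bc) / (ad + bc), a monotone transform of the odds ratio.
    return (a * d - b * c) / (a * d + b * c)

order_by_or = sorted(range(len(tables)), key=lambda i: odds_ratio(*tables[i]))
order_by_q = sorted(range(len(tables)), key=lambda i: yule_q(*tables[i]))
print(order_by_or == order_by_q)  # True: both measures rank the tables identically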
Source:
International Journal of Applied Mathematics and Computer Science; 2015, 25, 3; 645-657
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Selecting Differentially Expressed Genes for Colon Tumor Classification
Authors:
Fujarewicz, K.
Wiench, M.
Links:
https://bibliotekanauki.pl/articles/908154.pdf
Publication date:
2003
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
medicine
automatic control
colon tumor
gene expression data
microarrays
support vector machines
feature selection
classification
Description:
DNA microarrays provide a new technique of measuring gene expression, which has attracted a lot of research interest in recent years. It was suggested that gene expression data from microarrays (biochips) can be employed in many biomedical areas, e.g., in cancer classification. Although several new and existing methods of classification were tested, the selection of a proper (optimal) set of genes, the expressions of which can serve during classification, is still an open problem. Recently we have proposed a new recursive feature replacement (RFR) algorithm for choosing a suboptimal set of genes. The algorithm uses the support vector machines (SVM) technique. In this paper we use the RFR method for finding suboptimal gene subsets for tumor/normal colon tissue classification. The obtained results are compared with the results of applying other methods recently proposed in the literature. The comparison shows that the RFR method is able to find the smallest gene subset (only six genes) that gives no misclassifications in leave-one-out cross-validation for a tumor/normal colon data set. In this sense the RFR algorithm outperforms all other investigated methods.
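The authors' recursive feature replacement (RFR) algorithm is not reproduced here. As a loosely related, readily available point of reference, the sketch below runs SVM-based recursive feature elimination (RFE) on synthetic data and reports leave-one-out accuracy; note that, for brevity, selection uses all samples, whereas a faithful evaluation would select inside the cross-validation loop.

# Sketch of SVM-based recursive feature elimination (RFE), a relative of, not a copy of, RFR.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a gene expression matrix: few samples, many features.
X, y = make_classification(n_samples=60, n_features=200, n_informative=6, random_state=0)

selector = RFE(SVC(kernel="linear"), n_features_to_select=6, step=0.1).fit(X, y)
X_small = X[:, selector.support_]

acc = cross_val_score(SVC(kernel="linear"), X_small, y, cv=LeaveOneOut()).mean()
print(selector.support_.sum(), "features kept, leave-one-out accuracy:", round(acc, 3))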
Source:
International Journal of Applied Mathematics and Computer Science; 2003, 13, 3; 327-335
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
FSPL: A meta-learning approach for a filter and embedded feature selection pipeline
Authors:
Lazebnik, Teddy
Rosenfeld, Avi
Links:
https://bibliotekanauki.pl/articles/2201020.pdf
Publication date:
2023
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
feature selection pipeline
meta learning
no free lunch
autoML
genetic algorithm
feature selection
Description:
There are two main approaches to tackle the challenge of finding the best filter or embedded feature selection (FS) algorithm: searching for the one best FS algorithm and creating an ensemble of all available FS algorithms. However, in practice, these two processes usually occur as part of a larger machine learning pipeline and not separately. We posit that, due to the influence of the filter FS on the embedded FS, one should aim to optimize both of them as a single FS pipeline rather than separately. We propose a meta-learning approach that automatically finds the best filter and embedded FS pipeline for a given dataset called FSPL. We demonstrate the performance of FSPL on n = 90 datasets, obtaining 0.496 accuracy for the optimal FS pipeline, revealing an improvement of up to 5.98 percent in the model’s accuracy compared to the second-best meta-learning method.
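FSPL itself (the meta-learner evaluated over 90 datasets) is not reproduced here; the sketch only shows the kind of single filter-plus-embedded pipeline it searches over, with a mutual-information filter followed by L1-penalized logistic regression as the embedded step. The component choices and parameters are illustrative assumptions.

# Sketch of one filter + embedded feature selection pipeline (not the FSPL meta-learner itself).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=100, n_informative=10, random_state=0)

pipe = Pipeline([
    ("filter", SelectKBest(mutual_info_classif, k=30)),  # filter FS step
    ("embedded", SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.5))),
    ("clf", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())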
Source:
International Journal of Applied Mathematics and Computer Science; 2023, 33, 1; 103-115
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
A contemporary multi-objective feature selection model for depression detection using a hybrid pBGSK optimization algorithm
Authors:
Kavi Priya, Santhosam
Pon Karthika, Kasirajan
Links:
https://bibliotekanauki.pl/articles/2201021.pdf
Publication date:
2023
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
depression detection
text classification
dimensionality reduction
hybrid feature selection
feature selection
Description:
Depression is one of the primary causes of global mental illnesses and an underlying reason for suicide. The user generated text content available in social media forums offers an opportunity to build automatic and reliable depression detection models. The core objective of this work is to select an optimal set of features that may help in classifying depressive contents posted on social media. To this end, a novel multi-objective feature selection technique (EFS-pBGSK) and machine learning algorithms are employed to train the proposed model. The novel feature selection technique incorporates a binary gaining-sharing knowledge-based optimization algorithm with population reduction (pBGSK) to obtain the optimized features from the original feature space. The extensive feature selector (EFS) is used to filter out the excessive features based on their ranking. Two text depression datasets collected from Twitter and Reddit forums are used for the evaluation of the proposed feature selection model. The experimentation is carried out using naive Bayes (NB) and support vector machine (SVM) classifiers for five different feature subset sizes (10, 50, 100, 300 and 500). The experimental outcome indicates that the proposed model can achieve superior performance scores. The top results are obtained using the SVM classifier for the SDD dataset with 0.962 accuracy, 0.929 F1 score, 0.0809 log-loss and 0.0717 mean absolute error (MAE). As a result, the optimal combination of features selected by the proposed hybrid model significantly improves the performance of the depression detection system.
Source:
International Journal of Applied Mathematics and Computer Science; 2023, 33, 1; 117-131
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
A weighted wrapper approach to feature selection
Authors:
Kusy, Maciej
Zajdel, Roman
Links:
https://bibliotekanauki.pl/articles/2055180.pdf
Publication date:
2021
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
feature selection
wrapper approach
feature significance
weighted combined ranking
convolutional neural network
classification accuracy
Description:
This paper considers feature selection as a problem of an aggregation of three state-of-the-art filtration methods: Pearson’s linear correlation coefficient, the ReliefF algorithm and decision trees. A new wrapper method is proposed which, on the basis of a fusion of the above approaches and the performance of a classifier, is capable of creating a distinct, ordered subset of attributes that is optimal based on the criterion of the highest classification accuracy obtainable by a convolutional neural network. The introduced feature selection uses a weighted ranking criterion. In order to evaluate the effectiveness of the solution, the idea is compared with sequential feature selection methods that are widely known and used wrapper approaches. Additionally, to emphasize the need for dimensionality reduction, the results obtained on all attributes are shown. The verification of the outcomes is presented in the classification tasks of repository data sets that are characterized by a high dimensionality. The presented conclusions confirm that it is worth seeking new solutions that are able to provide a better classification result while reducing the number of input features.
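A simplified sketch of fusing several filter rankings into one weighted ranking follows. ReliefF is not available in scikit-learn, so mutual information stands in for it here, the criterion weights are invented, and the wrapper loop over a convolutional neural network is omitted.

# Sketch: weighted fusion of three filter rankings; mutual information stands in for ReliefF,
# and the wrapper loop over a convolutional neural network is omitted.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=25, n_informative=5, random_state=0)

scores = np.vstack([
    np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]),  # Pearson correlation
    mutual_info_classif(X, y, random_state=0),                           # stand-in for ReliefF
    DecisionTreeClassifier(random_state=0).fit(X, y).feature_importances_,
])
ranks = scores.argsort(axis=1).argsort(axis=1)  # per-criterion rank of each feature (higher = better)
weights = np.array([0.4, 0.3, 0.3])             # hypothetical criterion weights
combined = weights @ ranks
print("features by combined ranking:", np.argsort(combined)[::-1][:5])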
Source:
International Journal of Applied Mathematics and Computer Science; 2021, 31, 4; 685-696
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Data mining methods for gene selection on the basis of gene expression arrays
Authors:
Muszyński, M.
Osowski, S.
Links:
https://bibliotekanauki.pl/articles/329803.pdf
Publication date:
2014
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
gene expression array
gene ranking
feature selection
clusterization measures
fusion
SVM classification
Description:
The paper presents data mining methods applied to gene selection for recognition of a particular type of prostate cancer on the basis of gene expression arrays. Several chosen methods of gene selection, including the Fisher method, correlation of gene with a class, application of the support vector machine and statistical hypotheses, are compared on the basis of clustering measures. The results of applying these individual selection methods are combined together to identify the most often selected genes forming the required pattern, best associated with the cancerous cases. This resulting pattern of selected gene lists is treated as the input data to the classifier, performing the task of the final recognition of the patterns. The numerical results of the recognition of prostate cancer from normal (reference) cases using the selected genes and the support vector machine confirm the good performance of the proposed gene selection approach.
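A toy sketch of the fusion idea: rank genes by independent criteria (here only the Fisher discriminant score and the absolute weights of a linear SVM) and keep the genes selected by both top lists. The criteria, data and cut-off are illustrative and do not cover the paper's full set of selection methods or its clustering-based comparison.

# Sketch of fusing gene rankings from two criteria (Fisher score and linear-SVM weights).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=500, n_informative=8, random_state=0)

def fisher_score(X, y):
    # |difference of class means| / (sum of class standard deviations), per gene.
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    s0, s1 = X[y == 0].std(0), X[y == 1].std(0)
    return np.abs(m0 - m1) / (s0 + s1 + 1e-12)

top_k = 20
top_fisher = set(np.argsort(fisher_score(X, y))[::-1][:top_k])
svm_weights = np.abs(LinearSVC(C=1.0, max_iter=5000).fit(X, y).coef_[0])
top_svm = set(np.argsort(svm_weights)[::-1][:top_k])

print("genes selected by both criteria:", sorted(int(g) for g in top_fisher & top_svm))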
Source:
International Journal of Applied Mathematics and Computer Science; 2014, 24, 3; 657-668
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
The feature selection problem in computer-assisted cytology
Authors:
Kowal, M.
Skobel, M.
Nowicki, N.
Links:
https://bibliotekanauki.pl/articles/329941.pdf
Publication date:
2018
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
nuclei segmentation
feature selection
breast cancer
convolutional neural network
Description:
Modern cancer diagnostics is based heavily on cytological examinations. Unfortunately, visual inspection of cytological preparations under the microscope is a tedious and time-consuming process. Moreover, intra- and inter-observer variations in cytological diagnosis are substantial. Cytological diagnostics can be facilitated and objectified by using automatic image analysis and machine learning methods. Computerized systems usually preprocess cytological images, segment and detect nuclei, extract and select features, and finally classify the sample. In spite of the fact that a lot of different computerized methods and systems have already been proposed for cytology, they are still not routinely used because there is a need for improvement in their accuracy. This contribution focuses on computerized breast cancer classification. The task at hand is to classify cellular samples coming from fine-needle biopsy as either benign or malignant. For this purpose, we compare 5 methods of nuclei segmentation and detection, 4 methods of feature selection and 4 methods of classification. Nuclei detection and segmentation methods are compared with respect to recall and the F1 score based on the Jaccard index. Feature selection and classification methods are compared with respect to classification accuracy. Nevertheless, the main contribution of our study is to determine which features of nuclei indicate reliably the type of cancer. We also check whether the quality of nuclei segmentation/detection significantly affects the accuracy of cancer classification. It is verified using the test set that the average accuracy of cancer classification is around 76%. Spearman’s correlation and chi-square test allow us to determine significantly better features than the feature forward selection method.
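A small sketch of the two univariate tests mentioned above for ranking features: Spearman's correlation with the class label and the chi-square statistic (after rescaling features to be non-negative, which chi-square requires). The data are synthetic rather than nuclei measurements.

# Sketch: ranking features by Spearman correlation with the class and by the chi-square statistic.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=150, n_features=12, n_informative=4, random_state=0)

spearman = np.array([abs(spearmanr(X[:, j], y)[0]) for j in range(X.shape[1])])
chi2_stat, _ = chi2(MinMaxScaler().fit_transform(X), y)  # chi-square needs non-negative inputs

print("top features by Spearman:  ", np.argsort(spearman)[::-1][:4])
print("top features by chi-square:", np.argsort(chi2_stat)[::-1][:4])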
Source:
International Journal of Applied Mathematics and Computer Science; 2018, 28, 4; 759-770
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Joint feature selection and classification for positive unlabelled multi-label data using weighted penalized empirical risk minimization
Authors:
Teisseyre, Paweł
Links:
https://bibliotekanauki.pl/articles/2142491.pdf
Publication date:
2022
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
positive data
unlabelled data
multilabel classification
feature selection
empirical risk minimization
Description:
We consider the positive-unlabelled multi-label scenario in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive. The absence of the label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce weights of observations. The idea is to assign larger weights to observations for which there is a consistency between the values of the true target variable and the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function which corresponds to the risk function for the true unobserved target variables. The weights in both the methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose to use very simple bounds for the propensity score, which leads to relatively simple forms of weights. In the experiments we analyze the predictive power of the methods considered for different labelling schemes.
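The sketch below shows only the generic ingredient the abstract builds on, a weighted, penalized empirical risk minimized for a single surrogate label: an L1-penalized logistic regression fitted with per-observation weights. The constant weight given to unlabelled observations is a placeholder and not the paper's propensity-score-based bounds.

# Sketch: weighted, L1-penalized empirical risk minimization for one positive-unlabelled label.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y_true = make_classification(n_samples=400, n_features=50, n_informative=6, random_state=0)
rng = np.random.default_rng(0)
s = y_true * (rng.random(400) < 0.5)  # surrogate label: only some of the positives get labelled

# Placeholder observation weights: labelled observations are trusted more than unlabelled ones.
w = np.where(s == 1, 1.0, 0.6)

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(X, s, sample_weight=w)
selected = np.flatnonzero(model.coef_[0])  # embedded feature selection via the L1 penalty
print(len(selected), "features kept:", selected[:10])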
Source:
International Journal of Applied Mathematics and Computer Science; 2022, 32, 2; 311-322
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Optimization on the complementation procedure towards efficient implementation of the index generation function
Authors:
Borowik, G.
Links:
https://bibliotekanauki.pl/articles/330597.pdf
Publication date:
2018
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
data reduction
feature selection
indiscernibility matrix
logic synthesis
index generation function
Description:
In the era of big data, solutions are desired that would be capable of efficient data reduction. This paper presents a summary of research on an algorithm for complementation of a Boolean function which is fundamental for logic synthesis and data mining. Successively, the existing problems and their proposed solutions are examined, including the analysis of current implementations of the algorithm. Then, methods to speed up the computation process and efficient parallel implementation of the algorithm are shown; they include optimization of data representation, recursive decomposition, merging, and removal of redundant data. Besides the discussion of computational complexity, the paper compares the processing times of the proposed solution with those for the well-known analysis and data mining systems. Although the presented idea is focused on searching for all possible solutions, it can be restricted to finding just those of the smallest size. Both approaches are of great application potential, including proving mathematical theorems, logic synthesis, especially index generation functions, or data processing and mining such as feature selection, data discretization, rule generation, etc. The problem considered is NP-hard, and it is easy to point to examples that are not solvable within the expected amount of time. However, the solution allows the barrier of computations to be moved one step further. For example, the algorithm presented here can calculate, as the only one at the moment, all minimal sets of features for a few standard benchmarks. Unlike many existing methods, the algorithm additionally works with undetermined values. The result of this research is easily extendable experimental software that is the fastest among the tested solutions and the data mining systems.
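A brute-force toy example of the "all minimal sets of features" notion referred to above: build the discernibility sets of a small, invented decision table and enumerate the minimal attribute subsets that intersect every set. The complementation algorithm itself and its optimizations are not reproduced, and this naive enumeration does not scale.

# Toy example: all minimal attribute subsets that discern every pair of objects
# with different decisions in a small, invented decision table (naive enumeration).
from itertools import combinations

attrs = ["a", "b", "c"]
objects = [  # attribute values followed by the decision
    ([1, 0, 1], "yes"),
    ([1, 1, 0], "no"),
    ([0, 0, 1], "no"),
    ([1, 0, 0], "yes"),
]

# Discernibility sets: for each pair with different decisions, the attributes on which they differ.
disc = []
for (vi, di), (vj, dj) in combinations(objects, 2):
    if di != dj:
        disc.append({attrs[k] for k in range(len(attrs)) if vi[k] != vj[k]})

# Enumerate minimal subsets hitting every discernibility set (supersets of found solutions skipped).
minimal = []
for r in range(1, len(attrs) + 1):
    for subset in combinations(attrs, r):
        if all(set(subset) & d for d in disc) and not any(set(m) <= set(subset) for m in minimal):
            minimal.append(subset)
print(minimal)  # [('a', 'b')] for this table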
Source:
International Journal of Applied Mathematics and Computer Science; 2018, 28, 4; 803-815
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Data mining methods for prediction of air pollution
Authors:
Siwek, K.
Osowski, S.
Links:
https://bibliotekanauki.pl/articles/330775.pdf
Publication date:
2016
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
computational intelligence
feature selection
neural network
random forest
air pollution forecasting
Description:
The paper discusses methods of data mining for prediction of air pollution. Two tasks in such a problem are important: generation and selection of the prognostic features, and the final prognostic system of the pollution for the next day. An advanced set of features, created on the basis of the atmospheric parameters, is proposed. This set is subject to analysis and selection of the most important features from the prediction point of view. Two methods of feature selection are compared. One applies a genetic algorithm (a global approach), and the other—a linear method of stepwise fit (a locally optimized approach). On the basis of such analysis, two sets of the most predictive features are selected. These sets take part in prediction of the atmospheric pollutants PM10, SO2, NO2 and O3. Two approaches to prediction are compared. In the first one, the features selected are directly applied to the random forest (RF), which forms an ensemble of decision trees. In the second case, intermediate predictors built on the basis of neural networks (the multilayer perceptron, the radial basis function and the support vector machine) are used. They create an ensemble integrated into the final prognosis. The paper shows that preselection of the most important features, cooperating with an ensemble of predictors, allows increasing the forecasting accuracy of atmospheric pollution in a significant way.
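A generic sketch of the locally optimized route mentioned above: forward stepwise feature selection wrapped around a random forest regressor. The genetic-algorithm variant, the neural-network predictors and the final ensemble are not shown, and the data are synthetic rather than atmospheric measurements.

# Sketch: forward stepwise feature selection wrapped around a random forest regressor.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=20, n_informative=6, noise=5.0, random_state=0)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
sfs = SequentialFeatureSelector(rf, n_features_to_select=6, direction="forward", cv=3).fit(X, y)
X_sel = sfs.transform(X)

print("selected features:", sfs.get_support(indices=True))
print("cross-validated R^2 on the selected features:", cross_val_score(rf, X_sel, y, cv=3).mean())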
Source:
International Journal of Applied Mathematics and Computer Science; 2016, 26, 2; 467-478
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
A differential evolution approach to dimensionality reduction for classification needs
Authors:
Martinović, G.
Bajer, D.
Zorić, B.
Links:
https://bibliotekanauki.pl/articles/331498.pdf
Publication date:
2014
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
classification
differential evolution
feature subset selection
k-nearest neighbour algorithm
wrapper method
Description:
The feature selection problem often occurs in pattern recognition and, more specifically, classification. Although these patterns could contain a large number of features, some of them could prove to be irrelevant, redundant or even detrimental to classification accuracy. Thus, it is important to remove these kinds of features, which in turn leads to problem dimensionality reduction and could eventually improve the classification accuracy. In this paper an approach to dimensionality reduction based on differential evolution which represents a wrapper and explores the solution space is presented. The solutions, subsets of the whole feature set, are evaluated using the k-nearest neighbour algorithm. High quality solutions found during execution of the differential evolution fill the archive. A final solution is obtained by conducting k-fold cross-validation on the archive solutions and selecting the best one. Experimental analysis is conducted on several standard test sets. The classification accuracy of the k-nearest neighbour algorithm using the full feature set and the accuracy of the same algorithm using only the subset provided by the proposed approach and some other optimization algorithms which were used as wrappers are compared. The analysis shows that the proposed approach successfully determines good feature subsets which may increase the classification accuracy.
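A compact sketch of a differential-evolution wrapper in the spirit described above: real-valued vectors are thresholded into feature masks and each mask is scored by cross-validated k-NN accuracy. Population size, F, CR and the number of generations are illustrative, and the paper's archive and k-fold post-selection step are omitted.

# Sketch: binary feature selection via a simple differential evolution wrapper around k-NN.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
rng = np.random.default_rng(0)
NP, D, F, CR, GENS = 20, X.shape[1], 0.5, 0.9, 30  # illustrative DE parameters

def fitness(vec):
    mask = vec > 0.5  # threshold the real-valued vector into a feature mask
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(5), X[:, mask], y, cv=3).mean()

pop = rng.random((NP, D))
fit = np.array([fitness(p) for p in pop])
for _ in range(GENS):
    for i in range(NP):
        a, b, c = pop[rng.choice([j for j in range(NP) if j != i], 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0.0, 1.0)
        cross = rng.random(D) < CR
        cross[rng.integers(D)] = True  # ensure at least one coordinate comes from the mutant
        trial = np.where(cross, mutant, pop[i])
        trial_fit = fitness(trial)
        if trial_fit >= fit[i]:  # greedy selection
            pop[i], fit[i] = trial, trial_fit

best_mask = pop[fit.argmax()] > 0.5
print("selected features:", np.flatnonzero(best_mask), "CV accuracy:", round(fit.max(), 3))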
Source:
International Journal of Applied Mathematics and Computer Science; 2014, 24, 1; 111-122
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article