- Title:
- Ensemble of data mining methods for gene ranking
- Authors:
-
Wiliński, A.
Osowski, S.
- Links:
- https://bibliotekanauki.pl/articles/201570.pdf
- Publication date:
- 2012
- Publisher:
- Polska Akademia Nauk. Czytelnia Czasopism PAN
- Subjects:
-
gene expression array
feature selection
gene ranking methods
classification
SVM
- Description:
- The paper presents an ensemble of data mining methods for discovering the most important genes and gene sequences, generated by gene expression arrays, that are responsible for recognizing a particular type of cancer. The analyzed methods include the correlation of a feature with the class, statistical hypothesis testing, the Fisher discrimination measure, and the linear Support Vector Machine, each used to characterize the discriminative ability of the features. In the first step of ranking, we apply each method individually and choose the genes most often selected during cross-validation on the available data set. In the next step, we combine the results of the different selection methods and again choose the genes that appear most frequently in the selected sets. On this basis, we form the final ranking of the genes. The most important genes form the input to the Support Vector Machine (SVM) classifier, which performs the final recognition of tumor versus non-tumor data. Several methods were used to verify the correctness of the proposed ranking procedure. The first maps the distribution of the selected genes onto the two-dimensional coordinate system formed by the two most important principal components of the PCA transformation and applies cluster quality measures. The second depicts the results graphically, presenting the gene expression values as pixel intensities for the available data. The final confirmation of the quality of the proposed ranking method is provided by the results of classifying cancer cases versus non-cancer (normal) cases with a Gaussian kernel SVM. Using the most significant selected genes, the SVM recognized prostate cancer cases versus normal cases with good accuracy. The presented methodology is potentially useful for practical applications in bioinformatics.
- Source:
-
Bulletin of the Polish Academy of Sciences. Technical Sciences; 2012, 60, 3; 461-470
ISSN 0239-7528
- Appears in:
- Bulletin of the Polish Academy of Sciences. Technical Sciences
- Content provider:
- Biblioteka Nauki
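The two-stage ranking summarized in the description (per-method selection frequency across cross-validation folds, then a frequency vote across methods) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several assumptions apply: a Welch t-statistic stands in for the unspecified hypothesis test, the Fisher measure is assumed to take the common form |m1 - m2| / (s1 + s2), the linear SVM criterion is omitted to keep the example dependency-free, and the parameters `k`, `n_folds`, `n_keep` and the helper `make_toy_data` are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev, variance

EPS = 1e-12  # guards against division by zero for constant genes


def split_by_class(X, y, gene):
    """Expression values of one gene for class 1 (tumor) and class 0 (normal)."""
    tumor = [row[gene] for row, label in zip(X, y) if label == 1]
    normal = [row[gene] for row, label in zip(X, y) if label == 0]
    return tumor, normal


def correlation_score(X, y, gene):
    """Absolute Pearson correlation between a gene and the 0/1 class label."""
    x = [row[gene] for row in X]
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return abs(cov) / (sx * sy + EPS)


def t_score(X, y, gene):
    """Absolute Welch t-statistic (assumed stand-in for the hypothesis test)."""
    a, b = split_by_class(X, y, gene)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / (se + EPS)


def fisher_score(X, y, gene):
    """Fisher-type discrimination |m1 - m2| / (s1 + s2) (assumed form)."""
    a, b = split_by_class(X, y, gene)
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b) + EPS)


def top_k(X, y, score, k):
    """Indices of the k genes with the highest score on (X, y)."""
    genes = range(len(X[0]))
    return sorted(genes, key=lambda g: score(X, y, g), reverse=True)[:k]


def cv_selection(X, y, score, k=3, n_folds=5, n_keep=3, seed=0):
    """Step 1: genes one method selects most often across CV training folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    counts = Counter()
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        counts.update(top_k([X[i] for i in train], [y[i] for i in train], score, k))
    return [g for g, _ in counts.most_common(n_keep)]


def ensemble_ranking(X, y, methods=(correlation_score, t_score, fisher_score)):
    """Step 2: order genes by how many methods selected them."""
    votes = Counter()
    for score in methods:
        votes.update(cv_selection(X, y, score))
    return [g for g, _ in votes.most_common()]


def make_toy_data(n_per_class=20, n_genes=6, seed=1):
    """Hypothetical data: genes 0 and 1 are differentially expressed, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[0] += 3.0 * label  # up-regulated in tumor samples
            row[1] -= 3.0 * label  # down-regulated in tumor samples
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(ensemble_ranking(X, y))  # genes 0 and 1 should lead the ranking
```

Ties in both counting steps are broken by first appearance, which is a simplification; the abstract does not state how many genes are kept per step or how ties are resolved. In the paper, the top-ranked genes would then feed a Gaussian kernel SVM classifier, which this sketch does not include.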