- Title:
- An alternative extension of the k-means algorithm for clustering categorical data
- Authors:
-
San, O. M.
Huynh, V. N.
Nakamori, Y.
- Links:
- https://bibliotekanauki.pl/articles/907406.pdf
- Publication date:
- 2004
- Publisher:
- Uniwersytet Zielonogórski. Oficyna Wydawnicza
- Subjects:
-
cluster analysis (analiza skupień)
categorical data (dane kategoryczne)
data mining (eksploracja danych)
- Description:
- Most earlier work on clustering has focused on numerical data, whose inherent geometric properties can be exploited to define distance functions between data points naturally. Recently, the problem of clustering categorical data has started to draw interest. However, their computational cost makes most previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect, but it works only on numerical data, which prevents its use for clustering categorical data. The main contribution of this paper is to show how to apply the notion of "cluster centers" to a dataset of categorical objects and how to use this notion to formulate the clustering of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the soybean disease and nursery databases.
- Source:
-
International Journal of Applied Mathematics and Computer Science; 2004, 14, 2; 241-247
1641-876X
2083-8492
- Appears in:
- International Journal of Applied Mathematics and Computer Science
- Content provider:
- Biblioteka Nauki