
Search results for the phrase "k-means ++" by criterion: Subject


Tytuł:
DSMK-means “density-based split-and-Merge K-means clustering algorithm”
Autorzy:
Aldahdooh, R. T.
Ashour, W.
Powiązania:
https://bibliotekanauki.pl/articles/91719.pdf
Data publikacji:
2013
Wydawca:
Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:
clustering
K-means
Density-based Split-and-Merge K-means clustering Algorithm
DSMK-means
clustering algorithm
Opis:
Clustering is widely used to explore and understand large collections of data. The K-means clustering method is one of the most popular approaches because it is easy to use and simple to implement. This paper introduces the Density-based Split-and-Merge K-means clustering Algorithm (DSMK-means), which was developed to address the stability problems of the standard K-means clustering algorithm and to improve clustering performance on datasets that contain clusters with different complex shapes, noise, or outliers. Based on a set of experiments, the paper concludes that the developed DSMK-means algorithm is more capable of finding high-accuracy results than other algorithms, especially because it can process datasets containing clusters with different shapes or densities, or with outliers and noise.
Źródło:
Journal of Artificial Intelligence and Soft Computing Research; 2013, 3, 1; 51-71
2083-2567
2449-6499
Pojawia się w:
Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:
Biblioteka Nauki
Artykuł
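For orientation, the standard K-means procedure that the DSMK-means abstract above takes as its baseline can be sketched roughly as follows. This is plain Lloyd iteration, not the DSMK-means algorithm itself, whose split-and-merge steps the abstract does not spell out. The function name `kmeans` and the choice to pass starting centroids explicitly are illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iteration; `centroids` are caller-supplied starting points (assumption)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids
```

Because the result depends on the starting centroids, running this sketch from different starting points can give different partitions, which is the kind of stability problem the abstract refers to.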
Tytuł:
Clustering of data represented by pairwise comparisons
Autorzy:
Dvoenko, Sergey
Powiązania:
https://bibliotekanauki.pl/articles/2183479.pdf
Data publikacji:
2022
Wydawca:
Polska Akademia Nauk. Instytut Badań Systemowych PAN
Tematy:
clustering
k-means
distance
similarity
Opis:
In this paper, experimental data, given in the form of pairwise comparisons, such as distances or similarities, are considered. Clustering algorithms for processing such data are developed based on the well-known k-means procedure. Relations to factor analysis are shown. The problems of improving clustering quality and of finding the proper number of clusters in the case of pairwise comparisons are considered. Illustrative examples are provided.
Źródło:
Control and Cybernetics; 2022, 51, 3; 343-387
0324-8569
Pojawia się w:
Control and Cybernetics
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
K-means is probabilistically poor
Autorzy:
Kłopotek, Mieczysław
Powiązania:
https://bibliotekanauki.pl/articles/2201613.pdf
Data publikacji:
2022
Wydawca:
Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Tematy:
k-means
clustering
probabilistic k-richness
Opis:
Kleinberg introduced the concept of k-richness as a requirement for an algorithm to be a clustering algorithm. The most popular algorithm, k-means, does not fit this definition because of its probabilistic nature. Hence Ackerman et al. proposed the notion of probabilistic k-richness, claiming without proof that k-means has this property. This paper proves, by example, that the version of k-means with random initialization does not have the property of probabilistic k-richness, thus refuting Ackerman's claim.
Źródło:
Studia Informatica : systems and information technology; 2022, 2(27); 5-26
1731-2264
Pojawia się w:
Studia Informatica : systems and information technology
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Geodesic distances for clustering linked text data
Autorzy:
Tekir, S.
Mansmann, F.
Keimer, D.
Powiązania:
https://bibliotekanauki.pl/articles/91737.pdf
Data publikacji:
2012
Wydawca:
Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:
clustering
geodesic distance
text data
k-means algorithm
cosine distance
k-harmonic means
microprecision values
Opis:
The quality of a clustering depends not only on the chosen algorithm and its parameters, but also on the definition of the similarity of two respective objects in a dataset. Applications such as clustering of web documents are traditionally built either on textual similarity measures or on link information. Due to the incompatibility of these two information spaces, combining these two information sources in one distance measure is a challenging issue. In this paper, we thus propose a geodesic distance function that combines traditional similarity measures with link information. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic, as these clustering coefficient values are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on a subset of the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. Additionally, taking inspiration from the unified view of the performance functions of k-means and k-harmonic means, the min and harmonic average of the cosine and geodesic distances are taken in order to construct alternate distance forms. The effectiveness of our approach is measured by computing microprecision values of the clusters based on the provided categorical information of each article.
Źródło:
Journal of Artificial Intelligence and Soft Computing Research; 2012, 2, 3; 247-258
2083-2567
2449-6499
Pojawia się w:
Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:
Biblioteka Nauki
Artykuł
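The "min and harmonic average" combinations mentioned in the abstract above can be illustrated with a small sketch. The geodesic distance formula itself is not given in the abstract, so it is treated here as an already computed number `d_geo`. The function names, and the use of the ordinary two-value harmonic mean, are assumptions made for illustration only.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 minus the cosine similarity of two term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def min_combine(d_cos, d_geo):
    """Alternate distance form: the smaller of the two distances."""
    return min(d_cos, d_geo)


def harmonic_combine(d_cos, d_geo):
    """Alternate distance form: harmonic mean of the two distances (assumed form)."""
    if d_cos == 0 or d_geo == 0:
        return 0.0  # the harmonic mean tends to 0 when either distance is 0
    return 2 * d_cos * d_geo / (d_cos + d_geo)
```

Both combinations are dominated by the smaller of the two inputs, so a pair of documents counts as close when either the text or the link-based measure says so.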
Tytuł:
The number of clusters in hybrid predictive models: does it really matter?
Autorzy:
Łapczyński, Mariusz
Jefmański, Bartłomiej
Powiązania:
https://bibliotekanauki.pl/articles/1046637.pdf
Data publikacji:
2020
Wydawca:
Główny Urząd Statystyczny
Tematy:
hybrid predictive model
k-means algorithm
decision trees
Opis:
For quite a long time, research studies have attempted to combine various analytical tools to build predictive models. It is possible to combine tools of the same type (ensemble models, committees) or tools of different types (hybrid models). Hybrid models are used in such areas as customer relationship management (CRM), web usage mining, medical sciences, petroleum geology and anomaly detection in computer networks. Our hybrid model was created as a sequential combination of a cluster analysis and decision trees. In the first step of the procedure, objects were grouped into clusters using the k-means algorithm. The second step involved building a decision tree model with a new independent variable that indicated which cluster the objects belonged to. The analysis was based on 14 data sets collected from publicly accessible repositories. The performance of the models was assessed with the use of measures derived from the confusion matrix, including the accuracy, precision, recall, F-measure, and the lift in the first and second decile. We tried to find a relationship between the number of clusters and the quality of hybrid predictive models. To our knowledge, similar studies have not been conducted yet. Our research demonstrates that in some cases building hybrid models can improve the performance of predictive models. It turned out that the models with the highest performance measures require building a relatively large number of clusters (from 9 to 15).
Źródło:
Przegląd Statystyczny; 2019, 66, 3; 228-238
0033-2372
Pojawia się w:
Przegląd Statystyczny
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Initial Results of Nonhierarchical Cluster Methods Use for Low Flow Grouping
Autorzy:
Cupak, A.
Powiązania:
https://bibliotekanauki.pl/articles/123583.pdf
Data publikacji:
2017
Wydawca:
Polskie Towarzystwo Inżynierii Ekologicznej
Tematy:
low flow
K-means method
nonhierarchical cluster analysis
Opis:
The paper examines the possibility of using a statistical method for data agglomeration, namely nonhierarchical cluster analysis, for low flow grouping. The study material comprised daily flows from the multi-year period 1963–1983 collected for 19 catchments located in the upper Vistula basin. Regions with the same flow were determined using nonhierarchical cluster analysis (K-means). Groups were characterized by low flow and by selected physiographic and meteorological features of the catchments. The procedure of assigning catchments to clusters started with two clusters and ended with five. The subsequent moving and assigning of catchments produced a cluster containing only one catchment (for five clusters). Further delineation of objects did not give objective results, so it was difficult to determine a clear criterion for assigning each catchment to a cluster. The last step involved developing models reflecting correlation and regression relationships. The identified clusters comprised catchments similar in terms of unit runoff, watercourse length, mean precipitation, median altitude, mean catchment slope, watercourse staff gauge zero, area covered by coniferous forests, arable lands, and soils.
Źródło:
Journal of Ecological Engineering; 2017, 18, 2; 44-50
2299-8993
Pojawia się w:
Journal of Ecological Engineering
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Methods for imputation of missing values and their influence on the results of segmentation research
Metody uzupełniania braków danych i ich wpływ na wyniki badań segmentacyjnych.
Autorzy:
Gąsior, Marcin
Skowron, Łukasz
Powiązania:
https://bibliotekanauki.pl/articles/425241.pdf
Data publikacji:
2016
Wydawca:
Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Tematy:
missing values
cluster analysis
k-means algorithm
k-medoids algorithm
Opis:
Missing answers are a common problem in all types of research, especially in the social sciences. Hence a number of solutions have been developed, including complete-case analysis and imputation, which replaces a missing value with a value calculated by one of several algorithms. This paper evaluates how the method adopted for handling missing answers influences the result of segmentation conducted with cluster analysis. To this end, we used a data set from an actual consumer study in which the cases with missing values were either deleted or filled in using various methods. Cluster analyses were then performed on those data sets, assuming both ordinal and ratio levels of measurement, and the grouping quality, as expressed by different indicators, was evaluated. The research demonstrated the advantage of imputation over complete-case analysis, and it also confirmed the validity of using more complex approaches than simple replacement with an average or median value.
Źródło:
Econometrics. Ekonometria. Advances in Applied Data Analytics; 2016, 4 (54); 61-71
1507-3866
Pojawia się w:
Econometrics. Ekonometria. Advances in Applied Data Analytics
Dostawca treści:
Biblioteka Nauki
Artykuł
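The two simplest strategies named in the abstract above, complete-case analysis and replacement with an average or median value, can be sketched as follows. The more complex imputation methods the authors favour are not described in the abstract and are not shown. Representing a missing answer as `None`, and the function names, are assumptions for illustration.

```python
from statistics import mean, median


def complete_cases(rows):
    """Complete-case analysis: drop every row that has a missing value."""
    return [row for row in rows if None not in row]


def impute(rows, stat=mean):
    """Fill each missing value with a column statistic (mean by default)."""
    columns = list(zip(*rows))
    # Statistic of the observed values in each column; this raises
    # statistics.StatisticsError if a column has no observed values at all.
    fills = [stat([v for v in col if v is not None]) for col in columns]
    return [
        [fill if value is None else value for value, fill in zip(row, fills)]
        for row in rows
    ]
```

Passing `stat=median` switches to median replacement. Either filled data set, or the reduced one from `complete_cases`, can then go into a cluster analysis for comparison.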
Tytuł:
Development of Data-mining Technique for Seismic Vulnerability Assessment
Autorzy:
Wojcik, Waldemar
Karmenova, Markhaba
Smailova, Saule
Tlebaldinova, Aizhan
Belbeubaev, Alisher
Powiązania:
https://bibliotekanauki.pl/articles/1844631.pdf
Data publikacji:
2021
Wydawca:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:
data analysis
seismic assessment
clustering
h-means
k-means
random forest
Opis:
Assessing the seismic vulnerability of urban infrastructure is a pressing problem, since the damage caused by earthquakes is quite significant. Despite the complexity of such tasks, today’s machine learning methods allow the use of “fast” methods for assessing seismic vulnerability. The article proposes a methodology, based on classification and clustering methods, for assessing the characteristics of typical urban objects that affect their seismic resistance. For the analysis, we use the kmeans and hkmeans clustering methods, with the Euclidean distance as the measure of proximity. The optimal number of clusters is determined using the Elbow method. A decision-making model on the seismic resistance of an urban object is presented, and the variables that have the greatest impact on the seismic resistance of an urban object are identified. The study shows that the results of clustering coincide with expert estimates, and that the characteristics of typical urban objects can be determined as a result of data modeling using clustering algorithms.
Źródło:
International Journal of Electronics and Telecommunications; 2021, 67, 2; 261-266
2300-1933
Pojawia się w:
International Journal of Electronics and Telecommunications
Dostawca treści:
Biblioteka Nauki
Artykuł
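The Elbow method mentioned in the abstract above has no single formal definition. One common heuristic is to compute the within-cluster sum of squares (WCSS) for each candidate number of clusters and pick the point where the decrease slows down most sharply. The sketch below uses the largest second difference as that criterion. This criterion and the function names are assumptions, not the procedure used by the authors.

```python
from math import dist  # Euclidean distance, Python 3.8+


def wcss(points, labels, centroids):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def elbow_k(wcss_by_k):
    """Return the k with the sharpest bend in the WCSS curve.

    `wcss_by_k` maps each candidate k to its WCSS; at least three
    consecutive candidates are needed, otherwise None is returned.
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        # Drop before k minus drop after k: large when the curve flattens at k.
        bend = (wcss_by_k[prev] - wcss_by_k[k]) - (wcss_by_k[k] - wcss_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

In practice the WCSS curve is often inspected visually as well, since an automatic criterion like this one can be misled by a noisy curve.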
Tytuł:
Data Mining Application in Air Transportation – the Case of Turkish Airlines
Autorzy:
Pisarek, Renata
Akpinar, Musab Talha
Hızıroglu, Abdulkadir
Powiązania:
https://bibliotekanauki.pl/articles/504638.pdf
Data publikacji:
2017
Wydawca:
Międzynarodowa Wyższa Szkoła Logistyki i Transportu
Tematy:
data mining
K-means
airlines
air transport
Turkish Airlines
Opis:
The paper illustrates data mining techniques in the aviation industry using the example of Turkish Airlines. Its purpose is to present an application of data mining to selected operational data, namely international flight passenger baggage data from 2015. The differences in passenger and flight profiles were examined. First, a two-step approach was used to define the number of clusters. Second, K-means clustering was applied to divide the data into a certain number of clusters representing the different areas of consumption. The results can contribute to more efficient decision making regarding destination offers and fleet management.
Źródło:
Logistics and Transport; 2017, 36, 4; 79-88
1734-2015
Pojawia się w:
Logistics and Transport
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
A proposal of a new method of choosing starting points for k-means grouping
Propozycja nowej metody wyboru punktów startowych do grupowania metodą k-średnich
Autorzy:
Korzeniewski, Jerzy
Powiązania:
https://bibliotekanauki.pl/articles/907035.pdf
Data publikacji:
2008
Wydawca:
Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Tematy:
cluster analysis
starting points
silhouette indices
k-means method
Opis:
When one groups set elements with the help of k-means, it is crucial to choose starting points properly. If they are chosen incorrectly, one may arrive at badly grouped elements. In the paper a new method of choosing starting points is proposed. It is based on the distance matrix only. Starting points are chosen so as to improve on the classical method of choosing points which are as far from one another as possible. The quality of grouping is assessed by means of silhouette indices and compared with the quality of grouping done with randomly chosen starting points and with the maximum distance interval method. Sets from Euclidean spaces are generated with the help of CLUSTGEN software written by J. Milligan.
Gdy grupujemy punkty zbioru metodą k-średnich to zasadniczym problemem jest właściwy wybór punktów startowych. Jeśli są one źle wybrane to grupowanie może być złe. W artykule zaproponowana jest nowa metoda wyboru punktów startowych. Metoda ta jest oparta wyłącznie na znajomości macierzy odległości. Punkty startowe są wybierane tak, by poprawić wybór, który otrzymamy przy pomocy metody klasycznej polegającej na wyborze punktów możliwie jak najbardziej od siebie oddalonych. Jakość grupowania jest oceniana przy pomocy indeksów sylwetkowych - porównywana jest z jakością grupowania otrzymanego przy losowym wyborze punktów startowych oraz przy wyborze metodą klasyczną. Zbiory z przestrzeni euklidesowych są generowane przy pomocy programu CLUSTGEN autorstwa J. Milligana.
Źródło:
Acta Universitatis Lodziensis. Folia Oeconomica; 2008, 216
0208-6018
2353-7663
Pojawia się w:
Acta Universitatis Lodziensis. Folia Oeconomica
Dostawca treści:
Biblioteka Nauki
Artykuł
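The "classical method" referred to in the abstract above, choosing starting points that are as far from one another as possible, can be sketched with a greedy farthest-first selection that needs only the distance matrix. This is one common variant of that idea and is not the new method proposed in the paper, which the abstract does not describe in detail. The function name and the `first` parameter are illustrative assumptions.

```python
def farthest_first(D, k, first=0):
    """Greedily pick k starting points from a distance matrix D.

    D[i][j] is the distance between objects i and j. Selection starts
    from index `first` (an arbitrary choice in this sketch) and repeatedly
    adds the object whose nearest already-chosen point is farthest away.
    """
    chosen = [first]
    while len(chosen) < k:
        candidates = (i for i in range(len(D)) if i not in chosen)
        best = max(candidates, key=lambda i: min(D[i][j] for j in chosen))
        chosen.append(best)
    return chosen
```

For four objects at positions 0, 1, 5 and 10 on a line, starting from the first object, the sketch picks the object at 10 and then the one at 5.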
Tytuł:
Implementation of Big Data Concept for Variability Mapping Control of Financing Assessment of Informal Sector Workers in Bogor City
Autorzy:
Salmah, Salmah
Andria, Fredi
Wahyudin, Irfan
Powiązania:
https://bibliotekanauki.pl/articles/1065325.pdf
Data publikacji:
2019
Wydawca:
Przedsiębiorstwo Wydawnictw Naukowych Darwin / Scientific Publishing House DARWIN
Tematy:
Big Data
Cluster
Informal Worker Sector
K-Means Clustering
Opis:
At present, risks and uncertainties occur in protecting the health of the community. This requires a national health insurance program that can guarantee health care costs. One group of program participants consists of residents who work in the informal sector. This group is both vulnerable and a potential target group for health insurance programs. However, the level of participation of informal sector workers is still low, so an analysis of the constraints affecting it is needed. This study aims to identify categories of informal sector workers and analyze various obstacles faced by informal sector workers to become health insurance participants in the city of Bogor. The method used is the concept of big data with K-means clustering data mining techniques to group informal sector workers along with the constraints that exist in each of these groups. The results showed that there were 3 clusters with very low Social Security Administrator (BPJS) health ownership, namely cluster 1, cluster 3, and cluster 5. Each cluster had different constraints. Cluster 1 has constraints on the number of dependents it has, Cluster 3 has constraints on the gender side that are dominated by women, while Cluster 5 has constraints on the low-income side. Each cluster has a different obstacle resolution recommendation, namely for cluster 1 by registering workers as JKN contribution recipient (PBI) participants, cluster 2 by giving outreach to women who have only focused on men, and for cluster 5 by involving the community as a forum for the empowerment of informal sector workers.
Źródło:
World Scientific News; 2019, 135; 261-282
2392-2192
Pojawia się w:
World Scientific News
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
Extending k-means with the description comes first approach
Autorzy:
Stefanowski, J.
Weiss, D.
Powiązania:
https://bibliotekanauki.pl/articles/970926.pdf
Data publikacji:
2007
Wydawca:
Polska Akademia Nauk. Instytut Badań Systemowych PAN
Tematy:
document clustering
cluster labels
k-means algorithm
information retrieval
Opis:
This paper describes a technique for clustering large collections of short and medium length text documents such as press articles, news stories and the like. The technique called description comes first (DCF) consists of identification of related document clusters, selection of salient phrases relevant to these clusters and reallocation of documents matching the selected phrases to form final document groups. The advantages of this technique include more comprehensive cluster labels and clearer (more transparent) relationship between cluster labels and their content. We demonstrate the DCF by taking a standard k-means algorithm as a baseline and weaving DCF elements into it; the outcome is the descriptive k-means (DKM) algorithm. The paper goes through technical background explaining how to implement DKM efficiently and ends with the description of an experiment measuring clustering quality on a benchmark document collection 20-newsgroups. Short fragments of this paper appeared at the poster session of the RIAO 2007 conference, Pittsburgh, PA, USA (electronic proceedings only).
Źródło:
Control and Cybernetics; 2007, 36, 4; 1009-1035
0324-8569
Pojawia się w:
Control and Cybernetics
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
An Efficient Controller Placement Algorithm using Clustering in Software Defined Networks
Autorzy:
Jacob, Joshua
Shinde, Sumedha
Narayan, D. G.
Powiązania:
https://bibliotekanauki.pl/articles/27312951.pdf
Data publikacji:
2023
Wydawca:
Instytut Łączności - Państwowy Instytut Badawczy
Tematy:
clustering
controller placement
PAM
K-means++
silhouette score
SDN
Opis:
Software defined networking (SDN) is an emerging network paradigm that separates the control plane from data plane and ensures programmable network management. In SDN, the control plane is responsible for decision-making, while packet forwarding is handled by the data plane based on flow entries defined by the control plane. The placement of controllers is an important research issue that significantly impacts the performance of SDN. In this work, we utilize clustering techniques to group networks into multiple clusters and propose an algorithm for optimal controller placement within each cluster. The evaluation involves the use of the Mininet emulator with POX as the SDN controller. By employing the silhouette score, we determine the optimal number of controllers for various topologies. Additionally, to enhance network performance, we employ the meeting point algorithm to calculate the best location for placing the controller within each cluster. The proposed approach is compared with existing works in terms of throughput, delay, and jitter using six topologies from the Internet Zoo dataset.
Źródło:
Journal of Telecommunications and Information Technology; 2023, 4; 9-17
1509-4553
1899-8852
Pojawia się w:
Journal of Telecommunications and Information Technology
Dostawca treści:
Biblioteka Nauki
Artykuł
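The silhouette score used in the abstract above to choose the number of controllers can be sketched for points with Euclidean distances as follows. It is the standard mean silhouette coefficient. How the authors measure distance between network nodes is not stated in the abstract, so the use of coordinates and `math.dist` here is an assumption.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

To choose a number of clusters, one would run the clustering for each candidate value and keep the value with the highest score.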
Tytuł:
An application of machine learning methods to cutting tool path clustering and RUL estimation in machining
Autorzy:
Zegarra, Fabio C.
Vargas-Machuca, Juan
Roman-Gonzalez, Avid
Coronado, Alberto M.
Powiązania:
https://bibliotekanauki.pl/articles/28407324.pdf
Data publikacji:
2023
Wydawca:
Wrocławska Rada Federacji Stowarzyszeń Naukowo-Technicznych
Tematy:
feature extraction
k-means clustering
time series
unsupervised learning
Opis:
Machine learning has been widely used in manufacturing, leading to significant advances in diverse problems, including the prediction of wear and remaining useful life (RUL) of machine tools. However, the data used in many cases correspond to simple and stable processes that differ from practical applications. In this work, a novel dataset consisting of eight cutting tools with complex tool paths is used. The time series of the tool paths, corresponding to the three-dimensional position of the cutting tool, are grouped according to their shape. Three unsupervised clustering techniques are applied, resulting in the identification of DBA-k-means as the most appropriate technique for this case. The clustering process helps to identify training and testing data with similar tool paths, which is then applied to build a simple two-feature prediction model with the same level of precision for RUL prediction as a more complex four-feature prediction model. This work demonstrates that by properly selecting the methodology and number of clusters, tool paths can be effectively classified, which can later be used in prediction problems in more complex settings.
Źródło:
Journal of Machine Engineering; 2023, 23, 4; 5-17
1895-7595
2391-8071
Pojawia się w:
Journal of Machine Engineering
Dostawca treści:
Biblioteka Nauki
Artykuł
Tytuł:
K-Means and Fuzzy based Hybrid Clustering Algorithm for WSN
Autorzy:
Angadi, Basavaraj M.
Kakkasageri, Mahabaleshwar S.
Powiązania:
https://bibliotekanauki.pl/articles/27311955.pdf
Data publikacji:
2023
Wydawca:
Polska Akademia Nauk. Czasopisma i Monografie PAN
Tematy:
wireless sensor networks
cluster
K-Means algorithm
fuzzy logic
Opis:
Wireless Sensor Networks (WSN) have attracted a lot of attention due to their widespread use in monitoring hostile environments and in critical surveillance and security applications. In these applications, the usage of wireless terminals has also grown significantly. The grouping of Sensor Nodes (SN) is called clustering, and these sensor nodes are burdened by the exchange of messages caused by successive and recurring re-clustering, which results in power loss. Since most SNs are fitted with non-rechargeable batteries, researchers have been concentrating their efforts on extending the lifetime of these nodes. For battery-constrained WSNs, the clustering mechanism has emerged as a desirable subject, since it is particularly good at conserving resources, especially energy, for network activities. The proposed work addresses the problem of load balancing and Cluster Head (CH) selection within a cluster with minimum energy expenditure. We propose a hybrid method in which cluster formation is done using the unsupervised machine learning based k-means algorithm, and CH selection uses a fuzzy-logic approach.
Źródło:
International Journal of Electronics and Telecommunications; 2023, 69, 4; 793-801
2300-1933
Pojawia się w:
International Journal of Electronics and Telecommunications
Dostawca treści:
Biblioteka Nauki
Artykuł
