All fields: cluster set - OPAC catalogue of collections


Title: Two-stage cluster sampling with unequal probability sampling in the first stage and ranked set sampling in the second stage
Authors: Ugwu, Michael C.
Madukaife, Mbanefo S.
Links: https://bibliotekanauki.pl/articles/2108165.pdf
Publication date: 2022-09-14
Publisher: Główny Urząd Statystyczny
Subjects: cluster sampling
population mean estimator
probability proportional to size sampling
ranked set sampling
relative efficiency
Description: In this research work we introduce a new sampling design, namely two-stage cluster sampling in which probability proportional to size sampling with replacement is used in the first stage and ranked set sampling in the second, in order to address marked variability in the sizes of the first-stage population units. We obtained unbiased estimators of the population mean and total, as well as the variance of the mean estimator. We calculated the relative efficiency of the new sampling design with respect to two-stage cluster sampling with simple random sampling in the first stage and ranked set sampling in the second stage. The results demonstrated that the new sampling design is more efficient than the competing design when there is significant variation among the first-stage units.
Source: Statistics in Transition new series; 2022, 23, 3; 199-214
1234-7655
Appears in: Statistics in Transition new series
Content provider: Biblioteka Nauki

Article



Title: Indeks wyboru liczby skupień w zbiorze danych
Index of the Choice of the Number of Clusters
Authors: Korzeniewski, Jerzy
Links: https://bibliotekanauki.pl/articles/422648.pdf
Publication date: 2014
Publisher: Główny Urząd Statystyczny
Subjects: cluster analysis
number of clusters in a data set
Caliński-Harabasz index
Gap index
Description: The article proposes a new index for determining the number of clusters in a data set described by continuous variables. The index is based on repeatedly dividing the data set (or a part of it) into two clusters and checking whether each division should be retained or discarded. The checking criterion is the Rand index, which measures the agreement between the primary division into two clusters and a two-cluster division of a narrower subset consisting of the smaller cluster from the primary division and 1/3 of the larger cluster. The divisions are made by means of the classical k-means method (for k=2) with multiple random choices of starting points. The efficiency of the new index was examined in a broad experiment on several thousand data sets generated with cluster structures differing in the number of variables, the number of clusters, relative cluster sizes, and variants of within-cluster correlation. The degree of cluster separability, controlled according to the OCLUS algorithm, was also varied. Efficiency was assessed by comparison with two other indices regarded in the literature as among the best developed so far, namely the Caliński-Harabasz index and the Gap index. The new index is considerably more efficient than both competing indices when the cluster structure is not very distinct.
Source: Przegląd Statystyczny; 2014, 61, 2; 169-180
0033-2372
Appears in: Przegląd Statystyczny
Content provider: Biblioteka Nauki

Article


Information

Searching for the phrase "cluster set" by criterion: All fields
