
Search results for the phrase "Document Clustering" (criterion: Subject)


Showing 1-7 of 7
Title:
A document clustering method based on ant algorithms
Authors:
Machnik, Ł.
Links:
https://bibliotekanauki.pl/articles/1943269.pdf
Publication date:
2007
Publisher:
Politechnika Gdańska
Subjects:
ant algorithms
ant systems
document clustering
document grouping
Description:
Ant Algorithms, particularly the Ant Colony Optimization (ACO) metaheuristic, are universal, flexible and scalable because they are based on multi-agent cooperation. The increasing demand for effective methods of managing large document collections provides sufficient motivation for research on new applications of ant-based systems in text document processing. The author presents an implementation of such a technique for document clustering. Details of the ACO document clustering method and the results of experiments are presented.
Source:
TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2007, 11, 1-2; 87-102
1428-6394
Appears in:
TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider:
Biblioteka Nauki
Article
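The Machnik entry describes ACO-based document clustering without giving its details. For orientation only, the sketch below shows the older ant-based clustering rules usually credited to Deneubourg et al. (pick/drop probabilities) and Lumer and Faieta (similarity-based local density), in which simulated ants pick up and drop documents on a grid. This is a different family of ant algorithm from the ACO metaheuristic used in the paper and is not the author's method. Documents are assumed to be sparse term-weight dictionaries; the function names and the values of `alpha`, `area`, `k1` and `k2` are illustrative assumptions.

```python
import math

# Sketch of classic ant-based clustering rules (Deneubourg-style pick/drop,
# Lumer-Faieta-style local density). NOT the ACO method of the paper above;
# all names and default parameter values are illustrative assumptions.

def cosine_distance(a: dict, b: dict) -> float:
    """1 - cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat empty documents as maximally dissimilar
    return 1.0 - dot / (na * nb)

def local_density(doc: dict, neighbours: list, alpha: float = 0.5, area: int = 9) -> float:
    """f = max(0, sum_j (1 - d(doc, j) / alpha) / area) over the documents in the
    ant's neighbourhood; area is the number of grid cells inspected (3 x 3 = 9)."""
    total = sum(1.0 - cosine_distance(doc, n) / alpha for n in neighbours)
    return max(0.0, total / area)

def pick_probability(f: float, k1: float = 0.1) -> float:
    """An unladen ant tends to pick up a document that fits its surroundings poorly."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f: float, k2: float = 0.15) -> float:
    """A laden ant tends to drop its document among similar documents."""
    return (f / (k2 + f)) ** 2

doc = {"ant": 1.0, "colony": 1.0}
similar = [dict(doc) for _ in range(9)]
dissimilar = [{"matrix": 1.0} for _ in range(9)]
f_similar = local_density(doc, similar)
f_dissimilar = local_density(doc, dissimilar)
print(round(f_similar, 3), f_dissimilar)                               # 1.0 0.0
print(pick_probability(f_dissimilar), drop_probability(f_dissimilar))  # 1.0 0.0
```

A document surrounded by dissimilar neighbours has density 0, so it is always picked up and never dropped there; among similar neighbours the pick probability falls and the drop probability rises.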
Title:
Multiplicative Algorithm for Correntropy-Based Nonnegative Matrix Factorization
Authors:
Hosseini-Asl, E.
Zurada, J. M.
Links:
https://bibliotekanauki.pl/articles/108758.pdf
Publication date:
2013
Publisher:
Społeczna Akademia Nauk w Łodzi
Subjects:
Nonnegative Matrix Factorization (NMF)
Correntropy
Multiplicative Algorithm
Document Clustering
Description:
Nonnegative matrix factorization (NMF) is a popular dimension reduction technique used for clustering by extracting latent features from high-dimensional data, and it is widely used for text mining. Several optimization algorithms have been developed for NMF with different cost functions. In this paper we evaluate the correntropy similarity cost function. Correntropy is a nonlinear, localized similarity measure which measures the similarity between two random variables using an entropy-based criterion, and it is especially robust to outliers. Some algorithms based on gradient descent have been used for the correntropy cost function, but their convergence is highly dependent on proper initialization, step size and other parameter selection. The proposed general multiplicative factorization algorithm uses gradient descent with an adaptive step size to maximize the correntropy similarity between the data matrix and its factorization. After devising the algorithm, its performance was evaluated for document clustering. Results were compared with the constrained gradient descent method using steepest descent and L-BFGS. The simulations show that the convergence of steepest descent and L-BFGS is highly dependent on the gradient descent step size, which depends on the σ parameter of the correntropy cost function. However, the multiplicative algorithm is shown to be less sensitive to the σ parameter and yields better clustering results than the other algorithms. The results demonstrate that clustering performance, measured by entropy and purity, improves. The multiplicative correntropy-based algorithm also shows less variation in the accuracy of document clusters for a variable number of clusters. The convergence of each algorithm is also investigated, and the experiments show that the multiplicative algorithm converges faster than the L-BFGS and steepest descent methods.
Source:
Journal of Applied Computer Science Methods; 2013, 5 No. 2; 89-104
1689-9636
Appears in:
Journal of Applied Computer Science Methods
Content provider:
Biblioteka Nauki
Article
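The Hosseini-Asl and Zurada abstract does not state its update rules. The sketch below shows one common way to make multiplicative NMF updates robust in the spirit of correntropy: a half-quadratic scheme that weights each entry by the Gaussian kernel exp(-e²/(2σ²)) of its current residual e and then applies weighted Lee-Seung-style multiplicative updates. It is an assumed stand-in, not the authors' adaptive-step algorithm. The matrix is taken to be terms × documents, each document is assigned to the latent factor with the largest coefficient in H, and the helper names are hypothetical. Plain Python lists keep it dependency-free.

```python
import math
import random

# Correntropy-flavoured NMF sketch: half-quadratic reweighting + weighted
# multiplicative updates. An illustrative assumption, NOT the paper's algorithm.

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def correntropy_nmf(X, rank, sigma=1.0, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative X (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        WH = matmul(W, H)
        # Gaussian-kernel weights: entries with large residuals (outliers) count less.
        Q = [[math.exp(-(X[i][j] - WH[i][j]) ** 2 / (2 * sigma ** 2)) for j in range(n)]
             for i in range(m)]
        QX = [[Q[i][j] * X[i][j] for j in range(n)] for i in range(m)]
        # Weighted multiplicative update for H.
        QWH = [[Q[i][j] * WH[i][j] for j in range(n)] for i in range(m)]
        Wt = transpose(W)
        num, den = matmul(Wt, QX), matmul(Wt, QWH)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(rank)]
        # Weighted multiplicative update for W, using the new H.
        WH = matmul(W, H)
        QWH = [[Q[i][j] * WH[i][j] for j in range(n)] for i in range(m)]
        Ht = transpose(H)
        num, den = matmul(QX, Ht), matmul(QWH, Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)] for i in range(m)]
    return W, H

def assign_clusters(H):
    """Document j goes to the latent factor with the largest coefficient in column j."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]

# Toy term-document matrix: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
X = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
W, H = correntropy_nmf(X, rank=2)
labels = assign_clusters(H)
print(labels)
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative; the σ parameter controls how aggressively large residuals are down-weighted.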
Title:
Extending k-means with the description comes first approach
Authors:
Stefanowski, J.
Weiss, D.
Links:
https://bibliotekanauki.pl/articles/970926.pdf
Publication date:
2007
Publisher:
Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:
document clustering
cluster labels
k-means algorithm
information retrieval
Description:
This paper describes a technique for clustering large collections of short and medium-length text documents such as press articles, news stories and the like. The technique, called description comes first (DCF), consists of identifying related document clusters, selecting salient phrases relevant to these clusters, and reallocating documents matching the selected phrases to form the final document groups. The advantages of this technique include more comprehensive cluster labels and a clearer (more transparent) relationship between cluster labels and their content. We demonstrate DCF by taking a standard k-means algorithm as a baseline and weaving DCF elements into it; the outcome is the descriptive k-means (DKM) algorithm. The paper covers the technical background of implementing DKM efficiently and ends with the description of an experiment measuring clustering quality on the 20-newsgroups benchmark document collection. Short fragments of this paper appeared at the poster session of the RIAO 2007 conference, Pittsburgh, PA, USA (electronic proceedings only).
Source:
Control and Cybernetics; 2007, 36, 4; 1009-1035
0324-8569
Appears in:
Control and Cybernetics
Content provider:
Biblioteka Nauki
Article
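The DCF entry names three phases: identifying clusters, selecting salient phrases for them, and reallocating the documents that match those phrases. The toy sketch below walks through the last two phases on a cluster assignment assumed to come from an earlier k-means run. It is a hypothetical simplification, not the DKM algorithm from the paper: single terms stand in for phrases, and the scoring rule (in-cluster minus out-of-cluster document frequency), the "(other)" group and the function names `select_labels` and `reallocate` are all assumptions.

```python
from collections import Counter

# Toy illustration of the three DCF phases named in the abstract
# (clusters -> salient labels -> reallocation). Hypothetical simplification:
# single terms stand in for phrases; this is NOT the DKM algorithm itself.

def select_labels(docs, assignment):
    """Phase 2: one label per cluster, the term whose document frequency inside
    the cluster most exceeds its document frequency outside it."""
    labels = {}
    for c in sorted(set(assignment)):
        inside = [set(d) for d, a in zip(docs, assignment) if a == c]
        outside = [set(d) for d, a in zip(docs, assignment) if a != c]
        df_in = Counter(t for d in inside for t in d)
        df_out = Counter(t for d in outside for t in d)

        def score(term):
            out_rate = df_out[term] / len(outside) if outside else 0.0
            return df_in[term] / len(inside) - out_rate

        labels[c] = max(sorted(df_in), key=score)  # sorted() makes ties deterministic
    return labels

def reallocate(docs, labels):
    """Phase 3: a document joins every group whose label it contains;
    documents matching no label go to an '(other)' group."""
    unique_labels = sorted(set(labels.values()))
    groups = {label: [] for label in unique_labels}
    groups["(other)"] = []
    for i, doc in enumerate(docs):
        matched = [label for label in unique_labels if label in doc]
        for label in matched:
            groups[label].append(i)
        if not matched:
            groups["(other)"].append(i)
    return groups

docs = [["nmf", "matrix", "text"], ["nmf", "matrix"],
        ["ant", "colony", "text"], ["ant", "colony"]]
assignment = [0, 0, 1, 1]        # phase 1 output, e.g. from k-means (assumed given)
labels = select_labels(docs, assignment)
print(labels)                    # {0: 'matrix', 1: 'ant'}
print(reallocate(docs, labels))  # {'ant': [2, 3], 'matrix': [0, 1], '(other)': []}
```

The point of the design is that the final groups are defined by their labels, so every document in a group visibly contains that group's description.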
Title:
Document Clustering : Concepts, Metrics and Algorithms
Authors:
Tarczynski, T.
Links:
https://bibliotekanauki.pl/articles/226231.pdf
Publication date:
2011
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
document clustering
text mining
k-means
hierarchical clustering
vector space model
Description:
Document clustering, also referred to as text clustering, is a technique of unsupervised document organisation. Text clustering is used to group documents into subsets consisting of texts that are similar to each other. These subsets are called clusters. Document clustering algorithms are widely used in web search engines to produce results relevant to a query. An example of the practical use of these techniques is the Yahoo! hierarchy of documents [1]. Another application of document clustering is browsing, defined as a search session without a well-specified goal. Browsing techniques rely heavily on document clustering. In this article we examine the most important concepts related to document clustering. Besides the algorithms, we present a comprehensive discussion of document representation, the calculation of similarity between documents, and the evaluation of cluster quality.
Source:
International Journal of Electronics and Telecommunications; 2011, 57, 3; 271-277
2300-1933
Appears in:
International Journal of Electronics and Telecommunications
Content provider:
Biblioteka Nauki
Article
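The Tarczynski entry names the building blocks of document clustering: document representation, similarity between documents, and the clustering algorithm. A minimal sketch combining them, assuming TF-IDF weighting with raw term counts and idf = log(N/df), cosine similarity on unit-length vectors, and a spherical k-means variant seeded naively with the first k documents. These choices and the function names are illustrative assumptions; the article itself may use different ones.

```python
import math
from collections import Counter

def tfidf(docs):
    """Vector space model: each tokenised document becomes a unit-length
    sparse vector {term: tf * log(N / df)}."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        v = {t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        vectors.append({t: w / norm for t, w in v.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity; equals the dot product because vectors are unit-length."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def spherical_kmeans(vectors, k, iters=20):
    centroids = [dict(v) for v in vectors[:k]]  # naive seeding (assumption)
    assignment = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each document goes to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Update step: centroid = renormalised sum of its member vectors.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if not members:
                continue  # keep the old centroid for an empty cluster
            total = Counter()
            for v in members:
                total.update(v)
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[c] = {t: w / norm for t, w in total.items()}
    return assignment

docs = [["ant", "colony", "pheromone"], ["matrix", "factorization", "nmf"],
        ["ant", "colony", "swarm"], ["nmf", "matrix", "sparse"]]
print(spherical_kmeans(tfidf(docs), k=2))  # [0, 1, 0, 1]
```

Normalising every vector and centroid to unit length lets the dot product serve directly as cosine similarity, which is why the update step renormalises the summed members.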
Title:
Extraction of Polish noun senses from large corpora by means of clustering
Authors:
Broda, B.
Piasecki, M.
Szpakowicz, S.
Links:
https://bibliotekanauki.pl/articles/969804.pdf
Publication date:
2010
Publisher:
Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:
corpus linguistics
semantic similarity
Polish nouns
word clustering
Clustering by Committee
co-occurrence retrieval models
rank weight function
Polish WordNet
WordNet-based synonymy test
document clustering
keyword extraction
Description:
We investigate two methods of identifying noun senses, based on clustering of lemmas and of documents. We have adapted the well-known Clustering by Committee algorithm to Polish and tested it on very large Polish corpora. The evaluation, by means of a WordNet-based synonymy test, used the Polish wordnet (plWordNet 1.0). Various clustering algorithms were analysed for extracting document clusters as indicators of the senses of the words that occur in them. The two approaches to word-sense identification have been compared, and conclusions drawn.
Source:
Control and Cybernetics; 2010, 39, 2; 401-420
0324-8569
Appears in:
Control and Cybernetics
Content provider:
Biblioteka Nauki
Article
Title:
Benchmarking high performance architectures with natural language processing algorithms
Authors:
Kuta, M.
Kitowski, J.
Links:
https://bibliotekanauki.pl/articles/305469.pdf
Publication date:
2011
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:
benchmarking
part-of-speech tagging
document clustering
natural language processing
high performance architectures
Description:
Natural language processing algorithms are resource-demanding, especially when they must be tuned to an inflective language such as Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high performance platforms with different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms were compared.
Source:
Computer Science; 2011, 12; 19-31
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Analysis of methods and means of text mining
Authors:
Rybchak, Z.
Basystiuk, O.
Links:
https://bibliotekanauki.pl/articles/411072.pdf
Publication date:
2017
Publisher:
Polska Akademia Nauk. Oddział w Lublinie PAN
Subjects:
text mining
text analytics
data analysis
high-quality information
text categorization
text clustering
document summarization
sentiment analysis
language network
text classification
Description:
In the Big Data era, when data volume doubles every year, analyzing all of this data has become a genuinely complicated task. Text mining systems, techniques and tools have therefore become the main instruments for analyzing tons of information, selecting the information that best suits one's needs, and saving time for more interesting work. The main aims of this article are to explain the basic principles of this field and to give an overview of some technologies that are widely used in text mining today.
Source:
ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes; 2017, 6, 2; 73-78
2084-5715
Appears in:
ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes
Content provider:
Biblioteka Nauki
Article