Subject: MapReduce - OPAC collections catalog

Title:: A survey of big data classification strategies
Authors:: Banchhor, Chitrakant
Srinivasu, N.
Links:: https://bibliotekanauki.pl/articles/2050171.pdf
Publication date:: 2020
Publisher:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:: big data
data mining
MapReduce
classification
machine learning
evolutionary intelligence
deep learning
Description:: Big data nowadays plays a major role in finance, industry, medicine, and various other fields. In this survey, 50 research papers are reviewed with regard to the big data classification techniques presented and/or used in the respective studies. The classification techniques are categorized into machine learning, evolutionary intelligence, fuzzy-based approaches, deep learning, and others. The research gaps and challenges of big data classification faced by the existing techniques are also listed and described, which should help researchers enhance the effectiveness of their future work. The research papers are analyzed with respect to software tools, datasets used, publication year, classification techniques, and performance metrics. It can be concluded from this survey that the most frequently used big data classification methods are based on machine learning techniques and that the most commonly used dataset for big data classification appears to be the UCI repository dataset. The most frequently used performance metrics are accuracy and execution time.
Source:: Control and Cybernetics; 2020, 49, 4; 447-469
0324-8569
Appears in:: Control and Cybernetics
Content provider:: Biblioteka Nauki

Article

Title:: Big problems with big data
Authors:: Goczyła, Krzysztof
Links:: https://bibliotekanauki.pl/articles/1954610.pdf
Publication date:: 2020
Publisher:: Politechnika Gdańska
Subjects:: big data
MapReduce
NoSQL database
data science
Description:: The article presents an overview of the most important issues related to the phenomenon called big data. The characteristics of big data concerning the data itself and the data sources are presented. Then, the big data life cycle concept is formulated. The next sections focus on two big data technologies: MapReduce for big data processing and NoSQL databases for big data storage.
Source:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2020, 24, 1; 73-81
1428-6394
Appears in:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider:: Biblioteka Nauki

Article

Title:: Interpretable decision-tree induction in a big data parallel framework
Authors:: Weinberg, A. I.
Last, M.
Links:: https://bibliotekanauki.pl/articles/330635.pdf
Publication date:: 2017
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: big data
parallel computing
MapReduce
decision trees
editing distance
tree similarity
dataset
Description:: When running data-mining algorithms on big data platforms, a parallel, distributed framework, such as MAPREDUCE, may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms, such as majority voting, aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model that is most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
Source:: International Journal of Applied Mathematics and Computer Science; 2017, 27, 4; 737-748
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article
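
Note:: The description of item 3 says that a representative model is chosen as the one most similar to the models from the other nodes. A minimal Python sketch of that selection step follows. It is not the authors' SySM implementation: the Jaccard similarity over sets of split conditions is a stand-in assumption for the paper's syntactic tree similarity, and the names `jaccard`, `pick_representative`, and `local_models` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity between two models, each given as a set of split conditions.

    Assumption: the real method compares tree structure; this only compares split sets.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_representative(models, similarity=jaccard):
    """Return the index of the model with the highest total similarity to all others."""
    totals = [0.0] * len(models)
    # Each unordered pair is scored once and credited to both models.
    for i, j in combinations(range(len(models)), 2):
        s = similarity(models[i], models[j])
        totals[i] += s
        totals[j] += s
    # On ties, the model with the lowest index wins.
    return max(range(len(models)), key=totals.__getitem__)


# Hypothetical trees induced on three nodes, reduced to sets of (feature, threshold) splits.
local_models = [
    frozenset({("age", 30), ("income", 50)}),
    frozenset({("age", 30), ("income", 50), ("height", 170)}),
    frozenset({("height", 170), ("weight", 80)}),
]
best = pick_representative(local_models)
```

Here the second model shares splits with both of the others, so it is the one selected. Any symmetric similarity function could be passed in place of `jaccard`.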

Information

You are searching for the phrase "MapReduce" by the criterion: Subject