Subject: MapReduce - OPAC collection catalog

Title:: Mapreduce and semantics enabled event detection using social media
Authors:: Yan, P.
Links:: https://bibliotekanauki.pl/articles/91751.pdf
Publication date:: 2017
Publisher:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:: event detection
social media
semantic relatedness
MapReduce
Description:: Social media is playing an increasingly important role in reporting major events happening in the world. However, detecting events from social media is challenging due to the huge magnitude of the data and the complex semantics of the language being processed. This paper proposes MASEED (MapReduce and Semantics Enabled Event Detection), a novel event detection framework that effectively addresses the following problems: 1) traditional data mining paradigms cannot work for big data; 2) data preprocessing requires significant human effort; 3) domain knowledge must be gained before the detection; 4) semantic interpretation of events is overlooked; 5) detection scenarios are limited to specific domains. In this work, we overcome these challenges by embedding semantic analysis into temporal analysis to capture the salient aspects of social media data, and by parallelizing the detection of potential events using the MapReduce methodology. We evaluate the performance of our method using real Twitter data. The results demonstrate that the proposed system outperforms most of the state-of-the-art methods in terms of accuracy and efficiency.
Source:: Journal of Artificial Intelligence and Soft Computing Research; 2017, 7, 3; 201-213
2083-2567
2449-6499
Appears in:: Journal of Artificial Intelligence and Soft Computing Research
Content provider:: Biblioteka Nauki

Article

Title:: Interpretable decision-tree induction in a big data parallel framework
Authors:: Weinberg, A. I.
Last, M.
Links:: https://bibliotekanauki.pl/articles/330635.pdf
Publication date:: 2017
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: big data
parallel computing
mapreduce
decision trees
editing distance
tree similarity
data set
Description:: When running data-mining algorithms on big data platforms, a parallel, distributed framework such as MapReduce may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms such as majority voting aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model that is most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
Source:: International Journal of Applied Mathematics and Computer Science; 2017, 27, 4; 737-748
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article
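
The selection step described in the abstract above (pick the locally induced model that is most similar to the others) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: each tree is encoded as a set of rule strings, Jaccard similarity stands in for the paper's syntactic similarity measure, and the names `jaccard`, `pick_representative` and `local_models` are hypothetical. It is not the SySM algorithm itself.

```python
from typing import FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two rule sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pick_representative(models: List[FrozenSet[str]]) -> int:
    """Return the index of the model with the highest total similarity to the others."""
    if not models:
        raise ValueError("need at least one model")
    best_index, best_score = 0, float("-inf")
    for i, model in enumerate(models):
        # Sum the similarity of this model to every other local model.
        score = sum(jaccard(model, other) for j, other in enumerate(models) if j != i)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical local models, one per computing node, each a set of root-to-leaf rules.
local_models = [
    frozenset({"age<=30&income=low->no", "age<=30&income=high->yes", "age>30->yes"}),
    frozenset({"age<=30&income=low->no", "age<=30&income=high->yes", "age>30->no"}),
    frozenset({"age<=30->no", "age>30->yes"}),
]
representative = pick_representative(local_models)
```

With these toy inputs the first model is chosen, since it shares rules with both of the others. Unlike majority voting, only the single selected tree is kept for classifying new records.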

Title:: Big Data – znaczenie, zastosowania i rozwiązania technologiczne
Big Data – meaning, applications and technology solutions
Authors:: Racka, Katarzyna
Links:: https://bibliotekanauki.pl/articles/446789.pdf
Publication date:: 2016
Publisher:: Mazowiecka Uczelnia Publiczna w Płocku
Subjects:: Big Data
NoSQL
MapReduce
Hadoop
Description:: Big Data technologies and their application to business processes are growing rapidly. Analytical and consulting enterprises specializing in the strategic use of IT indicate that the number of companies implementing or planning to implement technological solutions related to Big Data is increasing annually. Many companies believe that the analysis of unstructured data will be the key to a deeper understanding of customer behavior. They believe that analytics is absolutely essential or very important for conducting the overall business strategy and improving operational results. The purpose of the article is to define Big Data and to explain what unstructured data are and how they can be applied. Furthermore, the article presents the results of reports on the implementation of Big Data technologies and discusses example technologies associated with Big Data.
Source:: Zeszyty Naukowe PWSZ w Płocku. Nauki Ekonomiczne; 2016, 1(23); 311-323
1644-888X
Appears in:: Zeszyty Naukowe PWSZ w Płocku. Nauki Ekonomiczne
Content provider:: Biblioteka Nauki

Article

Title:: Massive simulations using MapReduce model
Model MapReduce w wielokrotnych obliczeniach numerycznych
Authors:: Krupa, A.
Sawicki, B.
Links:: https://bibliotekanauki.pl/articles/952714.pdf
Publication date:: 2015
Publisher:: Politechnika Lubelska. Wydawnictwo Politechniki Lubelskiej
Subjects:: mapreduce
cloud computing
platform performance
hadoop
Description:: In the last few years, cloud computing has grown into a dominant solution for large-scale numerical problems. It is most often based on the MapReduce programming model, which provides high scalability and flexibility and also optimizes the cost of computing infrastructure. This paper studies the feasibility of the MapReduce model for scientific problems consisting of many independent simulations. An experiment based on variability analysis of a simple electromagnetic problem with over 10,000 scenarios shows that the platform scales nearly linearly, reaching over 80% of the theoretical maximum performance.
Source:: Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska; 2015, 4; 45-47
2083-0157
2391-6761
Appears in:: Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska
Content provider:: Biblioteka Nauki

Article
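
The pattern named in the abstract above, many independent simulations run as map tasks whose results are then combined, can be sketched in plain Python. This is only an in-memory illustration under stated assumptions: `simulate` is a hypothetical stand-in for one simulation run, the scenario values and timings are invented, and no Hadoop API is involved. The `parallel_efficiency` helper shows one common way to express a figure such as "80% of theoretical maximum" (speedup divided by worker count); whether the paper measures it exactly this way is not stated in the record.

```python
from functools import reduce
from typing import Tuple

Aggregate = Tuple[int, float, float]  # (count, sum, sum of squares)


def simulate(scenario: float) -> float:
    """Hypothetical stand-in for one independent simulation run."""
    return scenario ** 2


def map_phase(scenario: float) -> Aggregate:
    """Run one scenario and emit a partial aggregate for its result."""
    y = simulate(scenario)
    return (1, y, y * y)


def reduce_phase(a: Aggregate, b: Aggregate) -> Aggregate:
    """Combine two partial aggregates; associative, so order does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def parallel_efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Speedup divided by worker count; 1.0 would be ideal linear scaling."""
    return t_serial / (t_parallel * workers)


scenarios = [0.0, 1.0, 2.0, 3.0]
count, total, total_sq = reduce(reduce_phase, map(map_phase, scenarios))
mean = total / count
variance = total_sq / count - mean ** 2

# Invented timings: 100 s serially, 12.5 s on 10 workers.
efficiency = parallel_efficiency(100.0, 12.5, 10)
```

Because each scenario is independent, the map calls need no communication, which is why this kind of workload tends to scale close to linearly until scheduling and I/O overheads dominate.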

Title:: Performance evaluation of MapReduce using full virtualisation on a departmental cloud
Authors:: González-Vélez, H.
Kontagora, M.
Links:: https://bibliotekanauki.pl/articles/907802.pdf
Publication date:: 2011
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: MapReduce
server virtualization
cloud computing
algorithmic skeletons
structured parallelism
parallel computing
Description:: This work analyses the performance of Hadoop, an implementation of the MapReduce programming model for distributed parallel computing, executing on a virtualisation environment comprising 1+16 nodes running the VMware Workstation software. A set of experiments using the standard Hadoop benchmarks has been designed to determine whether significant reductions in the execution time of computations are experienced when using Hadoop on this virtualisation platform on a departmental cloud. Our findings indicate that a significant decrease in computing times is observed under these conditions. They also highlight how overheads and virtualisation in a distributed environment hinder the possibility of achieving the maximum (peak) performance.
Source:: International Journal of Applied Mathematics and Computer Science; 2011, 21, 2; 275-284
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Big problems with big data
Authors:: Goczyła, Krzysztof
Links:: https://bibliotekanauki.pl/articles/1954610.pdf
Publication date:: 2020
Publisher:: Politechnika Gdańska
Subjects:: big data
MapReduce
NoSQL database
data science
Description:: The article presents an overview of the most important issues related to the phenomenon called big data. The characteristics of big data concerning the data itself and the data sources are presented. Then, the big data life cycle concept is formulated. The next sections focus on two big data technologies: MapReduce for big data processing and NoSQL databases for big data storage.
Source:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2020, 24, 1; 73-81
1428-6394
Appears in:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider:: Biblioteka Nauki

Article

Title:: High Frequency Rule Synthesis in a Large Scale Multiple Database with MapReduce
Authors:: Bisoyi, Sudhanshu Shekhar
Mishra, Pragnyaban
Mishra, Saroja Nanda
Links:: https://bibliotekanauki.pl/articles/2055260.pdf
Publication date:: 2022
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: multiple database
frequent itemset
association rule
rule synthesis
MapReduce
HDFS
Description:: Increasing development in information and communication technology leads to the generation of large amounts of data from various sources. The data collected from multiple sources grow exponentially and may not be structurally uniform. In general, they are heterogeneous and distributed across multiple databases. Because of the large volume, high velocity and variety of the data, mining knowledge in this environment becomes a big data challenge. Distributed Association Rule Mining (DARM) in these circumstances becomes a tedious task for an effective global Decision Support System (DSS). DARM algorithms generate a large number of association rules and frequent itemsets in a big data environment. In this situation, synthesizing high-frequency rules from the big database becomes more challenging. Many algorithms for synthesizing association rules have been proposed for multiple-database mining environments. They face enormous challenges in terms of high availability, scalability, efficiency, the high cost of storing and processing large intermediate results, and multiple redundant rules. In this paper, we first propose a model to collect data from multiple sources into a big data storage framework based on HDFS. Second, we propose a weighted multi-partitioned method for synthesizing high-frequency rules using the MapReduce programming paradigm. Experiments have been conducted in a parallel and distributed environment using commodity hardware. We ensure the efficiency, scalability, high availability and cost-effectiveness of our proposed method.
Source:: International Journal of Electronics and Telecommunications; 2022, 68, 2; 177-186
2300-1933
Appears in:: International Journal of Electronics and Telecommunications
Content provider:: Biblioteka Nauki

Article
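
The abstract above mentions a weighted method for synthesizing high-frequency rules with MapReduce but gives no formula. As a rough sketch only, the following assumes the common weighted-sum approach: each source database has a weight, each rule's local support is scaled by that weight, and the scaled values are summed per rule before applying a threshold. The weights, rules, threshold and the function names `map_rules`, `shuffle` and `reduce_rules` are all hypothetical, and the map, shuffle and reduce steps run in memory rather than on HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_rules(weight: float, rules: Dict[str, float]) -> Iterator[Tuple[str, float]]:
    """Map step: emit (rule, weighted support) pairs for one source database."""
    for rule, support in rules.items():
        yield rule, weight * support


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle step: group the emitted values by rule."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for rule, value in pairs:
        grouped[rule].append(value)
    return grouped


def reduce_rules(grouped: Dict[str, List[float]], min_support: float) -> Dict[str, float]:
    """Reduce step: sum weighted supports per rule and keep those at or above the threshold."""
    totals = {rule: sum(values) for rule, values in grouped.items()}
    return {rule: s for rule, s in totals.items() if s >= min_support}


# Hypothetical input: (database weight, {rule: local support}); weights sum to 1.
databases = [
    (0.5, {"A->B": 0.4, "B->C": 0.2}),
    (0.25, {"A->B": 0.8, "C->D": 0.4}),
    (0.25, {"A->B": 0.4, "B->C": 0.4}),
]
pairs = (pair for weight, rules in databases for pair in map_rules(weight, rules))
synthesized = reduce_rules(shuffle(pairs), min_support=0.25)
```

In this toy input only `A->B` survives the threshold, because it is the only rule with consistently high support across the weighted sources.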

Title:: A survey of big data classification strategies
Authors:: Banchhor, Chitrakant
Srinivasu, N.
Links:: https://bibliotekanauki.pl/articles/2050171.pdf
Publication date:: 2020
Publisher:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:: big data
data mining
MapReduce
classification
machine learning
evolutionary intelligence
deep learning
Description:: Big data nowadays plays a major role in finance, industry, medicine, and various other fields. In this survey, 50 research papers are reviewed with regard to the different big data classification techniques presented and/or used in the respective studies. The classification techniques are categorized into machine learning, evolutionary intelligence, fuzzy-based approaches, deep learning and so on. The research gaps and the challenges of big data classification faced by the existing techniques are also listed and described, which should help researchers enhance the effectiveness of their future work. The research papers are analyzed for different techniques with respect to software tools, datasets used, publication year, classification techniques, and performance metrics. It can be concluded from the survey presented here that the most frequently used big data classification methods are based on machine learning techniques, and that apparently the most commonly used dataset for big data classification is the UCI repository dataset. The most frequently used performance metrics are accuracy and execution time.
Source:: Control and Cybernetics; 2020, 49, 4; 447-469
0324-8569
Appears in:: Control and Cybernetics
Content provider:: Biblioteka Nauki

Article

Information

You are searching for the phrase "MapReduce" by criterion: Subject
