Subject: Hadoop - OPAC Collection Catalog

Item 1.

Title: Big Data – znaczenie, zastosowania i rozwiązania technologiczne
Big Data – meaning, applications and technology solutions
Authors: Racka, Katarzyna
Links: https://bibliotekanauki.pl/articles/446789.pdf
Publication date: 2016
Publisher: Mazowiecka Uczelnia Publiczna w Płocku
Subjects: Big Data
NoSQL
MapReduce
Hadoop
Description: Big Data technologies and their application to business processes are developing rapidly. Analytical and consulting firms specializing in the strategic use of IT report that the number of companies implementing or planning to implement Big Data solutions increases every year. Many companies believe that the analysis of unstructured data will be the key to a deeper understanding of customer behavior. They consider analytics absolutely essential or very important for conducting the overall business strategy and improving operational results. The purpose of the article is to explain what exactly Big Data means, what unstructured data are, and how they can be applied. Furthermore, the article presents the results of reports on the implementation of Big Data technologies and discusses example technologies related to Big Data.
Source: Zeszyty Naukowe PWSZ w Płocku. Nauki Ekonomiczne; 2016, 1(23); 311-323
1644-888X
Appears in: Zeszyty Naukowe PWSZ w Płocku. Nauki Ekonomiczne
Content provider: Biblioteka Nauki

Article

Item 2.

Title: A Survey on Big Data and Internet of Things
Authors: Ragothaman, Bhuvaneswari
Prabha, M. Surya
Jose, Elsa
Sarojini, B.
Links: https://bibliotekanauki.pl/articles/1193562.pdf
Publication date: 2016
Publisher: Przedsiębiorstwo Wydawnictw Naukowych Darwin / Scientific Publishing House DARWIN
Subjects: Big Data
Cloud
Hadoop
IoT
technologies
Description: On the internet, data are very large in volume and come in various structures that are heterogeneous by nature. This is called Big Data. Data that satisfy the conditions of Big Data are similar to IoT data, so it can be said that Big Data and IoT are two sides of the same coin. IoT data can be shared using the Cloud, which acts as the transmission medium, and Big Data is a part of the Cloud. Processing and deriving data from the IoT is one of the biggest challenges, and special analytics are being used for it. Extraction and management of the Big Data generated by the IoT is done with Big Data technologies such as Hadoop. Various technologies and protocols are used for communication between IoT devices.
Source: World Scientific News; 2016, 41; 159-164
2392-2192
Appears in: World Scientific News
Content provider: Biblioteka Nauki

Article

Item 3.

Title: Enhancing approach using hybrid pailler and RSA for information security in bigdata
Authors: Abdalwahid, Shadan Mohammed Jihad
Yousif, Raghad Zuhair
Kareem, Shahab Wahhab
Links: https://bibliotekanauki.pl/articles/117665.pdf
Publication date: 2019
Publisher: Polskie Towarzystwo Promocji Wiedzy
Subjects: BigData
Hadoop
RSA
Paillier
Cryptography
kryptografia
Description: The amount of data processed and stored in the cloud is growing dramatically. Traditional storage devices, at both the hardware and software levels, cannot meet the requirements of the cloud. This fact motivates the need for a platform that can handle this problem. Hadoop is a deployed platform proposed to overcome this big data problem; it often uses the MapReduce architecture to process vast amounts of data in the cloud system. Hadoop has no strategy to assure the safety and confidentiality of the files saved inside the Hadoop Distributed File System (HDFS). In the cloud, the protection of sensitive data is a critical issue in which data encryption schemes play a vital role. This research proposes a hybrid of two well-known asymmetric key cryptosystems (RSA and Paillier) to encrypt the files stored in HDFS. Thus, before data is saved in HDFS, the proposed cryptosystem is used to encrypt it. Each cloud user may upload files in one of two ways, non-safe or secure. The hybrid system shows higher computational complexity and less latency in comparison to the RSA cryptosystem alone.
Source: Applied Computer Science; 2019, 15, 4; 63-74
1895-3735
Appears in: Applied Computer Science
Content provider: Biblioteka Nauki

Article

Item 4.

Title: A Novel Inconsequential Encryption Algorithm for Big Data in Cloud Computing
Authors: Motupalli, Ravi Kanth
Prasad, Krishna K.
Links: https://bibliotekanauki.pl/articles/2086214.pdf
Publication date: 2022
Publisher: Politechnika Lubelska. Instytut Informatyki
Subjects: Hadoop network
data security
cyber attacks
Salsa20
Description: In the digitalized era of information technology, data usage is expanding rapidly, accounting for an enormous volume of data transactions in day-to-day life. Data from different sources such as sensors, mobile phones, satellites, social media and networks, logical transactions and ventures, etc. add a gigantic pile to the existing stack of data. One of the best ways to handle this exponential data production is the Hadoop network. Thus, in the current scenario, big industries and organizations rely on the Hadoop network for the production of their essential data. With the focus on data generation and organization, data security, one of the most important considerations, was left unnoticed, making data vulnerable to cyber attacks and hacking. Hence this article proposes an effective mixed algorithm combining the Salsa20 and AES algorithms to enhance the security of transactions against unauthorised access, and validates quick data transactions with minimal encryption and decryption time. The high throughput obtained in this hybrid framework demonstrates the effectiveness of the proposed algorithmic structure over existing systems.
Source: Journal of Computer Sciences Institute; 2022, 23; 140-144
2544-0764
Appears in: Journal of Computer Sciences Institute
Content provider: Biblioteka Nauki

Article

Item 5.

Title: IoT and Big Data towards a Smart City
Authors: Anand, Paul
Links: https://bibliotekanauki.pl/articles/1193544.pdf
Publication date: 2016
Publisher: Przedsiębiorstwo Wydawnictw Naukowych Darwin / Scientific Publishing House DARWIN
Subjects: Big Data
Hadoop
IoT
Smart City
Smart Systems
Description: The fast growth of population density in urban areas demands more facilities and resources. To meet the needs of city development, the use of Internet of Things (IoT) devices and smart systems is a quick and valuable resource. However, thousands of IoT devices interconnecting and communicating with each other over the Internet generate a huge amount of data, termed Big Data. Integrating IoT services and processing Big Data efficiently for a smart city is a challenging task. Therefore, in this paper, we propose a system for smart city development based on IoT using Big Data analytics. We use a deployment of sensors including smart home sensors, vehicular networking, weather and water sensors, smart parking sensors, surveillance objects, etc. Initially, a four-tier architecture is proposed, which includes: 1) Bottom Tier, responsible for IoT sources and data generation and collection; 2) Intermediate Tier 1, responsible for all types of communication between sensors, relays, base stations, the internet, etc.; 3) Intermediate Tier 2, responsible for data management and processing using the Hadoop framework; and 4) Top Tier, responsible for the application and usage of the data analysis and the results generated. The data collected from all smart systems are processed in real time using Hadoop with Spark, VoltDB, Storm, or S4. We use existing datasets from various researchers, including smart home, smart parking, weather, pollution, and vehicle datasets, for analysis and testing. All the datasets are replayed to test the real-time efficiency of the system. Finally, we evaluate the system's efficiency in terms of throughput and processing time. The results show that the proposed system is scalable and efficient.
Source: World Scientific News; 2016, 41; 45-54
2392-2192
Appears in: World Scientific News
Content provider: Biblioteka Nauki

Article

Item 6.

Title: Apache Hadoop, platforma do gromadzenia, przetwarzania i analizy dużych zbiorów danych
Apache Hadoop, platform for the collection, processing and analysis of large data sets
Authors: Gil, M.
Links: https://bibliotekanauki.pl/articles/98040.pdf
Publication date: 2017
Publisher: Politechnika Lubelska. Instytut Informatyki
Subjects: Hadoop
big data
analiza danych
analysis of the data
Description: The article presents the possibilities of using the Hadoop platform to manage large data sets. Based on available sources, it shows how the application's performance has developed. It also describes organizations that have succeeded on the Internet thanks to deploying this software.
Source: Journal of Computer Sciences Institute; 2017, 4; 70-75
2544-0764
Appears in: Journal of Computer Sciences Institute
Content provider: Biblioteka Nauki

Article

Item 7.

Title: Towards Finding Scholarly Articles in Internet Using Hadoop MapReduce with Oozie Workflow
Authors: Jurkiewicz, J.
Nowiński, A.
Links: https://bibliotekanauki.pl/articles/115951.pdf
Publication date: 2013
Publisher: Fundacja na Rzecz Młodych Naukowców
Subjects: Hadoop
web mining
scientific content finding
web page classification
Description: The article focuses on new methods for automatic processing and analysis of scientific papers. It covers the very first part of this task: discovery and harvesting of scientific publications from the internet. The article concentrates on discovery and analysis of HTML documents to identify publication resources. Using data from the Common Crawl project allows operating on a large subset of web pages without the need to perform an expensive crawl of the WWW. We present methods for automatic identification of pages describing scholarly documents on the WWW using HTML meta headers. The presented set of rules, applied to the data, achieves reasonable quality. A system based on these tools is also presented. It allows easy operation and transfer of output to the COntent ANalysis SYStem (CoAnSys), a processing and analysis system developed at ICM. To achieve this goal, a set of MapReduce tasks running with Hadoop and Oozie has been used. The quality and efficiency of the described rules are discussed. Finally, future challenges for our system are presented.
Source: Challenges of Modern Technology; 2013, 4, 4; 3-6
2082-2863
2353-4419
Appears in: Challenges of Modern Technology
Content provider: Biblioteka Nauki

Article

Item 8.

Title: Decision-making enhancement in a big data environment: application of the K-means algorithm to mixed data
Authors: Koren, Oded
Hallin, Carina Antonia
Perel, Nir
Bendet, Dror
Links: https://bibliotekanauki.pl/articles/91712.pdf
Publication date: 2019
Publisher: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects: big data
mixed data
hadoop
K-means
decision making
Description: Big data research has become an important discipline in information systems research. However, the flood of data being generated on the Internet is increasingly unstructured and non-numeric, in the form of images and texts. Thus, research indicates that there is an increasing need to develop more efficient algorithms for treating mixed data in big data for effective decision making. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes in big data platforms. We first present an algorithm that handles the problem of mixed data. We then use big data platforms to implement the algorithm, demonstrating its functionalities by applying the algorithm in a detailed case study. This provides us with a solid basis for performing more targeted profiling for decision making and research using big data. Consequently, decision makers will be able to treat mixed data, both numerical and categorical, to explain and predict phenomena in the big data ecosystem. Our research includes a detailed end-to-end case study that presents an implementation of the suggested procedure. This demonstrates its capabilities and the advantages that allow it to improve the decision-making process by targeting organizations’ business requirements to specific cluster(s)/profile(s) based on the enhancement outcomes.
Source: Journal of Artificial Intelligence and Soft Computing Research; 2019, 9, 4; 293-302
2083-2567
2449-6499
Appears in: Journal of Artificial Intelligence and Soft Computing Research
Content provider: Biblioteka Nauki

Article

Item 9.

Title: Massive simulations using MapReduce model
Model MapReduce w wielokrotnych obliczeniach numerycznych
Authors: Krupa, A.
Sawicki, B.
Links: https://bibliotekanauki.pl/articles/952714.pdf
Publication date: 2015
Publisher: Politechnika Lubelska. Wydawnictwo Politechniki Lubelskiej
Subjects: mapreduce
cloud computing
platform performance
hadoop
chmura obliczeniowa
wydajność platformy
Description: In the last few years, cloud computing has become a dominant solution for large-scale numerical problems. It is most often based on the MapReduce programming model, which provides high scalability and flexibility and also optimizes the cost of computing infrastructure. This paper studies the feasibility of the MapReduce model for scientific problems consisting of many independent simulations. An experiment based on variability analysis of a simple electromagnetic problem with over 10,000 scenarios shows that the platform has nearly linear scalability, with over 80% of theoretical maximum performance.
Source: Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska; 2015, 4; 45-47
2083-0157
2391-6761
Appears in: Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska
Content provider: Biblioteka Nauki

Article

Item 10.

Title: Data locality in Hadoop
Authors: Kałużka, J.
Napieralska, M.
Romero, O.
Jovanovic, P.
Links: https://bibliotekanauki.pl/articles/397706.pdf
Publication date: 2017
Publisher: Politechnika Łódzka. Wydział Mikroelektroniki i Informatyki
Subjects: distributed file system
big data
Apache Hadoop
HDFS
rozproszony system plików
Description: The Apache Hadoop framework is an answer to market tendencies regarding the need to store and process rapidly growing amounts of data, providing fault-tolerant distributed storage and data processing. Dealing with large volumes of data, Hadoop and its storage system HDFS (Hadoop Distributed File System) face challenges in maintaining high efficiency and completing computation in a reasonable time. The typical Hadoop implementation transfers computation to the data. However, in the isolated configuration, the namenode (playing the role of the master in the cluster) still favours the closer nodes. Basically, this means that before the whole task has run, significant delays can be caused by moving single blocks of data closer to the starting datanode. Currently, a Hadoop user has no influence on how the data is distributed across the cluster. This paper presents an innovative functionality for the Hadoop Distributed File System (HDFS) that enables moving data blocks on request within the cluster. Data can be shifted either by a user running the proper HDFS shell command or programmatically by other modules, such as an appropriate scheduler.
Source: International Journal of Microelectronics and Computer Science; 2017, 8, 1; 16-20
2080-8755
2353-9607
Appears in: International Journal of Microelectronics and Computer Science
Content provider: Biblioteka Nauki

Article

Item 11.

Title: An algorithm for vehicle identification by on-board Bluetooth devices exploiting Big-Data tools
Algorytm identyfikacji pojazdów poprzez urządzenia Bluetooth wykorzystujący narzędzia Big Data
Authors: Bazan, M.
Janiczek, T.
Kurda, R.
Matusiak, K.
Sak, Ł.
Links: https://bibliotekanauki.pl/articles/115307.pdf
Publication date: 2017
Publisher: Wyższa Szkoła Techniczna w Katowicach
Subjects: automatic number plate recognition
Bluetooth devices
HaDoop
Spark
user identification
identyfikacja użytkownika
Description: Nowadays, vehicles are equipped with various on-board Bluetooth devices that log on to the ITS infrastructure whenever they pass within range of Bluetooth readers. The location of Bluetooth readers is an important issue for travel time prediction in urban areas. Bluetooth technology is used to enhance travel time prediction accuracy and supplements vehicle identification by license plate number. Travel time prediction algorithms are used by technologies such as TRAX to offer the road user an alternative route for traversing the most congested regions of the city in the most efficient way. In this paper we present the implementation of an algorithm that enables us to match on-board Bluetooth devices, as well as cell phones that are mounted or simply present in road users' vehicles, with the vehicles themselves. Since the ITS is a source of an enormous and increasing amount of data, for this purpose we engage Big Data tools such as Apache Hadoop and Apache Spark. To build MapReduce tasks we use Hive-SQL. The algorithm is tested on ITS data from the city of Wroclaw. The results of the algorithm may be used to locate stolen vehicles.
Source: Zeszyty Naukowe Wyższej Szkoły Technicznej w Katowicach; 2017, 9; 7-21
2082-7016
2450-5552
Appears in: Zeszyty Naukowe Wyższej Szkoły Technicznej w Katowicach
Content provider: Biblioteka Nauki

Article

Item 12.

Title: Applying e-learning systems for big data education
Authors: Arkit, G.
Robak, S.
Arkit, A.
Links: https://bibliotekanauki.pl/articles/94807.pdf
Publication date: 2018
Publisher: Szkoła Główna Gospodarstwa Wiejskiego w Warszawie. Wydawnictwo Szkoły Głównej Gospodarstwa Wiejskiego w Warszawie
Subjects: big data
e-learning platform
Hadoop platform tools
cloud computing
Linux
virtualization
Description: Processing massive amounts of data, and Big Data, has become one of the most significant problems in computer science. Difficulties with education in this field arise, and appropriate teaching methods and tools are needed. Processing vast amounts of quickly arriving data requires the choice and arrangement of extended hardware platforms. In the paper we show an approach to teaching students Big Data, as well as the choice and arrangement of an appropriate programming platform for Big Data laboratories. Use of the e-learning platform Moodle, a dedicated platform for teaching, could give teaching staff and students improved contact by enhancing mutual communication possibilities. We show the preparation of Hadoop platform tools and a Big Data cluster based on Cloudera and Ambari. Together, the two solutions could help address the problems of educating students in the field of Big Data.
Source: Information Systems in Management; 2018, 7, 2; 85-96
2084-5537
2544-1728
Appears in: Information Systems in Management
Content provider: Biblioteka Nauki

Article

Item 13.

Title: Big Data – definicje, wyzwania i technologie informatyczne
Big Data – definitions, challenges and information technologies
Authors: Tabakow, Marta
Korczak, Jerzy
Franczyk, Bogdan
Links: https://bibliotekanauki.pl/articles/432296.pdf
Publication date: 2014
Publisher: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Subjects: Big Data
Big Data definition
challenges of Big Data
Hadoop
NoSql
Map Reduce
parallel processing
Description: Big Data, as a complex IT issue, is one of the most important challenges of the modern digital world. At present, the continuous inflow of large amounts of information from different sources, and thus with different characteristics, requires the introduction of new data analysis techniques and technologies. In particular, Big Data requires the use of parallel processing and a departure from the classical scheme of data storage. Thus, in this paper we review the basic issues related to Big Data: different definitions of "Big Data", and research and technological problems and challenges in terms of data volume, data diversity, dimensionality reduction, data quality, and inference capabilities. We also consider future directions of work on exploring the possibilities of Big Data in various areas of management.
Source: Informatyka Ekonomiczna; 2014, 1(31); 138-153
1507-3858
Appears in: Informatyka Ekonomiczna
Content provider: Biblioteka Nauki

Article

Item 14.

Title: On a book Algorithms for data science by Brian Steele, John Chandler and Swarn Reddy
Authors: Szajowski, Krzysztof J.
Links: https://bibliotekanauki.pl/articles/747695.pdf
Publication date: 2017
Publisher: Polskie Towarzystwo Matematyczne
Subjects: histogram
algorytm centroidów
Algorithms
Associative Statistics
Computation
Computing Similarity
Cluster Analysis
Correlation
Data Reduction
Data Mapping
Data Dictionary
Data Visualization
Forecasting
Hadoop
Histogram
k-Means Algorithm
k-Nearest Neighbor Prediction
Algorytmy
miary zależności
obliczenia
analiza skupień
korelacja
redukcja danych
transformacja danych
wizualizacja danych
prognozowanie
algorytm k-średnich
algorytm k najbliższych sąsiadów
Description: The publication presented here is a comprehensive introduction to the most important fundamental principles, algorithms, and data, together with the structures to which these principles and algorithms refer. The topics presented are an introduction to considerations in the field of computer science. However, it is algorithms that are the foundation of data analytics and the focal point of this textbook. Extracting knowledge from data requires methods and results from at least three fields: mathematics, statistics, and computer science. The book contains clear and intuitive mathematical and statistical explanations of the individual topics, which makes the algorithms natural and transparent. The practice of data analysis, however, requires more than a good scientific foundation, mathematical rigor, and the perspective of statistical methodology. The problems that generate data are enormously variable, and knowledge-extraction methods can be matched to them directly only in the case of the most elementary algorithms. Programming fluency and experience with real problems are indispensable. The reader is guided through algorithmic topics using Python and R, based on real problems and analyses of the data these problems generate. A substantial part of the material in the book can also be absorbed by readers without knowledge of advanced methodology. This means the book can serve as a guide for a one- or two-semester course in data analytics for upper-year students of mathematics, statistics, and computer science. Because the required background is not extensive, students who have taken a course in probability or statistics, know the basics of algebra and calculus, and have completed a programming course will have no difficulty; the text is also well suited to self-study by graduates of science and engineering programs. The core material is well illustrated with extensive topics drawn from real problems.
The website associated with the book supports the reader with the data used in the book and with presentations of selected parts of the lecture. I am convinced that the subject of the book is a new field of science.
The book under review gives a comprehensive presentation of data science algorithms; practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Data science, as the authors claim, has been a discipline since 2001. Informally, however, it was practiced before that date (cf. Cleveland (2001)). A crucial role was played by the graphic presentation of data as a visualization of the knowledge hidden in the data. It is a discipline that covers data mining as a tool or important topic. The escalating demand for insights into big data requires a fundamentally new approach to architecture, tools, and practices. That is why the term data science is useful. It underscores the centrality of data in the investigation because they are a store of potential value in the field of action. The label science invokes certain very real concepts within it, like the notion of public knowledge and peer review. From this point of view, data science is not a new idea. It is part of a continuum of serious thinking that dates back hundreds of years. A good example of the results of data science is Benford's law (see Arno Berger and Theodore P. Hill (2015, 2017)). In an effort to identify some of the best-known algorithms widely used in the data mining community, the IEEE International Conference on Data Mining (ICDM) identified the top 10 algorithms in data mining for presentation at ICDM '06 in Hong Kong. The panel was to announce the top 10 algorithms and discuss the impact of, and further research on, each of them in 2006. In the present book, clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. Most of the algorithms announced by IEEE in 2006 are included. But practical data analytics requires more than just the foundations.
Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analysis.
Source: Mathematica Applicanda; 2017, 45, 2
1730-2668
2299-4009
Appears in: Mathematica Applicanda
Content provider: Biblioteka Nauki

Article

Information

You are searching for the phrase "Hadoop" by criterion: Subject
