Subject: data reduction - OPAC collection catalog

Title:: System-level approaches to power efficiency in FPGA-based designs (data reduction algorithms case study)
Authors:: Czapski, P. P.
Śluzek, A.
Links:: https://bibliotekanauki.pl/articles/384769.pdf
Publication date:: 2011
Publisher:: Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Subjects:: power awareness
FPGA
system-level
Handel-C
data reduction
Description:: In this paper we present preliminary results on system-level analysis of power efficiency in FPGA-based designs. Advanced FPGA devices allow the implementation of sophisticated systems (e.g. embedded sensor nodes). However, designing such complex applications is prohibitively expensive at lower levels, so moving the design process to higher abstraction layers, i.e. to system-level design, is a rational decision. This paper shows that at least a certain level of power awareness is achievable at these higher abstractions. A methodology and preliminary results for power-aware, system-level algorithm partitioning are presented. We select data reduction algorithms as the case study because of their importance in wireless sensor networks (WSNs). Although the research has focused on WSN applications of FPGAs, it is envisaged that the presented ideas are applicable to other untethered embedded systems based on FPGAs and other similar programmable devices.
Source:: Journal of Automation Mobile Robotics and Intelligent Systems; 2011, 5, 2; 49-59
1897-8649
2080-2145
Appears in:: Journal of Automation Mobile Robotics and Intelligent Systems
Content provider:: Biblioteka Nauki

Article

Title:: Data Reduction Method for Synthetic Transmit Aperture Algorithm
Authors:: Karwat, P.
Klimonda, Z.
Sęklewski, M.
Lewandowski, M.
Nowicki, A.
Links:: https://bibliotekanauki.pl/articles/177848.pdf
Publication date:: 2010
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: ultrasonic imaging
synthetic transmit aperture
data reduction
effective aperture
reciprocity
Description:: Ultrasonic methods for imaging the internal structures of the human body are continuously being improved. New algorithms are created to improve certain output parameters. The synthetic aperture (SA) method is one example: it allows images to be displayed at a higher frame rate than the conventional beamforming method. The higher computational complexity of the SA method is a limitation, and it can prevent the desired reconstruction time from being achieved. This problem can be solved by neglecting part of the data. This obviously implies a decrease in imaging quality; however, a proper data reduction technique would minimize the image degradation. The proposed data reduction approach can be used with the synthetic transmit aperture (STA) method and is based on the assumption that the signal obtained from any pair of transducers is the same, no matter which transducer transmits and which receives. According to this postulate, nearly half of the data can be ignored without a decrease in image quality. The presented results of simulations and of measurements using wire and tissue phantoms show that the proposed data reduction technique reduces the amount of data to be processed by half, while maintaining resolution and causing only a small decrease in the SNR and contrast of the resulting images.
Source:: Archives of Acoustics; 2010, 35, 4; 635-642
0137-5075
Appears in:: Archives of Acoustics
Content provider:: Biblioteka Nauki

Article

Title:: Zastosowanie pakietu NVivo w analizie materiałów nieustrukturyzowanych
Computer Aided Qualitative Research Using NVivo in unstructured data analysis
Authors:: Brosz, Maciej
Links:: https://bibliotekanauki.pl/articles/1373761.pdf
Publication date:: 2012
Publisher:: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Subjects:: CAQDAS
loss of information
data reduction
coding procedures
NVivo
grounded theory
Description:: This paper concerns the use of NVivo software in qualitative data analysis. Its main subject is the data reduction that accompanies the process of qualitative data analysis. Using software does not necessarily cause uncontrolled modification of the data and, thereby, the loss of relevant aspects of the collected data. The latest versions of CAQDAS (i.e., NVivo 8, 9) make it possible to code barely altered sources. The paper presents examples of coding procedures for texts, pictures, and audio-visual recordings. Additionally, the paper describes some techniques that aid the coding process.
Source:: Przegląd Socjologii Jakościowej; 2012, 8, 1; 98-125
1733-8069
Appears in:: Przegląd Socjologii Jakościowej
Content provider:: Biblioteka Nauki

Article

Title:: IoT sensing networks for gait velocity measurement
Authors:: Chou, Jyun-Jhe
Shih, Chi-Sheng
Wang, Wei-Dean
Huang, Kuo-Chin
Links:: https://bibliotekanauki.pl/articles/330707.pdf
Publication date:: 2019
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: internet of things
IoT middleware
data fusion
data reduction
Description:: Gait velocity has been considered the sixth vital sign. It can be used not only to estimate the survival rate of the elderly, but also to predict the tendency of falling. Unfortunately, gait velocity is usually measured on a specially designed walk path, which has to be done at clinics or health institutes. Wearable tracking services using an accelerometer or an inertial measurement unit can measure the velocity for a certain time interval, but not all the time, due to the lack of a sustainable energy source. To tackle the shortcomings of wearable sensors, this work develops a framework to measure gait velocity using distributed tracking services deployed indoors. Two major challenges are tackled in this paper. The first is to minimize the sensing errors caused by thermal noise and overlapping sensing regions. The second is to minimize the data volume to be stored or transmitted. Given numerous errors caused by remote sensing, the framework takes into account the temporal and spatial relationship among tracking services to calibrate the services systematically. Consequently, gait velocity can be measured without wearable sensors and with higher accuracy. The developed method is built on top of WuKong, which is an intelligent IoT middleware, to enable location and temporal-aware data collection. In this work, we present an iterative method to reduce the data volume collected by thermal sensors. The evaluation results show that the file size is up to 25% of that of the JPEG format when the RMSE is limited to 0.5°.
Source:: International Journal of Applied Mathematics and Computer Science; 2019, 29, 2; 245-259
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Python Machine Learning. Dry Beans Classification Case
Authors:: Słowiński, Grzegorz
Links:: https://bibliotekanauki.pl/articles/50091919.pdf
Publication date:: 2024-09
Publisher:: Warszawska Wyższa Szkoła Informatyki
Subjects:: machine learning
deep learning
data dimension reduction
activation function
Description:: A dataset containing over 13k samples of geometric features of dry beans was analyzed using machine learning (ML) and deep learning (DL) techniques with the goal of automatically classifying the bean species. Performance in terms of accuracy and of train and test time was analyzed. First, the original dataset was reduced to eliminate redundant features (those too strongly correlated with, and echoing, others). Then the dataset was visualized and analyzed with a few shallow learning techniques and a simple artificial neural network. Cross-validation was used to check the repeatability of the learning process. The influence of data preparation (dimension reduction) on the shallow learning techniques was observed. For the Multilayer Perceptron, three activation functions were tried: ReLU, ELU and sigmoid. Random Forest appeared to be the best model for the dry beans classification task, reaching an average accuracy of 92.61% with reasonable train and test times.
Source:: Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki; 2024, 18, 30; 7-26
1896-396X
2082-8349
Appears in:: Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki
Content provider:: Biblioteka Nauki

Article

Title:: Analiza czynnikowa zdjęć wielospektralnych
Principal component analysis of multispectral images
Authors:: Czapski, P.
Kotlarz, J.
Kubiak, K.
Tkaczyk, M.
Links:: https://bibliotekanauki.pl/articles/213759.pdf
Publication date:: 2014
Publisher:: Sieć Badawcza Łukasiewicz - Instytut Lotnictwa
Subjects:: PCA
statistical methods
biodiversity
light curves
data reduction
Description:: The analysis of multispectral images often comes down to mathematical modeling based on multidimensional metric spaces in which the data collected by the sensors are placed. This intuitive approach, easy to apply in image analysis algorithms, can result in a wholly unnecessary increase in the computing power required to analyze the images. One generally accepted group of methods for analyzing data sets of this type is factor analysis. Two such methods are presented in this paper: Principal Component Analysis (PCA) and Simplex Shrink-Wrapping (SSW). Used together, they significantly reduce the dimension of the given metric space, allowing characteristic components to be found in multispectral data, i.e. allowing the whole process of detecting the photographed objects to be carried out. In 2014 the Data Processing Department of the Institute of Aviation and the Division of Forest Protection of the Forest Research Institute applied this methodology with equal success to two very different series of multispectral images: detection of the major components of the Mars surface (based on multispectral images obtained during NASA's EPOXI mission) and estimation of the biodiversity of one of the forest research plots of the HESOFF project.
Source:: Prace Instytutu Lotnictwa; 2014, 1 (234) March 2014; 143-150
0509-6669
2300-5408
Appears in:: Prace Instytutu Lotnictwa
Content provider:: Biblioteka Nauki

Article

Title:: Application of agent-based simulated annealing and tabu search procedures to solving the data reduction problem
Authors:: Czarnowski, I.
Jędrzejowicz, P.
Links:: https://bibliotekanauki.pl/articles/907819.pdf
Publication date:: 2011
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: data reduction
machine learning
A-Team
optimization
multi-agent system
Description:: The problem considered concerns data reduction for machine learning. Data reduction aims at deciding which features and instances from the training set should be retained for further use during the learning process. Data reduction results in increased capabilities and generalization properties of the learning model and a shorter time of the learning process. It can also help in scaling up to large data sources. The paper proposes an agent-based data reduction approach with the learning process executed by a team of agents (A-Team). Several A-Team architectures with agents executing the simulated annealing and tabu search procedures are proposed and investigated. The paper includes a detailed description of the proposed approach and discusses the results of a validating experiment.
Source:: International Journal of Applied Mathematics and Computer Science; 2011, 21, 1; 57-68
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Efficient astronomical data condensation using approximate nearest neighbors
Authors:: Łukasik, Szymon
Lalik, Konrad
Sarna, Piotr
Kowalski, Piotr A.
Charytanowicz, Małgorzata
Kulczycki, Piotr
Links:: https://bibliotekanauki.pl/articles/907932.pdf
Publication date:: 2019
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: big data
astronomical observation
data reduction
nearest neighbor search
kd-trees
Description:: Extracting useful information from astronomical observations represents one of the most challenging tasks of data exploration. This is largely due to the volume of the data acquired using advanced observational tools. While other challenges typical of the class of big data problems (like data variety) are also present, the size of the datasets represents the most significant obstacle in visualization and subsequent analysis. This paper studies an efficient data condensation algorithm aimed at providing a compact representation of the data. It is based on fast nearest neighbor calculation using tree structures and parallel processing. In addition, the possibility of using approximate identification of neighbors to further improve the algorithm's time performance is also evaluated. The properties of the proposed approach, both in terms of performance and condensation quality, are experimentally assessed on astronomical datasets related to the GAIA mission. It is concluded that the introduced technique might serve as a scalable method of alleviating the problem of dataset size.
Source:: International Journal of Applied Mathematics and Computer Science; 2019, 29, 3; 467-476
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Optimization on the complementation procedure towards efficient implementation of the index generation function
Authors:: Borowik, G.
Links:: https://bibliotekanauki.pl/articles/330597.pdf
Publication date:: 2018
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: data reduction
feature selection
indiscernibility matrix
logic synthesis
index generation function
Description:: In the era of big data, solutions are desired that would be capable of efficient data reduction. This paper presents a summary of research on an algorithm for complementation of a Boolean function, which is fundamental for logic synthesis and data mining. The existing problems and their proposed solutions are examined in turn, including an analysis of current implementations of the algorithm. Then, methods to speed up the computation process and an efficient parallel implementation of the algorithm are shown; they include optimization of data representation, recursive decomposition, merging, and removal of redundant data. Besides the discussion of computational complexity, the paper compares the processing times of the proposed solution with those of well-known analysis and data mining systems. Although the presented idea is focused on searching for all possible solutions, it can be restricted to finding just those of the smallest size. Both approaches have great application potential, including proving mathematical theorems, logic synthesis (especially of index generation functions), and data processing and mining tasks such as feature selection, data discretization, rule generation, etc. The problem considered is NP-hard, and it is easy to point to examples that are not solvable within the expected amount of time. However, the solution allows the barrier of computations to be moved one step further. For example, the algorithm is currently the only one that can calculate all minimal sets of features for a few standard benchmarks. Unlike many existing methods, the algorithm additionally works with undetermined values. The result of this research is easily extendable experimental software that is the fastest among the tested solutions and data mining systems.
Source:: International Journal of Applied Mathematics and Computer Science; 2018, 28, 4; 803-815
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: An effective data reduction model for machine emergency state detection from big data tree topology structures
Authors:: Iaremko, Iaroslav
Senkerik, Roman
Jasek, Roman
Lukastik, Petr
Links:: https://bibliotekanauki.pl/articles/2055178.pdf
Publication date:: 2021
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: OPC UA
OPC tree
principal component analysis
PCA
big data analysis
data reduction
machine tool
anomaly detection
emergency states
Description:: This work presents an original model for detecting machine tool anomalies and emergency states through operation data processing. The paper focuses on an elastic hierarchical system for effective data reduction and classification, which encompasses several modules. First, principal component analysis (PCA) is used to reduce the many input signals from big data tree topology structures to two signals representing all of them. Then a technique for segmenting operating machine data, based on dynamic time distortion and hierarchical clustering, is used to calculate signal accident characteristics using classifiers such as the maximum level change, the signal trend, the variance of residuals, and others. Data segmentation and analysis techniques enable effective and robust detection of operating machine tool anomalies and emergency states, thanks to almost real-time data collection from strategically placed sensors and to results collected from previous production cycles. The emergency state detection model described in this paper could be beneficial for improving the production process, increasing production efficiency by detecting and minimizing machine tool error conditions, as well as improving product quality and overall equipment productivity. The proposed model was tested on H-630 and H-50 machine tools in a real production environment at the Tajmac-ZPS company.
Source:: International Journal of Applied Mathematics and Computer Science; 2021, 31, 4; 601-611
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Porównanie algorytmów ekstrakcji punktów istotnych w upraszczaniu numerycznych modeli terenu o strukturze hybrydowej
The comparison of algorithms for key points extraction in simplification of hybrid digital terrain models
Authors:: Bakuła, K.
Links:: https://bibliotekanauki.pl/articles/131096.pdf
Publication date:: 2014
Publisher:: Stowarzyszenie Geodetów Polskich
Subjects:: digital terrain model (DTM)
critical points
data reduction
generalization
Topographic Position Index
TPI
Very Important Points
VIP
Z-tolerance
Description:: The presented research concerns methods for reducing the elevation data contained in a digital terrain model (DTM) from airborne laser scanning (ALS) for use in hydraulic (flood) modelling. Reduction is necessary when preparing the huge geospatial datasets that describe terrain relief, but it should not take the form of regular data filtering, as often happens in practice. Such filtering causes a number of terrain forms that are important for hydraulic modelling to be missed. One proposed solution for reducing the elevation data contained in a DTM is to change the regular grid into a hybrid structure with regularly distributed points and irregularly located critical points. The purpose of this paper is to compare algorithms for extracting these key points from a DTM. They are used in hybrid model generation as part of an elevation data reduction process that retains DTM accuracy and reduces the size of the output files. In the experiments, the following algorithms were tested: Topographic Position Index (TPI), Very Important Points (VIP) and Z-tolerance. Their effectiveness in reduction (maintaining accuracy while reducing the datasets) was evaluated with respect to the input DTM from ALS. The best results were obtained for the Z-tolerance algorithm, but they do not diminish the capabilities of the other two algorithms, VIP and TPI, which can generalize a DTM quite well. The results confirm that a high degree of reduction is achievable, using only a few percent of the input data, with a relatively small decrease in the vertical accuracy of the DTM, of a few centimetres. The presented paper was financed by the Foundation for Polish Science - research grant no. VENTURES/2012-9/1 from Innovative Economy program of the European Structural Funds.
Source:: Archiwum Fotogrametrii, Kartografii i Teledetekcji; 2014, 26; 11-21
2083-2214
2391-9477
Appears in:: Archiwum Fotogrametrii, Kartografii i Teledetekcji
Content provider:: Biblioteka Nauki

Article

Title:: On a book Algorithms for data science by Brian Steele, John Chandler and Swarn Reddy
Authors:: Szajowski, Krzysztof J.
Links:: https://bibliotekanauki.pl/articles/747695.pdf
Publication date:: 2017
Publisher:: Polskie Towarzystwo Matematyczne
Subjects:: Algorithms
Associative Statistics
Computation
Computing Similarity
Cluster Analysis
Correlation
Data Reduction
Data Mapping
Data Dictionary
Data Visualization
Forecasting
Hadoop
Histogram
k-Means Algorithm
k-Nearest Neighbor Prediction
centroid algorithm
Description:: The publication presented here is a comprehensive introduction to the most important fundamental principles, algorithms, and data, together with the structures to which these principles and algorithms apply. The topics presented are an introduction to considerations in the field of computer science. However, it is algorithms that are the foundation of data analytics and the focal point of this textbook. Extracting knowledge from data requires methods and results from at least three fields: mathematics, statistics, and computer science. The book contains clear and intuitive mathematical and statistical explanations of the individual topics, which makes the algorithms natural and transparent. The practice of data analysis, however, requires more than a sound scientific basis, mathematical rigor, and the perspective of statistical methodology. The problems that generate data are enormously variable, and only the most basic algorithms can be applied without adaptation. Programming fluency and experience with real problems are essential. The reader is guided through algorithmic topics using Python and R, based on real problems and analyses of the data those problems generate. A considerable part of the material in the book can also be absorbed by readers without knowledge of advanced methodology. As a result, the book can serve as a guide for a one- or two-semester course in data analytics for upper-year students of mathematics, statistics, and computer science. Because the required background is not extensive, students who have completed a course in probability or statistics, know the basics of algebra and calculus, and have taken a programming course will have no difficulty; the text is also well suited to self-study by graduates of science and engineering programs. The core material is well illustrated with extensive problems drawn from real-world applications. The website associated with the book supports the reader with the data used in the book and with a presentation of selected parts of the lecture. I am convinced that the subject of the book is a new field of science.
The book under review gives a comprehensive presentation of data science algorithms; as a textbook on practical data analytics, it unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Data science, as the authors claim, has been a discipline since 2001. Informally, however, it was practiced before that date (cf. Cleveland (2001)). The graphic presentation of data, as a visualization of the knowledge hidden in the data, played a crucial role. It is a discipline that covers data mining as a tool or an important topic. The escalating demand for insights into big data requires a fundamentally new approach to architecture, tools, and practices. This is why the term data science is useful. It underscores the centrality of data in the investigation, because data are a store of potential value in the field of action. The label science invokes certain very real concepts within it, like the notion of public knowledge and peer review. From this point of view, data science is not a new idea: it is part of a continuum of serious thinking that dates back hundreds of years. A good example of the results of data science is the Benford law (see Arno Berger and Theodore P. Hill (2015, 2017)). In an effort to identify some of the best-known algorithms that have been widely used in the data mining community, the IEEE International Conference on Data Mining (ICDM) identified the top 10 algorithms in data mining for presentation at ICDM '06 in Hong Kong, where a panel announced the top 10 algorithms and discussed the impact of, and further research on, each of them. In the present book, clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. Most of the algorithms announced by IEEE in 2006 are included. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analysis.
Source:: Mathematica Applicanda; 2017, 45, 2
1730-2668
2299-4009
Appears in:: Mathematica Applicanda
Content provider:: Biblioteka Nauki

Article

Title:: Efektywne wykorzystanie danych lidar w dwuwymiarowym modelowaniu hydraulicznym
Effective application of lidar data in two-dimensional hydraulic modelling
Authors:: Bakuła, K.
Links:: https://bibliotekanauki.pl/articles/129777.pdf
Publication date:: 2014
Publisher:: Stowarzyszenie Geodetów Polskich
Subjects:: hydraulic modeling
airborne laser scanning (ALS)
digital terrain model (DTM)
DTM
data reduction
digital roughness model
DRM
Description:: This paper presents aspects of using airborne laser scanning (ALS) data in two-dimensional hydraulic modelling, including the generation of high-precision digital terrain models (DTMs), their effective processing as a compromise between the resolution and the accuracy of the processed data, and the extraction of land cover roughness information that could compete with information from topographic databases and orthophotomaps. The still-evolving ALS technology makes it possible to collect data with constantly increasing spatial resolution, which guarantees a correct representation of terrain shape and height. It also provides a reliable description of the land cover. However, the size of the generated files may hinder their effective use in 2D hydraulic modelling, where the Saint-Venant equations are implemented. High-resolution elevation models make calculations for large areas impossible, or prolong them, in the complex algorithms that define a model of water movement, which directly affects the cost of the hydraulic analysis. For the effective use of voluminous datasets, data reduction is recommended. Such a process should reduce the size of the data files, maintain their accuracy and keep an appropriate structure to allow their further application in hydraulic modelling. Using only a few percent of the unprocessed datasets, selected with specified filtering algorithms and spatial analysis tools, can give the same hydraulic modelling result in a significantly shorter time than the comparable operation on the unprocessed datasets. Such an approach, however, is not commonly used, which means that the most reliable hydraulic models are applied only to small areas in the largest cities. Another application of ALS data is its potential use in creating digital roughness models for 2D hydraulic models. There are many ways of estimating the roughness coefficient in hydraulic modelling, which affects the velocity of water flow. Topographic databases and orthophotomaps from aerial or satellite imagery can be considered the basic reference source for such analysis. The presented paper proved that LIDAR data should be effectively applied in cooperation between surveyors and hydrologists. ALS data alone can be used to create a fully correct two-dimensional hydraulic model, assuming that appropriate hydraulic datasets are available. Additionally, the application of ALS data should not be limited to the geometric representation of the terrain; the data can also be used as information about terrain roughness. The presented paper was financed by the Foundation for Polish Science - research grant no. VENTURES/2012-9/1 from Innovative Economy program of the European Structural Funds.
Source:: Archiwum Fotogrametrii, Kartografii i Teledetekcji; 2014, 26; 23-37
2083-2214
2391-9477
Appears in:: Archiwum Fotogrametrii, Kartografii i Teledetekcji
Content provider:: Biblioteka Nauki

Article

Jump to item: 14.

Title:: Novel approach for big data classification based on hybrid parallel dimensionality reduction using spark cluster
Authors:: Ali, Ahmed Hussein
Abdullah, Mahmood Zaki
Links:: https://bibliotekanauki.pl/articles/305766.pdf
Publication date:: 2019
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: big data
dimensionality reduction
parallel processing
Spark
PCA
LDA
Description:: The big data concept has elicited studies on how to accurately and efficiently extract valuable information from such huge datasets. The major problem during big data mining is data dimensionality, due to the large number of dimensions in such datasets. The major consequence of high data dimensionality is that it affects the accuracy of machine learning (ML) classifiers; it also wastes time because of the several redundant features present in the dataset. This problem can possibly be solved using a fast feature reduction method. Hence, this study presents a fast HP-PL, a new hybrid parallel feature reduction framework that utilizes Spark to facilitate feature reduction on shared/distributed-memory clusters. The evaluation of the proposed HP-PL on the KDD99 dataset showed the algorithm to be significantly faster than the conventional feature reduction techniques. The proposed technique required >1 minute to select 4 dataset features from over 79 features and 3,000,000 samples on a 3-node cluster (total of 21 cores). For the comparative algorithm, more than 2 hours were required to achieve the same feat. In the proposed system, Hadoop's distributed file system (HDFS) was used to achieve distributed storage, while Apache Spark was used as the computing engine. The model development was based on a parallel model with full consideration of the high performance and throughput of distributed computing. In conclusion, the proposed HP-PL method can achieve good accuracy with less memory and time compared to the conventional methods of feature reduction. This tool can be publicly accessed at https://github.com/ahmed/Fast-HP-PL.
Source:: Computer Science; 2019, 20 (4); 411-429
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article

Jump to item: 15.

Title:: Multilinear Filtering Based on a Hierarchical Structure of Covariance Matrices
Authors:: Szwabe, Andrzej
Ciesielczyk, Michal
Misiorek, Pawel
Links:: https://bibliotekanauki.pl/articles/1373696.pdf
Publication date:: 2015
Publisher:: Uniwersytet Jagielloński. Wydawnictwo Uniwersytetu Jagiellońskiego
Subjects:: tensor-based data modeling
multilinear PCA
random indexing
dimensionality reduction
multilinear data filtering
higher-order SVD
Description:: We propose a novel model of multilinear filtering based on a hierarchical structure of covariance matrices, each matrix being extracted from the input tensor in accordance with a specific set-theoretic model of data generalization, such as derivation of expectation values. The experimental results presented in this paper confirm that the investigated approaches to tensor-based data representation and processing outperform the standard collaborative filtering approach in the 'cold-start' personalized recommendation scenario (with very sparse input data). Furthermore, the proposed method is shown to be superior to standard tensor-based frameworks such as N-way Random Indexing (NRI) and Higher-Order Singular Value Decomposition (HOSVD) in terms of both the AUROC measure and computation time.
Source:: Schedae Informaticae; 2015, 24; 103-112
0860-0295
2083-8476
Appears in:: Schedae Informaticae
Content provider:: Biblioteka Nauki

Article
