Subject: data reduction (redukcja danych) - OPAC collection catalog

Title: Rough Sets Methods in Feature Reduction and Classification
Authors: Świniarski, R. W.
Links: https://bibliotekanauki.pl/articles/908366.pdf
Publication date: 2001
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: pattern recognition
data reduction
rough sets
feature selection
classification
Description: The paper presents an application of rough sets and statistical methods to feature reduction and pattern recognition. The presented description of rough set theory emphasizes the role of rough set reducts in feature selection and data reduction in pattern recognition. The overview of feature selection methods emphasizes feature selection criteria, including rough set-based methods. The paper also contains a description of an algorithm for feature selection and reduction based on the rough sets method proposed jointly with Principal Component Analysis. Finally, the paper presents numerical results of face recognition experiments using the learning vector quantization neural network, with feature selection based on the proposed principal component analysis and rough sets methods.
Source: International Journal of Applied Mathematics and Computer Science; 2001, 11, 3; 565-582
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article
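
The description of this record stresses the role of rough set reducts in feature selection. As a rough illustration only (not the algorithm from the paper), the sketch below searches by brute force for the smallest attribute subsets that still determine the decision in a small, consistent decision table. The table, the attribute names, and the function names `is_consistent` and `smallest_reducts` are hypothetical, and the exhaustive search is only practical for a handful of attributes.

```python
from itertools import combinations


def is_consistent(rows, attrs, decision):
    """Return True if rows that agree on `attrs` never disagree on `decision`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        # Remember the first decision seen for this attribute pattern.
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def smallest_reducts(rows, condition_attrs, decision):
    """Brute-force search for the smallest attribute subsets that keep the table consistent."""
    for size in range(1, len(condition_attrs) + 1):
        found = [set(subset)
                 for subset in combinations(condition_attrs, size)
                 if is_consistent(rows, subset, decision)]
        if found:
            return found
    return []  # the table is inconsistent even when all attributes are used


# Hypothetical toy decision table: condition attributes a, b, c and decision d.
table = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
print(smallest_reducts(table, ["a", "b", "c"], "d"))
```

In this toy table attribute `b` alone determines the decision, so the other two attributes can be dropped without losing the ability to classify the listed objects.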

Title: Redukcja danych w diagnostycznych bazach danych
Data reduction in diagnostic databases
Authors: Chrzanowski, P.
Links: https://bibliotekanauki.pl/articles/327622.pdf
Publication date: 2003
Publisher: Polska Akademia Nauk. Polskie Towarzystwo Diagnostyki Technicznej PAN
Subjects: expert systems
monitoring system
diagnostic databases
data reduction
Description: The article addresses problems related to supporting the process of inferring the state of a real object (machinery). The main issue is the reduction of the huge amount of data supplied to a monitoring system. Three groups of methods are distinguished: limiting the number of features considered, limiting the set of distinguished values, and limiting the number of elements of the value plot (time-domain methods). Based on the data analysis, quantization with hysteresis is proposed as a way to reduce the number of distinguished values. In addition, a method for optimizing the width of the hysteresis quantization band using a statistical test is proposed.
Source: Diagnostyka; 2003, 29; 41-46
1641-6414
2449-5220
Appears in: Diagnostyka
Content provider: Biblioteka Nauki

Article
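
The description of this record proposes quantization with hysteresis (kwantowanie z histerezą) to reduce the number of distinguished values. The catalog entry does not give the exact procedure, so the following is only one plausible reading, written as a minimal sketch: a sample is mapped to a multiple of `step`, and the level is changed only when the sample leaves the current band by more than `hysteresis`. The function names, the parameters, and the sample signal are assumptions made for illustration.

```python
def quantize_with_hysteresis(samples, step, hysteresis):
    """Quantize to multiples of `step`, switching level only outside band + hysteresis."""
    if step <= 0 or hysteresis < 0:
        raise ValueError("step must be positive and hysteresis non-negative")
    levels = []
    current = None
    for x in samples:
        # Keep the current level while the sample stays within the widened band.
        if current is None or abs(x - current) > step / 2 + hysteresis:
            current = round(x / step) * step
        levels.append(current)
    return levels


def change_points(levels):
    """Keep only (index, level) pairs where the quantized level changes."""
    points = []
    for i, level in enumerate(levels):
        if not points or points[-1][1] != level:
            points.append((i, level))
    return points


# Hypothetical noisy signal hovering around the boundary between levels 0 and 1.
signal = [0.0, 0.4, 0.6, 0.4, 0.6, 1.2, 0.9]
plain = quantize_with_hysteresis(signal, step=1.0, hysteresis=0.0)
damped = quantize_with_hysteresis(signal, step=1.0, hysteresis=0.2)
print(len(change_points(plain)), len(change_points(damped)))
```

Storing only the change points is what reduces the data volume: in this example the plain quantizer flips between levels several times near the band boundary, while the version with hysteresis records a single transition.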

Title: IoT sensing networks for gait velocity measurement
Authors: Chou, Jyun-Jhe
Shih, Chi-Sheng
Wang, Wei-Dean
Huang, Kuo-Chin
Links: https://bibliotekanauki.pl/articles/330707.pdf
Publication date: 2019
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: internet of things
IoT middleware
data fusion
data reduction
Description: Gait velocity has been considered the sixth vital sign. It can be used not only to estimate the survival rate of the elderly, but also to predict the tendency to fall. Unfortunately, gait velocity is usually measured on a specially designed walk path, which has to be done at clinics or health institutes. Wearable tracking services using an accelerometer or an inertial measurement unit can measure the velocity for a certain time interval, but not all the time, due to the lack of a sustainable energy source. To tackle the shortcomings of wearable sensors, this work develops a framework to measure gait velocity using distributed tracking services deployed indoors. Two major challenges are tackled in this paper. The first is to minimize the sensing errors caused by thermal noise and overlapping sensing regions. The second is to minimize the data volume to be stored or transmitted. Given the numerous errors caused by remote sensing, the framework takes into account the temporal and spatial relationships among tracking services to calibrate the services systematically. Consequently, gait velocity can be measured without wearable sensors and with higher accuracy. The developed method is built on top of WuKong, an intelligent IoT middleware, to enable location- and temporal-aware data collection. In this work, we present an iterative method to reduce the data volume collected by thermal sensors. The evaluation results show that the file size is up to 25% of that of the JPEG format when the RMSE is limited to 0.5°.
Source: International Journal of Applied Mathematics and Computer Science; 2019, 29, 2; 245-259
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article

Title: Analiza czynnikowa zdjęć wielospektralnych
Principal component analysis of multispectral images
Authors: Czapski, P.
Kotlarz, J.
Kubiak, K.
Tkaczyk, M.
Links: https://bibliotekanauki.pl/articles/213759.pdf
Publication date: 2014
Publisher: Sieć Badawcza Łukasiewicz - Instytut Lotnictwa
Subjects: PCA
statistical methods
biodiversity
light curves
data reduction
Description: The analysis of multispectral images usually relies on mathematical modeling based on multidimensional metric spaces in which the data collected by the sensors are placed. This intuitive approach, easily applied in image analysis algorithms, can lead to an unnecessary increase in the computing power required for the analysis. One generally accepted group of methods for analyzing data sets of this type is factor analysis. Two such methods are presented in this paper: Principal Component Analysis (PCA) and Simplex Shrink-Wrapping (SSW). Used together, they significantly reduce the dimension of the metric space and allow characteristic components to be found in multispectral data, i.e. to carry out the whole process of detecting the photographed objects. In 2014 the Data Processing Department of the Institute of Aviation and the Forest Protection Division of the Forest Research Institute applied this methodology to two very different series of multispectral images: detection of the major components of the Mars surface (based on multispectral images obtained by NASA's EPOXI mission) and estimation of the biodiversity of one of the forest research plots of the HESOFF project.
Source: Prace Instytutu Lotnictwa; 2014, 1 (234) March 2014; 143-150
0509-6669
2300-5408
Appears in: Prace Instytutu Lotnictwa
Content provider: Biblioteka Nauki

Article

Title: Zastosowanie narzędzi laboratorium tekstowego programu MAXQDA na przykładzie analizy porównawczej polskich i zagranicznych raportów badawczych
Application of the "textual laboratory" tools of the MaxQda computer program in scientific research - comparative analysis of Polish and foreign research reports
Authors: Sajdera, Jolanta
Links: https://bibliotekanauki.pl/articles/2186160.pdf
Publication date: 2023-03-31
Publisher: Szkoła Główna Handlowa w Warszawie
Subjects: qualitative methods
computer assisted qualitative data analysis software
MAXQDA
data reduction
coding process
Description: Digitization of textual data is now commonplace in many areas of society, and the development of digital techniques is fostering the emergence of new tools used by researchers. The article addresses the application of computer-assisted analysis of textual data in qualitative research procedures from the user's perspective. Because the issue is broad, the question considered is narrowed to the applicability of one program from the CAQDAS (computer-assisted qualitative data analysis software) family, VERBI Software's MAXQDA. The first part of the article recalls milestones in the development of digital tools that researchers can use to analyse textual data. Next, the notion of a "text lab" is explained and, using the author's own analytical practice as an example, the way of working with MAXQDA during data reduction and representation is described. The next section presents the results of a comparative analysis of secondary sources, using data from three sources: the Harzing's Publish or Perish citation ranking, the ProQuest databases, and the Biblioteka Nauki platform. The corpus consisted of research reports in which the authors declared that they used MAXQDA. The analysis made it possible to identify the program tools used by the authors of the reports, as well as the scientific fields that the authors represent. The conclusion reflects on the contribution of modern technology to the development of qualitative data analysis and provides examples of the dilemmas that a researcher may face during successive stages of computer-assisted analytical work.
Source: e-mentor. Czasopismo naukowe Szkoły Głównej Handlowej w Warszawie; 2023, 98, 1; 42-51
1731-6758
1731-7428
Appears in: e-mentor. Czasopismo naukowe Szkoły Głównej Handlowej w Warszawie
Content provider: Biblioteka Nauki

Article

Title: Application of agent-based simulated annealing and tabu search procedures to solving the data reduction problem
Authors: Czarnowski, I.
Jędrzejowicz, P.
Links: https://bibliotekanauki.pl/articles/907819.pdf
Publication date: 2011
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: data reduction
machine learning
A-Team
optimization
multi-agent system
Description: The problem considered concerns data reduction for machine learning. Data reduction aims at deciding which features and instances from the training set should be retained for further use during the learning process. Data reduction results in increased capabilities and generalization properties of the learning model and a shorter time of the learning process. It can also help in scaling up to large data sources. The paper proposes an agent-based data reduction approach with the learning process executed by a team of agents (A-Team). Several A-Team architectures with agents executing the simulated annealing and tabu search procedures are proposed and investigated. The paper includes a detailed description of the proposed approach and discusses the results of a validating experiment.
Source: International Journal of Applied Mathematics and Computer Science; 2011, 21, 1; 57-68
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article

Title: Efficient astronomical data condensation using approximate nearest neighbors
Authors: Łukasik, Szymon
Lalik, Konrad
Sarna, Piotr
Kowalski, Piotr A.
Charytanowicz, Małgorzata
Kulczycki, Piotr
Links: https://bibliotekanauki.pl/articles/907932.pdf
Publication date: 2019
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: big data
astronomical observation
data reduction
nearest neighbor search
kd-trees
Description: Extracting useful information from astronomical observations represents one of the most challenging tasks of data exploration. This is largely due to the volume of the data acquired using advanced observational tools. While other challenges typical for the class of big data problems (like data variety) are also present, the size of datasets represents the most significant obstacle in visualization and subsequent analysis. This paper studies an efficient data condensation algorithm aimed at providing a compact representation of the data. It is based on fast nearest neighbor calculation using tree structures and parallel processing. In addition, the possibility of using approximate identification of neighbors to further improve the time performance of the algorithm is evaluated. The properties of the proposed approach, both in terms of performance and condensation quality, are experimentally assessed on astronomical datasets related to the GAIA mission. It is concluded that the introduced technique might serve as a scalable method of alleviating the problem of the dataset size.
Source: International Journal of Applied Mathematics and Computer Science; 2019, 29, 3; 467-476
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article

Title: Optimization on the complementation procedure towards efficient implementation of the index generation function
Authors: Borowik, G.
Links: https://bibliotekanauki.pl/articles/330597.pdf
Publication date: 2018
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: data reduction
feature selection
indiscernibility matrix
logic synthesis
index generation function
Description: In the era of big data, solutions are desired that would be capable of efficient data reduction. This paper presents a summary of research on an algorithm for complementation of a Boolean function, which is fundamental for logic synthesis and data mining. The existing problems and their proposed solutions are examined in turn, including an analysis of current implementations of the algorithm. Then, methods to speed up the computation process and an efficient parallel implementation of the algorithm are shown; they include optimization of data representation, recursive decomposition, merging, and removal of redundant data. Besides the discussion of computational complexity, the paper compares the processing times of the proposed solution with those of well-known analysis and data mining systems. Although the presented idea is focused on searching for all possible solutions, it can be restricted to finding just those of the smallest size. Both approaches have great application potential, including proving mathematical theorems, logic synthesis (especially index generation functions), and data processing and mining tasks such as feature selection, data discretization, and rule generation. The problem considered is NP-hard, and it is easy to point to examples that are not solvable within the expected amount of time. However, the solution allows the barrier of computations to be moved one step further. For example, the algorithm is currently the only one that can calculate all minimal sets of features for a few standard benchmarks. Unlike many existing methods, the algorithm additionally works with undetermined values. The result of this research is easily extendable experimental software that is the fastest among the tested solutions and data mining systems.
Source: International Journal of Applied Mathematics and Computer Science; 2018, 28, 4; 803-815
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article

Title: An effective data reduction model for machine emergency state detection from big data tree topology structures
Authors: Iaremko, Iaroslav
Senkerik, Roman
Jasek, Roman
Lukastik, Petr
Links: https://bibliotekanauki.pl/articles/2055178.pdf
Publication date: 2021
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: OPC UA
OPC tree
principal component analysis
PCA
big data analysis
data reduction
machine tool
anomaly detection
emergency states
Description: This work presents an original model for detecting machine tool anomalies and emergency states through operation data processing. The paper is focused on an elastic hierarchical system for effective data reduction and classification, which encompasses several modules. First, principal component analysis (PCA) is used to reduce the many input signals from big data tree topology structures to two signals representing all of them. Then a technique for segmenting operating machine data, based on dynamic time distortion and hierarchical clustering, is used to calculate signal accident characteristics using classifiers such as the maximum level change, a signal trend, the variance of residuals, and others. Data segmentation and analysis techniques enable effective and robust detection of operating machine tool anomalies and emergency states thanks to almost real-time data collection from strategically placed sensors and results collected from previous production cycles. The emergency state detection model described in this paper could be beneficial for improving the production process, increasing production efficiency by detecting and minimizing machine tool error conditions, as well as improving product quality and overall equipment productivity. The proposed model was tested on H-630 and H-50 machine tools in a real production environment of the Tajmac-ZPS company.
Source: International Journal of Applied Mathematics and Computer Science; 2021, 31, 4; 601-611
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article

Title: Redukcja wymiarowości danych pomiarowych z wykorzystaniem liniowej i nieliniowej analizy składników głównych (PCA)
Dimensionality reduction of measurement data using linear and nonlinear PCA
Authors: Rogala, T.
Brykalski, A.
Links: https://bibliotekanauki.pl/articles/156148.pdf
Publication date: 2005
Publisher: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects: dimensionality reduction of measurement data
linear analysis
nonlinear analysis
linear and nonlinear PCA
Description: Representing multidimensional data on a plane or in 3D space is a common task in pattern recognition. However, the methods used in this field can be applied whenever complex measurement data have to be visualized. Principal Component Analysis (PCA) is a well-known and widely applied technique for this purpose. Since it is a linear transformation, it suffers from numerous limitations. Its nonlinear version, NLPCA, overcomes these difficulties at the cost of some ambiguity in the results. The paper describes both transformations and their implementations, including neural-network-based approaches, and applies them to two datasets: "synthetic" data and data from real measurements. A comparison of the results is followed by a detailed discussion.
Source: Pomiary Automatyka Kontrola; 2005, R. 51, nr 2, 2; 41-44
0032-4140
Appears in: Pomiary Automatyka Kontrola
Content provider: Biblioteka Nauki

Article
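
The description of this record refers to linear PCA as a way of projecting multidimensional measurement data onto fewer dimensions. As a hedged illustration of the linear case only (NLPCA is not covered, and this is not the implementation from the paper), the sketch below centers the data, builds the sample covariance matrix, and estimates the first principal component by power iteration. The function name, the iteration count, and the sample points are assumptions; power iteration can fail if the start vector happens to be orthogonal to the leading component.

```python
import math


def first_principal_component(data, iterations=200):
    """Return (direction, scores) of the first principal component of `data`."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left to follow
        v = [x / norm for x in w]
    # Project each centered sample onto the component: one number per sample.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Hypothetical measurements lying exactly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(points)
print(direction, scores)
```

For these points the estimated direction is proportional to (1, 2), so a single score per sample describes the two-dimensional data without loss.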

Title: Pattern layer reduction for a generalized regression neural network by using a self-organizing map
Authors: Kartal, S.
Oral, M.
Ozyildirim, B. M.
Links: https://bibliotekanauki.pl/articles/329728.pdf
Publication date: 2018
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: generalized regression neural network
artificial neural network
self organizing map
nearest neighbour
reduced dataset
Description: In a general regression neural network (GRNN), the number of neurons in the pattern layer is proportional to the number of training samples in the dataset. The use of a GRNN in applications that have relatively large datasets becomes troublesome due to the architecture and speed required. The great number of neurons in the pattern layer requires a substantial increase in memory usage and causes a substantial decrease in calculation speed. Therefore, there is a strong need for pattern layer size reduction. In this study, a self-organizing map (SOM) structure is introduced as a pre-processor for the GRNN. First, an SOM is generated for the training dataset. Second, each training record is labelled with the most similar map unit. Lastly, when a new test record is applied to the network, the most similar map units are detected, and the training data that have the same labels as the detected units are fed into the network instead of the entire training dataset. This scheme enables a considerable reduction in the pattern layer size. The proposed hybrid model was evaluated by using fifteen benchmark test functions and eight different UCI datasets. According to the simulation results, the proposed model significantly simplifies the GRNN's structure without any performance loss.
Source: International Journal of Applied Mathematics and Computer Science; 2018, 28, 2; 411-424
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article
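
The three steps in the description of this record (build a map, label each training record with its most similar unit, then feed only the matching records to the network) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' model: no SOM is trained here, the map units are given as fixed hypothetical prototypes, only the single nearest unit is used for a query, and the GRNN is written as a Gaussian-kernel weighted average with an assumed smoothing parameter `sigma`.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_unit(x, units):
    """Index of the map unit (prototype) closest to x."""
    return min(range(len(units)), key=lambda k: sq_dist(x, units[k]))


def grnn_predict(x, samples, sigma):
    """GRNN output: kernel-weighted average of the training targets."""
    weights = [math.exp(-sq_dist(x, xi) / (2 * sigma ** 2)) for xi, _ in samples]
    return sum(w * yi for w, (_, yi) in zip(weights, samples)) / sum(weights)


def reduced_grnn_predict(x, samples, units, sigma):
    """Use only training records labelled with the unit nearest to x."""
    labels = [nearest_unit(xi, units) for xi, _ in samples]
    k = nearest_unit(x, units)
    subset = [s for s, label in zip(samples, labels) if label == k]
    return grnn_predict(x, subset or samples, sigma)  # fall back if the unit is empty


# Hypothetical 1-D training set for y = x and two fixed prototypes standing in for SOM units.
train = [((0.0,), 0.0), ((1.0,), 1.0), ((9.0,), 9.0), ((10.0,), 10.0)]
units = [(0.5,), (9.5,)]
full = grnn_predict((0.5,), train, sigma=0.5)
reduced = reduced_grnn_predict((0.5,), train, units, sigma=0.5)
print(full, reduced)
```

In this toy case the query uses two pattern neurons instead of four, and the prediction is practically unchanged because the discarded records carry negligible kernel weight.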

Title: On a book Algorithms for data science by Brian Steele, John Chandler and Swarn Reddy
Authors: Szajowski, Krzysztof J.
Links: https://bibliotekanauki.pl/articles/747695.pdf
Publication date: 2017
Publisher: Polskie Towarzystwo Matematyczne
Subjects: histogram
centroid algorithm
Algorithms
Associative Statistics
Computation
Computing Similarity
Cluster Analysis
Correlation
Data Reduction
Data Mapping
Data Dictionary
Data Visualization
Forecasting
Hadoop
k-Means Algorithm
k-Nearest Neighbor Prediction
measures of dependence
data transformation
Description: The book presented here is a comprehensive introduction to the most important basic principles, algorithms, and data, together with the structures to which these principles and algorithms apply. The topics presented are an introduction to considerations in the field of computer science. However, it is algorithms that are the basis of data analytics and the focal point of this textbook. Extracting knowledge from data requires methods and results from at least three fields: mathematics, statistics, and computer science. The book contains clear and intuitive mathematical and statistical explanations of the individual topics, which makes the algorithms natural and transparent. The practice of data analysis, however, requires more than a good scientific foundation, mathematical rigor, and the perspective of statistical methodology. The problems that generate data vary enormously, and only the most basic algorithms can be applied without adaptation. Programming fluency and experience with real problems are essential. The reader is guided through algorithmic topics using Python and R, on the basis of real problems and analyses of the data these problems generate. A large part of the material can also be absorbed by readers without knowledge of advanced methodology. As a result, the book can serve as a guide for a one- or two-semester data analytics course for upper-year students of mathematics, statistics, and computer science. Because the required prior knowledge is not extensive, students who have completed a course in probability or statistics, know the basics of algebra and calculus, and have taken a programming course will have no difficulty; the text is also well suited to self-study by graduates of science programs. The core material is well illustrated with extensive examples drawn from real problems. The website associated with the book supports the reader with the data used in the book and with a presentation of selected parts of the lectures. I am convinced that the subject of the book is a new field of science.
The book under review gives a comprehensive presentation of data science algorithms; in practical terms, data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. According to the authors, data science has been a discipline since 2001, although informally it existed before that date (cf. Cleveland (2001)). Graphical presentation of data, as a way of visualizing the knowledge hidden in the data, played a crucial role in it. The discipline covers data mining as a tool or an important topic. The escalating demand for insights into big data requires a fundamentally new approach to architecture, tools, and practices, which is why the term data science is useful. It underscores the centrality of data in the investigation, because data are a store of potential value in the field of action. The label science invokes certain very real concepts, such as the notion of public knowledge and peer review. From this point of view, data science is not a new idea; it is part of a continuum of serious thinking that dates back hundreds of years. A good example of a result of data science is the Benford law (see Arno Berger and Theodore P. Hill (2015, 2017)). In an effort to identify some of the best-known algorithms widely used in the data mining community, the IEEE International Conference on Data Mining (ICDM) identified the top 10 algorithms in data mining for presentation at ICDM '06 in Hong Kong, where a panel was to announce them and discuss the impact of, and further research on, each of them. In the present book, clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent, and most of the algorithms announced by IEEE in 2006 are included. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, so the reader is immersed in Python and R and in real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analysis.
Source: Mathematica Applicanda; 2017, 45, 2
1730-2668
2299-4009
Appears in: Mathematica Applicanda
Content provider: Biblioteka Nauki

Article

Title: Porównanie algorytmów ekstrakcji punktów istotnych w upraszczaniu numerycznych modeli terenu o strukturze hybrydowej
The comparison of algorithms for key points extraction in simplification of hybrid digital terrain models
Authors: Bakuła, K.
Links: https://bibliotekanauki.pl/articles/131096.pdf
Publication date: 2014
Publisher: Stowarzyszenie Geodetów Polskich
Subjects: digital terrain model (DTM)
critical points
data reduction
generalization
Topographic Position Index
TPI
Very Important Points
VIP
Z-tolerance
Opis:: Przedstawione badania dotyczą opracowania algorytmu redukcji ilości danych wysokościowych w postaci numerycznego modelu terenu z lotniczego skanowania laserowego (ALS) dla potrzeb modelowania powodziowego. Redukcja jest procesem niezbędnym w przetwarzaniu ogromnych zbiorów danych z ALS, a jej przebieg nie może mieć charakteru regularnej filtracji danych, co często ma miejsce w praktyce. Działanie takie prowadzi do pominięcia szeregu istotnych form terenowych z punktu widzenia modelowania hydraulicznego. Jednym z proponowanych rozwiązań dla redukcji danych wysokościowych zawartych w numerycznych modelach terenu jest zmiana jego struktury z regularnej siatki na strukturę hybrydową z regularnie rozmieszczonymi punktami oraz nieregularnie rozlokowanymi punktami istotnymi. Celem niniejszego artykułu jest porównanie algorytmów ekstrakcji punktów istotnych z numerycznych modeli terenu, które po przetworzeniu ich z użyciem redukcji danych zachowają swoją dokładność przy jednoczesnym zmniejszeniu rozmiaru plików wynikowych. W doświadczeniach zastosowano algorytmy: indeksu pozycji topograficznej (TPI), Very Important Points (VIP) oraz Z-tolerance, które posłużyły do stworzenia numerycznych modeli terenu, podlegających następnie ocenie w porównaniu z danymi wejściowymi. Analiza taka pozwoliła na porównanie metod. Wyniki badań potwierdzają możliwości uzyskania wysokiego stopnia redukcji, która wykorzystuje jedynie kilka procent danych wejściowych, przy relatywnie niewielkim spadku dokładności pionowej modelu terenu sięgającego kilku centymetrów.
The presented research concerns methods for reducing the elevation data contained in a digital terrain model (DTM) from airborne laser scanning (ALS) for hydraulic modelling. Reduction is necessary when preparing large geospatial datasets describing terrain relief. It should not be carried out as regular data filtering, which often occurs in practice, because such filtering misses a number of terrain forms that are important for hydraulic modelling. One of the proposed solutions for reducing the elevation data contained in a DTM is to change the regular grid into a hybrid structure with regularly distributed points and irregularly located critical points. The purpose of this paper is to compare algorithms for extracting these key points from a DTM. They are used in hybrid model generation as part of an elevation data reduction process that retains DTM accuracy and reduces the size of output files. In the experiments, the following algorithms were tested: Topographic Position Index (TPI), Very Important Points (VIP) and Z-tolerance. Their effectiveness in reduction (maintaining accuracy while reducing dataset size) was evaluated with respect to the input DTM from ALS. The best results were obtained for the Z-tolerance algorithm, but this does not diminish the capabilities of the other two algorithms, VIP and TPI, which can generalize a DTM quite well. The results confirm that a high degree of reduction can be achieved, retaining only a few percent of the input data, with a relatively small decrease in vertical DTM accuracy of a few centimetres. The presented paper was financed by the Foundation for Polish Science - research grant no. VENTURES/2012-9/1 from Innovative Economy program of the European Structural Funds.
Source:: Archiwum Fotogrametrii, Kartografii i Teledetekcji; 2014, 26; 11-21
2083-2214
2391-9477
Appears in:: Archiwum Fotogrametrii, Kartografii i Teledetekcji
Content provider:: Biblioteka Nauki
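
Note on the record above: the abstract names the Topographic Position Index (TPI) as one of the compared key-point extraction algorithms but does not give its parameters. The following is a minimal sketch only, assuming the common definition of TPI as a cell's elevation minus the mean elevation of its 8 neighbours, and assuming a simple absolute threshold for selecting key points. The function names `tpi` and `key_points`, the 3x3 window and the threshold are illustrative assumptions, not the author's implementation.

```python
import numpy as np


def tpi(dem: np.ndarray) -> np.ndarray:
    """Elevation of each cell minus the mean of its 8 neighbours.

    Border cells are handled by replicating the edge values (an assumption).
    """
    rows, cols = dem.shape
    padded = np.pad(dem.astype(float), 1, mode="edge")
    window_sum = np.zeros((rows, cols), dtype=float)
    # Sum over the full 3x3 window by shifting the padded grid.
    for dr in range(3):
        for dc in range(3):
            window_sum += padded[dr:dr + rows, dc:dc + cols]
    # Remove the centre cell so that only the 8 neighbours are averaged.
    neighbour_mean = (window_sum - dem) / 8.0
    return dem - neighbour_mean


def key_points(dem: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of cells whose |TPI| exceeds the (hypothetical) threshold."""
    return np.abs(tpi(dem)) > threshold


if __name__ == "__main__":
    # Flat terrain with a single peak: only the peak should stand out.
    dem = np.zeros((5, 5))
    dem[2, 2] = 8.0
    mask = key_points(dem, threshold=2.0)
    print(np.argwhere(mask))
```

In a hybrid model of the kind the abstract describes, cells flagged by such a mask would be kept as irregular key points alongside a coarser regular grid; how the two are combined is not specified in the record.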

Article

Jump to item: 14.

Title:: Efektywne wykorzystanie danych lidar w dwuwymiarowym modelowaniu hydraulicznym
Effective application of lidar data in two-dimensional hydraulic modelling
Authors:: Bakuła, K.
Links:: https://bibliotekanauki.pl/articles/129777.pdf
Publication date:: 2014
Publisher:: Stowarzyszenie Geodetów Polskich
Subjects:: modelowanie hydrauliczne
lotnicze skanowanie laserowe
ALS
numeryczny model terenu
NMT
redukcja ilościowa danych
numeryczny model szorstkości podłoża
hydraulic modeling
airborne laser scanning (ALS)
digital terrain model (DTM)
DTM
data reduction
digital roughness model
DRM
Description:: This paper presents aspects of using airborne laser scanning data in two-dimensional hydraulic modelling, covering the creation of high-accuracy digital terrain models, their effective processing as a compromise between data resolution and accuracy, and the derivation of land-cover roughness information that can compete with information from topographic databases. The research concerned processing DTM data for effective use by means of quantitative data reduction methods. A separate experiment demonstrated that airborne laser scanning data make it possible to build a fully correct two-dimensional hydraulic model, provided that appropriate hydraulic data are available, and that ALS data can have applications beyond the geometric representation of terrain relief.
This paper presents aspects of ALS data usage in two-dimensional hydraulic modelling, including the generation of high-precision digital terrain models, their effective processing as a compromise between the resolution and the accuracy of the processed data, and the derivation of land-cover roughness information that could compete with information from topographic databases and orthophotomaps. The still-evolving ALS technology makes it possible to collect data with constantly increasing spatial resolution, which guarantees a correct representation of terrain shape and height. It also provides a reliable description of the land cover. However, the size of the generated files may cause problems in their effective use in 2D hydraulic modelling, where Saint-Venant's equations are implemented. For large areas, high-resolution elevation models make the calculations in the complex algorithms defining a model of water movement impossible or prolong them, which is directly related to the cost of the hydraulic analysis. For effective use of voluminous datasets, data reduction is recommended. Such a process should reduce the size of the data files, maintain their accuracy and keep an appropriate structure to allow their further application in hydraulic modelling. Using only a few percent of the unprocessed datasets, selected with specified filtering algorithms and spatial analysis tools, can give the same hydraulic modelling result in a significantly shorter time than the comparable operation on unprocessed datasets. Such an approach, however, is not commonly used, which means that the most reliable hydraulic models are applied only in small areas in the largest cities. Another application of ALS data is its potential use in creating digital roughness models for 2D hydraulic models.
There are many ways of estimating the roughness coefficient in hydraulic modelling, which has an impact on the velocity of water flow. Topographic databases as well as orthophotomaps from aerial or satellite imagery can be considered a basic reference source for such analysis. The paper shows that LIDAR data should be applied effectively in cooperation between surveyors and hydrologists. ALS data alone can be used to create a fully correct two-dimensional hydraulic model, assuming that appropriate hydraulic datasets are available. Additionally, the application of ALS data should not be limited to the geometric representation of the terrain; it can also be used as information about terrain roughness. The presented paper was financed by the Foundation for Polish Science - research grant no. VENTURES/2012-9/1 from Innovative Economy program of the European Structural Funds.
Source:: Archiwum Fotogrametrii, Kartografii i Teledetekcji; 2014, 26; 23-37
2083-2214
2391-9477
Appears in:: Archiwum Fotogrametrii, Kartografii i Teledetekcji
Content provider:: Biblioteka Nauki

Article

Jump to item: 15.

Title:: An algorithm for reducing the dimension and size of a sample for data exploration procedures
Authors:: Kulczycki, P.
Łukasik, S.
Links:: https://bibliotekanauki.pl/articles/330110.pdf
Publication date:: 2014
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: dimension reduction
sample size reduction
linear transformation
simulated annealing
data mining
redukcja wymiaru
transformacja liniowa
wyżarzanie symulowane
eksploracja danych
Description:: The paper deals with the issue of reducing the dimension and size of a data set (random sample) for exploratory data analysis procedures. The concept of the algorithm investigated here is based on a linear transformation to a space of a smaller dimension, while retaining as closely as possible the same distances between particular elements. Elements of the transformation matrix are computed using the metaheuristic of parallel fast simulated annealing. Moreover, those data set elements whose location relative to the others has changed significantly are eliminated or given reduced importance. The presented method can have universal application in a wide range of data exploration problems, offering flexible customization, the possibility of use in a dynamic data environment, and performance comparable to or better than principal component analysis. Its positive features were verified in detail for the domain's fundamental tasks of clustering, classification and detection of atypical elements (outliers).
Source:: International Journal of Applied Mathematics and Computer Science; 2014, 24, 1; 133-149
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki
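
Note on the record above: the abstract describes a linear transformation to a lower-dimensional space that preserves pairwise distances, with the matrix found by parallel fast simulated annealing. The sketch below illustrates the idea only and swaps in a plain serial simulated annealing with Gaussian perturbations and geometric cooling. The cost function (sum of squared distance differences), the names `pairwise_distances`, `stress` and `anneal_projection`, and all parameter values are assumptions for illustration. The step that eliminates or down-weights strongly displaced elements is not included.

```python
import numpy as np


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress(x: np.ndarray, a: np.ndarray) -> float:
    """Sum of squared differences between original and projected distances."""
    return float(((pairwise_distances(x) - pairwise_distances(x @ a)) ** 2).sum())


def anneal_projection(x, out_dim, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a projection matrix with low stress (serial annealing sketch).

    Returns the best matrix found, the initial cost and the best cost.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(x.shape[1], out_dim))  # random starting matrix
    current = stress(x, a)
    initial = current
    best_a, best = a.copy(), current
    t = t0
    for _ in range(steps):
        candidate = a + rng.normal(scale=0.1, size=a.shape)
        cost = stress(x, candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cost < current or rng.random() < np.exp((current - cost) / t):
            a, current = candidate, cost
            if cost < best:
                best_a, best = candidate.copy(), cost
        t *= cooling  # geometric cooling schedule (an assumption)
    return best_a, initial, best


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(30, 5))
    matrix, cost_before, cost_after = anneal_projection(data, out_dim=2)
    print(matrix.shape, cost_before, cost_after)
```

The sketch recomputes the original distance matrix on every cost evaluation for brevity; caching it would be the first optimization for larger samples.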

Article

Information

You are searching for the phrase "redukcja danych" by criterion: Subject
