You are searching for the phrase "danych" by criterion: Subject


Title:
Efficient storage, retrieval and analysis of poker hands: An adaptive data framework
Authors:
Gorawski, M.
Lorek, M.
Links:
https://bibliotekanauki.pl/articles/330018.pdf
Publication date:
2017
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
big data
storage model design
data architecture
data access
path optimization
zbiór danych (data set)
architektura danych (data architecture)
udostępnianie danych (data sharing)
optymalizacja obszaru (area optimization)
Description:
In online gambling, poker hands are one of the most popular and fundamental units of the game state and can be considered objects comprising all the events that pertain to the single hand played. In a situation where tens of millions of poker hands are produced daily and need to be stored and analysed quickly, the use of relational databases no longer provides high scalability and performance stability. The purpose of this paper is to present an efficient way of storing and retrieving poker hands in a big data environment. We propose a new, read-optimised storage model that offers significant data access improvements over traditional database systems as well as the existing Hadoop file formats such as ORC, RCFile or SequenceFile. Through index-oriented partition elimination, our file format allows reducing the number of file splits that need to be accessed, and improves query response time by up to three orders of magnitude in comparison with other approaches. In addition, our file format supports a range of new indexing structures to facilitate fast row retrieval at a split level. Both index types operate independently of the Hive execution context and allow other big data computational frameworks such as MapReduce or Spark to benefit from the optimised data access path to the hand information. Moreover, we present a detailed analysis of our storage model and its supporting index structures, and how they are organised in the overall data framework. We also describe in detail how predicate-based expression trees are used to build effective file-level execution plans. Our experimental tests conducted on a production cluster, holding nearly 40 billion hands which span over 4000 partitions, show that multi-way partition pruning outperforms other existing file formats, resulting in faster query execution times and better cluster utilisation.
Source:
International Journal of Applied Mathematics and Computer Science; 2017, 27, 4; 713-726
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
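The partition elimination mentioned in the description of the poker-hand storage record above can be illustrated with a minimal sketch. It is not the authors' file format: it assumes each partition carries a simple min/max summary of one integer key, and the names `PartitionStats`, `min_key`, `max_key` and `prune_partitions` are hypothetical. A range predicate then only needs to touch partitions whose key range overlaps it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionStats:
    """Hypothetical per-partition summary: smallest and largest key stored."""
    name: str
    min_key: int
    max_key: int


def prune_partitions(partitions, lo, hi):
    """Keep only partitions whose [min_key, max_key] range overlaps [lo, hi]."""
    return [p for p in partitions if p.max_key >= lo and p.min_key <= hi]


# Illustrative data only, not taken from the paper.
partitions = [
    PartitionStats("part-000", 0, 999),
    PartitionStats("part-001", 1000, 1999),
    PartitionStats("part-002", 2000, 2999),
]

# A predicate such as 1500 <= key <= 2100 skips part-000 entirely.
selected = prune_partitions(partitions, 1500, 2100)
print([p.name for p in selected])
```

The fewer partitions survive the overlap test, the fewer file splits a query engine would have to open, which is the effect the description attributes to index-oriented pruning.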
Title:
On the predictive power of meta-features in OpenML
Authors:
Bilalli, B.
Abelló, A.
Aluja-Banet, T.
Links:
https://bibliotekanauki.pl/articles/331086.pdf
Publication date:
2017
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
feature extraction
feature selection
meta learning
ekstrakcja danych (data extraction)
selekcja danych (data selection)
uczenie maszynowe (machine learning)
Description:
The demand for performing data analysis is steadily rising. As a consequence, people of different profiles (i.e., non-experienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (model/algorithm selection problem). Meta-learning is a technique used for assisting non-expert users in this step. The effectiveness of meta-learning is, however, largely dependent on the description/characterization of datasets (i.e., meta-features used for meta-learning). There is a need for improving the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, which is a collaborative machine learning platform that is designed to store and organize meta-data about datasets, data mining algorithms, models and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and we show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML as the biggest source of data in this domain.
Source:
International Journal of Applied Mathematics and Computer Science; 2017, 27, 4; 697-712
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Exploring complex and big data
Authors:
Stefanowski, J.
Krawiec, K.
Wrembel, R.
Links:
https://bibliotekanauki.pl/articles/330152.pdf
Publication date:
2017
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
big data
complex data
data integration
data provenance
data streams
deep learning
dane złożone (complex data)
integracja danych (data integration)
pochodzenie danych (data provenance)
strumień danych (data stream)
uczenie głębokie (deep learning)
Description:
This paper shows how big data analysis opens a range of research and technological problems and calls for new approaches. We start with defining the essential properties of big data and discussing the main types of data involved. We then survey the dedicated solutions for storing and processing big data, including a data lake, virtual integration, and a polystore architecture. Difficulties in managing data quality and provenance are also highlighted. The characteristics of big data imply also specific requirements and challenges for data mining algorithms, which we address as well. The links with related areas, including data streams and deep learning, are discussed. The common theme that naturally emerges from this characterization is complexity. All in all, we consider it to be the truly defining feature of big data (posing particular research and technological challenges), which ultimately seems to be of greater importance than the sheer data volume.
Source:
International Journal of Applied Mathematics and Computer Science; 2017, 27, 4; 669-679
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Linguistically defined clustering of data
Authors:
Leski, J. M.
Kotas, M. P.
Links:
https://bibliotekanauki.pl/articles/329995.pdf
Publication date:
2018
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
data clustering
possibility theory
linguistic rules
data analysis
grupowanie danych (data clustering)
teoria możliwości (possibility theory)
analiza danych (data analysis)
Description:
This paper introduces a method of data clustering that is based on linguistically specified rules, similar to those applied by a human visually fulfilling a task. The method endeavors to follow these remarkable capabilities of intelligent beings. Even for the most complicated data patterns a human is capable of accomplishing the clustering process using relatively simple rules. His/her way of clustering is a sequential search for new structures in the data and new prototypes with the use of the following linguistic rule: search for prototypes in regions of extremely high data densities and immensely far from the previously found ones. Then, after this search has been completed, the respective data have to be assigned to any of the clusters whose nuclei (prototypes) have been found. A human again uses a simple linguistic rule: data from regions with similar densities, which are located exceedingly close to each other, should belong to the same cluster. The goal of this work is to prove experimentally that such simple linguistic rules can result in a clustering method that is competitive with the most effective methods known from the literature on the subject. A linguistic formulation of a validity index for determination of the number of clusters is also presented. Finally, an extensive experimental analysis of benchmark datasets is performed to demonstrate the validity of the clustering approach introduced. Its competitiveness with the state-of-the-art solutions is also shown.
Source:
International Journal of Applied Mathematics and Computer Science; 2018, 28, 3; 545-557
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
A quaternion clustering framework
Authors:
Piórek, Michał
Jabłoński, Bartosz
Links:
https://bibliotekanauki.pl/articles/330038.pdf
Publication date:
2020
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
data clustering
quaternions data processing
human gait
data processing
grupowanie danych (data clustering)
chód człowieka (human gait)
przetwarzanie danych (data processing)
Description:
Data clustering is one of the most popular methods of data mining and cluster analysis. The goal of clustering algorithms is to partition a data set into a specific number of clusters for compressing or summarizing original values. There are a variety of clustering algorithms available in the related literature. However, the research on the clustering of data parametrized by unit quaternions, which are commonly used to represent 3D rotations, is limited. In this paper we present a quaternion clustering methodology including an algorithm proposal for quaternion based k-means along with quaternion clustering quality measures provided by an enhancement of known indices and an automated procedure of optimal cluster number selection. The validity of the proposed framework has been tested in experiments performed on generated and real data, including human gait sequences recorded using a motion capture technique.
Source:
International Journal of Applied Mathematics and Computer Science; 2020, 30, 1; 133-147
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
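For the quaternion clustering record above, a short sketch may help show why unit quaternions need their own distance in a k-means-style assignment step. It is not the algorithm proposed in the paper. It assumes the common rotation-angle metric, which treats q and -q as the same 3D rotation, and the helper names `normalize`, `rotation_distance` and `assign` are hypothetical.

```python
import math


def normalize(q):
    """Scale a 4-tuple (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def rotation_distance(q1, q2):
    """Angle in radians between the rotations encoded by two quaternions.

    The absolute value of the dot product makes q and -q equivalent,
    since both encode the same 3D rotation.
    """
    dot = abs(sum(a * b for a, b in zip(normalize(q1), normalize(q2))))
    return 2.0 * math.acos(min(1.0, dot))  # clamp guards against rounding above 1


def assign(q, centroids):
    """Index of the centroid closest to q (the assignment step of a k-means loop)."""
    return min(range(len(centroids)), key=lambda i: rotation_distance(q, centroids[i]))


identity = (1.0, 0.0, 0.0, 0.0)
half_turn_z = (0.0, 0.0, 0.0, 1.0)   # rotation by 180 degrees about the z axis
print(assign((-1.0, 0.0, 0.0, 0.0), [half_turn_z, identity]))
```

A plain Euclidean distance on the four components would place (-1, 0, 0, 0) far from the identity even though both describe the same orientation, which is why a rotation-aware distance is used in this sketch.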
Title:
IoT sensing networks for gait velocity measurement
Authors:
Chou, Jyun-Jhe
Shih, Chi-Sheng
Wang, Wei-Dean
Huang, Kuo-Chin
Links:
https://bibliotekanauki.pl/articles/330707.pdf
Publication date:
2019
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
internet of things
IoT middleware
data fusion
data reduction
internet rzeczy (internet of things)
oprogramowanie pośredniczące (middleware)
fuzja danych (data fusion)
redukcja danych (data reduction)
Description:
Gait velocity has been considered the sixth vital sign. It can be used not only to estimate the survival rate of the elderly, but also to predict the tendency of falling. Unfortunately, gait velocity is usually measured on a specially designed walk path, which has to be done at clinics or health institutes. Wearable tracking services using an accelerometer or an inertial measurement unit can measure the velocity for a certain time interval, but not all the time, due to the lack of a sustainable energy source. To tackle the shortcomings of wearable sensors, this work develops a framework to measure gait velocity using distributed tracking services deployed indoors. Two major challenges are tackled in this paper. The first is to minimize the sensing errors caused by thermal noise and overlapping sensing regions. The second is to minimize the data volume to be stored or transmitted. Given numerous errors caused by remote sensing, the framework takes into account the temporal and spatial relationship among tracking services to calibrate the services systematically. Consequently, gait velocity can be measured without wearable sensors and with higher accuracy. The developed method is built on top of WuKong, which is an intelligent IoT middleware, to enable location and temporal-aware data collection. In this work, we present an iterative method to reduce the data volume collected by thermal sensors. The evaluation results show that the file size is up to 25% of that of the JPEG format when the RMSE is limited to 0.5°.
Source:
International Journal of Applied Mathematics and Computer Science; 2019, 29, 2; 245-259
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
On the Equivalence of Problem-Oriented Databases
Authors:
Lebiediewa, S.
Links:
https://bibliotekanauki.pl/articles/908275.pdf
Publication date:
1999
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
baza danych (database)
język operowania danymi (data manipulation language)
rozproszona baza danych (distributed database)
problem-oriented database
data manipulations language
distributed databases
equivalence of databases
Description:
A comparison is made between problem-oriented databases for multistage decision making and general-purpose databases. An exemplary problem-oriented database IDEN for supporting the process of multistage identification is discussed in detail. An equivalence condition for problem-oriented databases is formulated and then the equivalence with respect to data structures and operations is proved for hierarchical, network and relational models of the IDEN database.
Source:
International Journal of Applied Mathematics and Computer Science; 1999, 9, 4; 965-977
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
A Fuzzy Logic Based Approach to Linguistic Summaries of Databases
Authors:
Kacprzyk, J.
Yager, R. R.
Zadrożny, S.
Links:
https://bibliotekanauki.pl/articles/911154.pdf
Publication date:
2000
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
logika rozmyta (fuzzy logic)
baza danych (database)
podsumowanie lingwistyczne (linguistic summary)
zgłębianie danych (data mining)
fuzzy logic
linguistic summary
computing with words
data mining
fuzzy querying
Description:
In this paper, we present basic ideas and perspectives related to the use of fuzzy logic for the derivation of linguistic summaries of data (databases). We concentrate on the issue of how to measure the goodness of a linguistic summary, and on how to embed data summarization within the fuzzy querying environment, for an effective and efficient implementation. In particular, we propose how to efficiently implement Kacprzyk and Yager's (2000) new quality indicators of linguistic summaries to derive summaries via Kacprzyk and Zadrozny's (1994; 1995a; 1995b; 1996) fuzzy querying add-on. Finally, we present an implementation for deriving linguistic summaries of a sales database at a computer retailer, and show how the linguistic summaries obtained can be useful for supporting decisions of the business owner.
Source:
International Journal of Applied Mathematics and Computer Science; 2000, 10, 4; 813-834
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
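The question of how to measure the goodness of a linguistic summary, raised in the description of the record above, can be made concrete with the basic truth degree of a statement such as "most sales are high". The sketch below covers only that basic degree, not the additional quality indicators the paper proposes. The membership functions `mu_high` and `mu_most`, their breakpoints, and the sample sales values are illustrative assumptions.

```python
def mu_high(value, low=100.0, high=500.0):
    """Assumed ramp membership for the summarizer 'high': 0 below low, 1 above high."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def mu_most(r):
    """Assumed piecewise-linear membership for the relative quantifier 'most'."""
    if r >= 0.8:
        return 1.0
    if r <= 0.3:
        return 0.0
    return 2.0 * r - 0.6


def truth_of_summary(values, summarizer, quantifier):
    """Truth degree of 'Q of the records are S'.

    r is the average degree to which records satisfy the summarizer S;
    the quantifier Q then maps that proportion to a truth value in [0, 1].
    """
    r = sum(summarizer(v) for v in values) / len(values)
    return quantifier(r)


sales = [600.0, 550.0, 300.0, 50.0]   # made-up sale amounts
print(truth_of_summary(sales, mu_high, mu_most))
```

Swapping in a different quantifier (for example "almost all") or summarizer only changes the two membership functions; the aggregation step stays the same.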
Title:
Applications of rough sets in big data analysis: An overview
Authors:
Pięta, Piotr
Szmuc, Tomasz
Links:
https://bibliotekanauki.pl/articles/2055175.pdf
Publication date:
2021
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
rough sets theory
big data analysis
deep learning
data mining
teoria zbiorów przybliżonych (rough set theory)
duży zbiór danych (large data set)
uczenie głębokie (deep learning)
eksploracja danych (data mining)
Description:
Big data, artificial intelligence and the Internet of things (IoT) are still very popular areas in current research and industrial applications. Processing massive amounts of data generated by the IoT and stored in distributed space is not a straightforward task and may cause many problems. During the last few decades, scientists have proposed many interesting approaches to extract information and discover knowledge from data collected in database systems or other sources. We observe a permanent development of machine learning algorithms that support each phase of the data mining process, ensuring achievement of better results than before. Rough set theory (RST) delivers a formal insight into information, knowledge, data reduction, uncertainty, and missing values. This formalism, formulated in the 1980s and developed by several researchers, can serve as a theoretical basis and practical background for dealing with ambiguities, data reduction, building ontologies, etc. Moreover, as a mature theory, it has evolved into numerous extensions and has been transformed through various incarnations, which have enriched expressiveness and applicability of the related tools. The main aim of this article is to present an overview of selected applications of RST in big data analysis and processing. Thousands of publications on rough sets have been contributed; therefore, we focus on papers published in the last few years. The applications of RST are considered from two main perspectives: direct use of the RST concepts and tools, and jointly with other approaches, i.e., fuzzy sets, probabilistic concepts, and deep learning. The latter hybrid idea seems to be very promising for developing new methods and related tools as well as extensions of the application area.
Source:
International Journal of Applied Mathematics and Computer Science; 2021, 31, 4; 659-683
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
An algorithm for arbitrary-order cumulant tensor calculation in a sliding window of data streams
Authors:
Domino, Krzysztof
Gawron, Piotr
Links:
https://bibliotekanauki.pl/articles/330468.pdf
Publication date:
2019
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
high order cumulant
time series statistics
nonnormally distributed data
data streaming
kumulant wysokiego rzędu (high-order cumulant)
szereg czasowy (time series)
baza danych rozproszona (distributed database)
strumień danych (data stream)
Description:
High-order cumulant tensors carry information about statistics of non-normally distributed multivariate data. In this work we present a new efficient algorithm for calculation of cumulants of arbitrary orders in a sliding window for data streams. We show that this algorithm offers substantial speedups of cumulant updates compared with the current solutions. The proposed algorithm can be used for processing on-line high-frequency multivariate data and can find applications, e.g., in on-line signal filtering and classification of data streams. To present an application of this algorithm, we propose an estimator of non-Gaussianity of a data stream based on the norms of high order cumulant tensors. We show how to detect the transition from Gaussian distributed data to non-Gaussian ones in a data stream. In order to achieve high implementation efficiency of operations on super-symmetric tensors, such as cumulant tensors, we employ a block structure to store and calculate only one hyper-pyramid part of such tensors.
Source:
International Journal of Applied Mathematics and Computer Science; 2019, 29, 1; 195-206
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Rough Sets Methods in Feature Reduction and Classification
Authors:
Świniarski, R. W.
Links:
https://bibliotekanauki.pl/articles/908366.pdf
Publication date:
2001
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
rozpoznawanie obrazów (image recognition)
redukcja danych (data reduction)
rough sets
feature selection
classification
Description:
The paper presents an application of rough sets and statistical methods to feature reduction and pattern recognition. The presented description of rough sets theory emphasizes the role of rough sets reducts in feature selection and data reduction in pattern recognition. The overview of methods of feature selection emphasizes feature selection criteria, including rough set-based methods. The paper also contains a description of the algorithm for feature selection and reduction based on the rough sets method proposed jointly with Principal Component Analysis. Finally, the paper presents numerical results of face recognition experiments using the learning vector quantization neural network, with feature selection based on the proposed principal components analysis and rough sets methods.
Source:
International Journal of Applied Mathematics and Computer Science; 2001, 11, 3; 565-582
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
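The rough sets machinery referred to in the record above rests on lower and upper approximations of a target set under an indiscernibility relation. The sketch below shows only that textbook construction, not the feature selection algorithm from the paper. The toy objects, the attribute names and the function `approximations` are illustrative assumptions.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Lower and upper approximation of `target` with respect to `attributes`.

    objects: dict mapping object name -> dict of attribute values
    target:  set of object names forming the concept to approximate
    """
    # Objects with identical values on the chosen attributes are indiscernible.
    classes = defaultdict(set)
    for name, description in objects.items():
        classes[tuple(description[a] for a in attributes)].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # block lies entirely inside the concept
            lower |= block
        if block & target:       # block touches the concept at all
            upper |= block
    return lower, upper


# Toy decision table, invented for illustration.
objects = {
    "o1": {"color": "red", "size": "big"},
    "o2": {"color": "red", "size": "big"},
    "o3": {"color": "blue", "size": "small"},
    "o4": {"color": "blue", "size": "big"},
}
lower, upper = approximations(objects, ["color", "size"], {"o1", "o3"})
print(sorted(lower), sorted(upper))
```

Here o1 and o2 cannot be told apart, so o1 belongs to the upper but not the lower approximation. A reduct is, informally, a smaller attribute subset that leaves these approximations unchanged, which is what links the construction to feature selection.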
Title:
Efficient astronomical data condensation using approximate nearest neighbors
Authors:
Łukasik, Szymon
Lalik, Konrad
Sarna, Piotr
Kowalski, Piotr A.
Charytanowicz, Małgorzata
Kulczycki, Piotr
Links:
https://bibliotekanauki.pl/articles/907932.pdf
Publication date:
2019
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
big data
astronomical observation
data reduction
nearest neighbor search
kd-trees
duży zbiór danych (large data set)
obserwacja astronomiczna (astronomical observation)
redukcja danych (data reduction)
wyszukiwanie najbliższego sąsiada (nearest neighbor search)
drzewo kd (kd-tree)
Description:
Extracting useful information from astronomical observations represents one of the most challenging tasks of data exploration. This is largely due to the volume of the data acquired using advanced observational tools. While other challenges typical for the class of big data problems (like data variety) are also present, the size of datasets represents the most significant obstacle in visualization and subsequent analysis. This paper studies an efficient data condensation algorithm aimed at providing its compact representation. It is based on fast nearest neighbor calculation using tree structures and parallel processing. In addition to that, the possibility of using approximate identification of neighbors, to even further improve the algorithm time performance, is also evaluated. The properties of the proposed approach, both in terms of performance and condensation quality, are experimentally assessed on astronomical datasets related to the GAIA mission. It is concluded that the introduced technique might serve as a scalable method of alleviating the problem of the dataset size.
Source:
International Journal of Applied Mathematics and Computer Science; 2019, 29, 3; 467-476
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Fusion of clinical data: A case study to predict the type of treatment of bone fractures
Authors:
Haq, Anam
Wilk, Szymon
Abelló, Alberto
Links:
https://bibliotekanauki.pl/articles/330674.pdf
Publication date:
2019
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
clinical data
data fusion
combination of data
combination of interpretation
prediction model
decision support
dane kliniczne (clinical data)
fuzja danych (data fusion)
łączenie danych (data combination)
model predykcyjny (predictive model)
wspomaganie decyzji (decision support)
Description:
A prominent characteristic of clinical data is their heterogeneity—such data include structured examination records and laboratory results, unstructured clinical notes, raw and tagged images, and genomic data. This heterogeneity poses a formidable challenge while constructing diagnostic and therapeutic decision models that are currently based on single modalities and are not able to use data in different formats and structures. This limitation may be addressed using data fusion methods. In this paper, we describe a case study where we aimed at developing data fusion models that resulted in various therapeutic decision models for predicting the type of treatment (surgical vs. non-surgical) for patients with bone fractures. We considered six different approaches to integrate clinical data: one fusion model based on combination of data (COD) and five models based on combination of interpretation (COI). Experimental results showed that the decision model constructed following COI fusion models is more accurate than decision models employing COD. Moreover, statistical analysis using the one-way ANOVA test revealed that there were two groups of constructed decision models, each containing the set of three different models. The results highlighted that the behavior of models within a group can be similar, although it may vary between different groups.
Source:
International Journal of Applied Mathematics and Computer Science; 2019, 29, 1; 51-67
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
A complete gradient clustering algorithm formed with kernel estimators
Authors:
Kulczycki, P.
Charytanowicz, M.
Links:
https://bibliotekanauki.pl/articles/907781.pdf
Publication date:
2010
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
analiza danych (data analysis)
eksploracja danych (data mining)
grupowanie (clustering)
metoda statystyczna (statistical method)
estymacja jądrowa (kernel estimation)
obliczenia numeryczne (numerical calculations)
data analysis
data mining
clustering
gradient procedures
nonparametric statistical methods
kernel estimators
numerical calculations
Description:
The aim of this paper is to provide a gradient clustering algorithm in its complete form, suitable for direct use without requiring a deeper statistical knowledge. The values of all parameters are effectively calculated using optimizing procedures. Moreover, an illustrative analysis of the meaning of particular parameters is shown, followed by the effects resulting from possible modifications with respect to their primarily assigned optimal values. The proposed algorithm does not demand strict assumptions regarding the desired number of clusters, which allows the obtained number to be better suited to a real data structure. Moreover, a feature specific to it is the possibility to influence the proportion between the number of clusters in areas where data elements are dense as opposed to their sparse regions. Finally, the algorithm, by the detection of one-element clusters, allows identifying atypical elements, which enables their elimination or possible designation to bigger clusters, thus increasing the homogeneity of the data set.
Source:
International Journal of Applied Mathematics and Computer Science; 2010, 20, 1; 123-134
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Modeling and querying facts with period timestamps in data warehouses
Authors:
Mahlknecht, Giovanni
Dignös, Anton
Kozmina, Natalija
Links:
https://bibliotekanauki.pl/articles/331124.pdf
Publication date:
2019
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:
data warehouse
time period
logical model
hurtownia danych (data warehouse)
odcinek czasu (time period)
model logiczny (logical model)
Description:
In this paper, we study various ways of representing and querying fact data that are time-stamped with a time period in a data warehouse. The main focus is on how to represent the time periods that are associated with the facts in order to support convenient and efficient aggregations over time. We propose three distinct logical models that represent time periods as sets of all time points in a period (instant model), as pairs of start and end time points of a period (period model), and as atomic units that are explicitly stored in a new period dimension (period* model). The period dimension is enriched with information about the days of each period, thereby combining the former two models. We use four different classes of aggregation queries to analyze query formulation, query execution, and query performance over the three models. An extensive empirical evaluation on synthetic and real-world datasets and the analysis of the query execution plans reveal that the period* model is the best choice in terms of runtime and space for all four query classes.
Source:
International Journal of Applied Mathematics and Computer Science; 2019, 29, 1; 31-49
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
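The difference between storing every time point of a period and storing only its start and end, as contrasted in the description of the record above, can be sketched on a simple "how many facts were active on a given day" query. This is an in-memory illustration, not the logical models or SQL from the paper. It assumes day granularity and closed intervals, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def expand_to_instants(start, end):
    """Instant-style representation: one entry per day in [start, end]."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def active_per_day_instants(facts):
    """Count active facts per day by materialising every day of every period."""
    counts = Counter()
    for start, end in facts:
        counts.update(expand_to_instants(start, end))
    return counts


def active_on_day_periods(facts, day):
    """Period-style representation: test containment on (start, end) pairs."""
    return sum(1 for start, end in facts if start <= day <= end)


# Two made-up facts, each valid over a closed period of days.
facts = [
    (date(2019, 1, 1), date(2019, 1, 3)),
    (date(2019, 1, 2), date(2019, 1, 5)),
]
query_day = date(2019, 1, 2)
print(active_per_day_instants(facts)[query_day], active_on_day_periods(facts, query_day))
```

Both functions give the same count, but the first stores one entry per day of each period while the second stores two values per fact, which hints at the space and runtime trade-off the description evaluates.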
