Autor: Bylina, J. - Katalog OPAC zbiorów

Skocz do pozycji: 1.

Tytuł:: Studying OpenMP thread mapping for parallel linear algebra kernels on multicore system
Autorzy:: Bylina, B.
Bylina, J.
Powiązania:: https://bibliotekanauki.pl/articles/200778.pdf
Data publikacji:: 2018
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: computation performance
OpenMP standard
nonnegative matrix factorization
thread mapping
energy consumption
wydajność obliczeniowa
Standard OpenMP
nieujemna faktoryzacja macierzy
mapowanie
zużycie energii
Opis:: Thread mapping is one of the techniques which allow for efficient exploiting of the potential of modern multicore architectures. The aim of this paper is to study the impact of thread mapping on the computing performance, the scalability, and the energy consumption for parallel dense linear algebra kernels on hierarchical shared memory multicore systems. We consider the basic application, namely a matrix-matrix product (GEMM), and two parallel matrix decompositions (LU and WZ). Both factorizations exploit parallel BLAS (basic linear algebra subprograms) operations, among others GEMM. We compare differences between various thread mapping strategies for these applications. Our results show that the choice of thread mapping has the measurable impact on the performance, the scalability, and energy consumption of the GEMM and two matrix factorizations.
Źródło:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2018, 66, 6; 981-990
0239-7528
Pojawia się w:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 2.

Tytuł:: Influence of preconditioning and blocking on accuracy in solving Markovian models
Autorzy:: Bylina, B.
Bylina, J.
Powiązania:: https://bibliotekanauki.pl/articles/907654.pdf
Data publikacji:: 2009
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: kondycjonowanie
równanie liniowe
metoda blokowania
łańcuch Markowa
rozkład WZ
preconditioning
linear equations
blocking methods
Markov chains
WZ factorization
Opis:: The article considers the effectiveness of various methods used to solve systems of linear equations (which emerge while modeling computer networks and systems with Markov chains) and the practical influence of the methods applied on accuracy. The paper considers some hybrids of both direct and iterative methods. Two varieties of the Gauss elimination will be considered as an example of direct methods: the LU factorization method and the WZ factorization method. The Gauss-Seidel iterative method will be discussed. The paper also shows preconditioning (with the use of incomplete Gauss elimination) and dividing the matrix into blocks where blocks are solved applying direct methods. The motivation for such hybrids is a very high condition number (which is bad) for coefficient matrices occuring in Markov chains and, thus, slow convergence of traditional iterative methods. Also, the blocking, preconditioning and merging of both are analysed. The paper presents the impact of linked methods on both the time and accuracy of finding vector probability. The results of an experiment are given for two groups of matrices: those derived from some very abstract Markovian models, and those from a general 2D Markov chain.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2009, 19, 2; 207-217
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 3.

Tytuł:: Computational aspects of GPU - accelerated sparse matrix - vector multiplication for solving Markov models
Obliczeniowe aspekty mnożenia macierzy rzadkiej przez wektor dla rozwiązywania modeli Markowa przyspieszanego przez karty GPU
Autorzy:: Bylina, B.
Bylina, J.
Karwacki, M.
Powiązania:: https://bibliotekanauki.pl/articles/375696.pdf
Data publikacji:: 2011
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: Markovian models
wireless network models
GPU
matrix-vector multiplication
sparse matrices
Opis:: In this article we investigate some computational aspects of GPU-accelerated matrix-vector multiplication where matrix is sparse. Particularly, we deal with sparse matrices appearing in modelling with Markovian queuing models. The model we use for research is a Markovian queuing model of a wireless device. This model describes the device’s behavior during possible channel occupation by other devices. We study the efficiency of multiplication of a sparse matrix by a dense vector with the use of an appropriate, ready-to-use GPU-accelerated mathematical library, namely CUSP. For the CUSP library we discuss data structures and their impact on the CUDA platform for the fine-grained parallel architecture of the GPU. Our aim is to find the best format for storing a sparse matrix for GPU-computation (especially one associated with the Markovian model of a wireless device). We compare the time, the performance and the speed-up for the card NVIDIA Tesla C2050 (with ECC ON). For unstructured matrices (as our Markovian matrices), we observe speed-ups (in respect to CPU-only computations) of over 8 times.
Łańcuchy Markowa są przydatnym narzędziem do modelowania systemów złożonych, takich jak systemy i sieci komputerowe. W ostatnich latach łańcuchy Markowa zostały z powodzeniem wykorzystane do oceny pracy sieci bezprzewodowych. Jednym z problemów jaki się pojawia przy wykorzystywaniu łańcuchów Markowa w modelowaniu sieci są problemy natury obliczeniowej. W artykule zajmiemy się badaniem mnożenia macierzy rzadkiej przez wektor, które jest jedną z głównych operacji podczas numerycznego rozwiązywania modeli Markowowskich. Aby, przyspieszyć czas obliczeń mnożenia macierz rzadkiej przez wektor wykorzystano funkcje z biblioteki CUSP. Biblioteka jest zbiorem funkcji wykonywanych na GPU (ang.Graphics Processing Unit) celem skrócenia czasu obliczeń. Do testowania operacji mnożenia macierzy rzadkiej przez wektor badano macierze z Markowowskiego modelu pracy sieci bezprzewodowej. Model ten opisuje zachowanie urządzenia, gdy kanał transmisyjnych może być zajęty przez inne urządzenia. Macierz przejść wspomnianego modelu jest macierzą rzadką i potrzeba specialnej struktury danych do jej przechowywania, dlatego w artykule dyskutowane są różne struktury danych dla macierzy rzadkich i ich przydatność do obliczen na kartach graficznych. W pracy porównano czas, wydajność i przyspieszenie jakie otrzymano podczas testowania biblioteki CUSP na karcie NVIDIA Tesla C2050 dla niestrukturalnych macierzy rzadkich opisujących model zajętości węzła w sieciach bezprzewodowych przy różnych formatach przechowywania macierzy rzadkich. Dla testowanych macierzy zauważono ośmiokrotne przyspieszenie obliczeń przy wykorzystaniu karty graficznej.
Źródło:: Theoretical and Applied Informatics; 2011, 23, 2; 127-145
1896-5334
Pojawia się w:: Theoretical and Applied Informatics
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Informacja

Wyszukujesz frazę "Bylina, J." wg kryterium: Autor

Źródło danych

Dostawca treści

Kolekcja

Rok wydania

Wydawca

Temat

Autor

Typ dokumentu

Język