Autor: Pietroń, M. - Katalog OPAC zbiorów

Skocz do pozycji: 1.

Tytuł:: Elementary functions in HLL on example of CORDIC algorithm implemented in Mitrion-C language
Implementacja funkcji elementarnych w FPGA na przykładzie algorytmu CORDIC w języku wysokiego poziomu Mitrion-C
Autorzy:: Pietroń, M.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/156559.pdf
Data publikacji:: 2012
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: HLL
FPGA
funkcje elementarne
HPRC
elementary functions
Opis:: The elementary functions are very often used in scientific computations. The quantum chemistry, physics, financial computing are only examples were elementary functions like exponent, logarithm are intensively computed. This paper presents implementation of an exp(x) core in a CORDIC-algorithm written in Mitrion-C lanuage. The Mitrion-C language is a new high level language. It enables implementing pipelined and wide paralleled algorithms on FPGA platforms. It makes process of algorithms implementation on FPGA faster. From gravitational forces to quantum chemistry or financial mathematics, computational scientists very often use exp(x) in computer simulations. The implemented core generates IEEE 754 standard single precision exponential values. The CORDIC algorithm can be used to compute wide spectrum of different elementary functions like sine, cosine, tangent. In our solution values of the exponent for integer part of the input argument are stored in a table. The table is allocated in an internal memory. The fractional part is computed by the CORDIC algorithm. The final result is achieved by multiplying the values of the fractional and integer part. Our implementation is made on SGI Altix 4700 hardware platform. It is SGI multiprocessor distributed shared memory computer system with Virtex-4 LX 200 FPGAs.
Funkcje elementarne są bardzo często wykorzystywane w obliczeniach naukowych. Chemia kwantowa, matematyka finansowa, fizyka jedne z wielu dziedzin gdzie funkcje takie jak eksponenta, logarytm są intensywnie wykonywane. Praca ta przedstawia implementację funkcji eksponenty za pomocą algorytmu CORDIC w języku Mitrion-C. Mitrion-C jest nowym językiem wysokiego poziomu programowania układów FPGA. Język ten posiada odpowiednie instrukcje oraz wbudowane typy danych, które pozwalają na programowanie algorytmów potokowo jak i całkowicie równolegle. W naszym rozwiązaniu argument wejściowy jest rozdzielony na część całkowitą i część ułamkową. Wartości eksponenty dla części całkowitej przechowywane są w tablicy w pamięci wewnętrznej natomiast część wartość dla części ułamkowej obliczana jest algorytmem CORDIC. Wynik końcowy obliczany jest za pomocą mnożenia części ułamkowej i całkowitej. Implementacja wykonana jest na platformie sprzętowej SGI ALTIX 4700. Jest to platforma wieloprocesorowa ze współdzieloną pamięcią oraz układami FPGA typu Virtex-4 LX 200.
Źródło:: Pomiary Automatyka Kontrola; 2012, R. 58, nr 7, 7; 671-673
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 2.

Tytuł:: Loop profiling tool for HPC code inspection as an efficient method of FPGA based acceleration
Autorzy:: Pietroń, M.
Russek, P.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/929582.pdf
Data publikacji:: 2010
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: HPC
HPRC
obliczanie wysokowartościowe
obliczanie rekonfigurowalne
przetwarzanie danych
loop profiling
Mitrion-C
DFG (data flow graph)
Opis:: This paper presents research on FPGA based acceleration of HPC applications. The most important goal is to extract a code that can be sped up. A major drawback is the lack of a tool which could do it. HPC applications usually consist of a huge amount of a complex source code. This is one of the reasons why the process of acceleration should be as automated as possible. Another reason is to make use of HLLs (High Level Languages) such as Mitrion-C (Mohl, 2006). HLLs were invented to make the development of HPRC applications faster. Loop profiling is one of the steps to check if the insertion of an HLL to an existing HPC source code is possible to gain acceleration of these applications. Hence the most important step to achieve acceleration is to extract the most time consuming code and data dependency, which makes the code easier to be pipelined and parallelized. Data dependency also gives information on how to implement algorithms in an FPGA circuit with minimal initialization of it during the execution of algorithms.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2010, 20, 3; 581-589
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 3.

Tytuł:: The Java profiler based on byte code analysis and instrumentation for many-core hardware accelerators
Autorzy:: Pietroń, M.
Karwatowski, M.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/114614.pdf
Data publikacji:: 2015
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: virtual machine
CUDA
GPU
profiling
parallel computing
Opis:: One of the most challenging issues in the case of many and multi-core architectures is how to exploit their potential computing power in legacy systems without a deep knowledge of their architecture. The analysis of static dependence and dynamic data dependences of a program run, can help to identify independent paths that could have been computed by individual parallel threads. The statistics of reusing the data and its size is also crucial in adapting the application in GPU many-core hardware architecture because of specific memory hierarchies. The proposed profiling system accomplishes static data analysis and computes dynamic dependencies for Java programs as well as recommends parts of source code with the highest potential for parallelization in GPU. Such an analysis can also provide starting point for automatic parallelization.
Źródło:: Measurement Automation Monitoring; 2015, 61, 7; 385-387
2450-2855
Pojawia się w:: Measurement Automation Monitoring
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 4.

Tytuł:: Accelerating SELECT WHERE and SELECT JOIN queries on a GPU
Autorzy:: Pietroń, M.
Russek, P.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/305797.pdf
Data publikacji:: 2013
Wydawca:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:: SQL
CUDA
relational databases
GPU
Opis:: This paper presents implementations of a few selected SQL operations using the CUDA programming framework on the GPU platform. Nowadays, the GPU’s parallel architectures give a high speed-up on certain problems. Therefore, the number of non-graphical problems that can be run and sped-up on the GPU still increases. Especially, there has been a lot of research in data mining on GPUs. In many cases it proves the advantage of offloading processing from the CPU to the GPU. At the beginning of our project we chose the set of SELECT WHERE and SELECT JOIN instructions as the most common operations used in databases. We parallelized these SQL operations using three main mechanisms in CUDA: thread group hierarchy, shared memories, and barrier synchronization. Our results show that the implemented highly parallel SELECT WHERE and SELECT JOIN operations on the GPU platform can be significantly faster than the sequential one in a database system run on the CPU.
Źródło:: Computer Science; 2013, 14 (2); 243-252
1508-2806
2300-7036
Pojawia się w:: Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 5.

Tytuł:: A study of parallel techniques for dimensionality reduction and its impact on the quality of text processing algorithms
Autorzy:: Pietroń, M.
Wielgosz, M.
Karwatowski, M.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/114190.pdf
Data publikacji:: 2015
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: singular value decomposition
vector space model
TFIDF
Opis:: The presented algorithms employ the Vector Space Model (VSM) and its enhancements such as TFIDF (Term Frequency Inverse Document Frequency) with Singular Value Decomposition (SVD). TFIDF were applied to emphasize the important features of documents and SVD was used to reduce the analysis space. Consequently, a series of experiments were conducted. They revealed important properties of the algorithms and their accuracy. The accuracy of the algorithms was estimated in terms of their ability to match the human classification of the subject. For unsupervised algorithms the entropy was used as a quality evaluation measure. The combination of VSM, TFIDF, and SVD came out to be the best performing unsupervised algorithm with entropy of 0.16.
Źródło:: Measurement Automation Monitoring; 2015, 61, 7; 352-353
2450-2855
Pojawia się w:: Measurement Automation Monitoring
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 6.

Tytuł:: FPGA implementation of procedures for video quality assessment
Autorzy:: Wielgosz, M.
Karwatowski, M.
Pietron, M.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/305403.pdf
Data publikacji:: 2018
Wydawca:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:: video quality
video metrics
image processing
FPGA
Impulse C
Opis:: The video resolutions used in a variety of media are constantly rising. While manufacturers struggle to perfect their screens, it is also important to ensure the high quality of the displayed image. Overall quality can be measured using a Mean Opinion Score (MOS). Video quality can be affected by miscellaneous artifacts appearing at every stage of video creation and transmission. In this paper, we present a solution to calculate four distinct video quality metrics that can be applied to a real-time video quality assessment system. Our assessment module is capable of processing 8K resolution in real time set at a level of 30 frames per second. The throughput of 2.19 GB/s surpasses the performance of pure software solutions. The module was created using a high-level language to concentrate on architectural optimization.
Źródło:: Computer Science; 2018, 19 (3); 279-305
1508-2806
2300-7036
Pojawia się w:: Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 7.

Tytuł:: Real time 8K video quality assessment using FPGA
Autorzy:: Wielgosz, M.
Pietroń, M.
Karwatowski, M.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/114403.pdf
Data publikacji:: 2016
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: video quality
video metrics
image processing
FPGA
Opis:: This paper presents a hardware architecture of the video quality assessment module. Two different metrics were implemented on FPGA using modern High Level Language for digital system design – Impulse C. FPGA resources consumption of the presented module is low, which enables module-level parallelization. Tests conducted for four modules working concurrently show that 1.96 GB/s throughput can be achieved. The module is capable of processing 8K video stream in a real-time manner i.e. 30 frames/second. Such high performance of the presented solution was achieved due to the series of architectural optimization introduced to the module, such as reduction of data precision and reuse of various module components.
Źródło:: Measurement Automation Monitoring; 2016, 62, 6; 187-189
2450-2855
Pojawia się w:: Measurement Automation Monitoring
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 8.

Tytuł:: The comparison of parallel sorting algorithms implemented on different hardware platforms
Autorzy:: Żurek, D.
Pietroń, M.
Wielgosz, M.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/305317.pdf
Data publikacji:: 2013
Wydawca:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:: parallel algorithms
GPU
OpenMP
CUDA
sorting networks
merge-sort
Opis:: Sorting is a common problem in computer science. There are a lot of well-known sorting algorithms created for sequential execution on a single processor. Recently, many-core and multi-core platforms have enabled the creation of wide parallel algorithms. We have standard processors that consist of multiple cores and hardware accelerators, like the GPU. Graphic cards, with their parallel architecture, provide new opportunities to speed up many algorithms. In this paper, we describe the results from the implementation of a few different parallel sorting algorithms on GPU cards and multi-core processors. Then, a hybrid algorithm will be presented, consisting of parts executed on both platforms (a standard CPU and GPU). In recent literature about the implementation of sorting algorithms in the GPU, a fair comparison between many core and multi-core platforms is lacking. In most cases, these describe the resulting time of sorting algorithm executions on the GPU platform and a single CPU core.
Źródło:: Computer Science; 2013, 14 (4); 679-691
1508-2806
2300-7036
Pojawia się w:: Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 9.

Tytuł:: Assessment of various GPU acceleration strategies in text categorization processing flow
Autorzy:: Korduła, Ł.
Wielgosz, M.
Karwatowski, M.
Pietroń, M.
Żurek, D.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/114132.pdf
Data publikacji:: 2017
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: GPU
NLP
text categorization
OpenCL
Opis:: Automatic text categorization presents many difficulties. Modern algorithms are getting better in extracting meaningful information from human language. However, they often significantly increase complexity of computations. This increased demand for computational capabilities can be facilitated by the usage of hardware accelerators like general purpose graphic cards. In this paper we present a full processing flow for document categorization system. Gram-Schmidt process signatures calculation up to 12 fold decrease in computing time of system components.
Źródło:: Measurement Automation Monitoring; 2017, 63, 6; 203-205
2450-2855
Pojawia się w:: Measurement Automation Monitoring
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Informacja

Wyszukujesz frazę "Pietroń, M." wg kryterium: Autor

Źródło danych

Dostawca treści

Kolekcja

Rok wydania

Wydawca

Temat

Autor

Typ dokumentu

Język