Subject: GPU computing - OPAC collections catalog

Title:: G-DNA – a highly efficient multi-GPU/MPI tool for aligning nucleotide reads
Authors:: Frohmberg, W.
Kierzynka, M.
Blazewicz, J.
Gawron, P.
Wojciechowski, P.
Links:: https://bibliotekanauki.pl/articles/200827.pdf
Publication date:: 2013
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: DNA assembly preprocessing
sequence alignment
GPU computing
Description:: DNA/RNA sequencing has recently become a primary way researchers generate biological data for further analysis. Assembly algorithms are an integral part of this process. However, some of them require pairwise alignment to be applied to a large number of reads. Although several efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units), none of them directly targets high-throughput sequencing data. As a result, a need arose to create software that could handle such data as effectively as possible. G-DNA (GPU-based DNA aligner) is the first highly parallel solution that has been optimized to process nucleotide reads (DNA/RNA) from modern sequencing machines. Results show that the software reaches up to 89 GCUPS (Giga Cell Updates Per Second) on a single GPU, which makes it the fastest tool in its class. Moreover, it scales well on multi-GPU systems, including MPI-based computational clusters, where its performance is measured in TCUPS (Tera CUPS).
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2013, 61, 4; 989-992
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article
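
The record above reports throughput in GCUPS (Giga Cell Updates Per Second). As a rough illustration of how that unit is commonly computed, the sketch below assumes each pairwise alignment fills a dynamic-programming matrix of query length times target length cells. The function name `gcups` and the workload figures are hypothetical and are not taken from the paper.

```python
def gcups(query_len: int, target_len: int, num_pairs: int, seconds: float) -> float:
    """Return throughput in Giga Cell Updates Per Second.

    Assumption: every pairwise alignment updates query_len * target_len
    dynamic-programming cells, so the batch updates that many cells per pair.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    total_cells = query_len * target_len * num_pairs
    return total_cells / seconds / 1e9


# Hypothetical workload: one million pairs of 100-nucleotide reads in 0.5 s.
print(gcups(100, 100, 1_000_000, 0.5))  # 20.0
```

Under this definition, the same cell count processed in half the time doubles the reported GCUPS, which is why the unit is convenient for comparing aligners across read lengths.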

Title:: Heterogeneous GPU&CPU cluster for High Performance Computing in cryptography
Authors:: Marks, M.
Jantura, J.
Niewiadomska-Szynkiewicz, E.
Strzelczyk, P.
Góźdź, K.
Links:: https://bibliotekanauki.pl/articles/305288.pdf
Publication date:: 2012
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: parallel computing
HPC
clusters
GPU computing
OpenCL
cryptography
cryptanalysis
Description:: This paper addresses issues associated with distributed computing systems and the application of mixed GPU&CPU technology to data encryption and decryption algorithms. We describe a heterogeneous cluster, HGCC, formed by two types of nodes: an Intel processor with an NVIDIA graphics processing unit and an AMD processor with an AMD (formerly ATI) graphics processing unit. We also describe a novel software framework that hides the heterogeneity of our cluster and provides tools for solving complex scientific and engineering problems. Finally, we present the results of numerical experiments. The case study considered is concerned with parallel implementations of selected cryptanalysis algorithms. The main goal of the paper is to show the wide applicability of GPU&CPU technology to large-scale computation and data processing.
Source:: Computer Science; 2012, 13 (2); 63-79
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: G-PAS 2.0 - an improved version of protein alignment tool with an efficient backtracking routine on multiple GPUs
Authors:: Frohmberg, W.
Kierzynka, M.
Blazewicz, J.
Wojciechowski, P.
Links:: https://bibliotekanauki.pl/articles/201593.pdf
Publication date:: 2012
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: pairwise alignment
GPU computing
alignment with backtracking procedure
Description:: Several highly efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units). G-PAS (GPU-based Pairwise Alignment Software) was one of them, though with a couple of interesting features that made it unique. Nevertheless, some changes had to be introduced in order to adapt it to a new computational architecture. In this paper we present G-PAS 2.0, a new version of the software for performing high-throughput alignment. Results show that the new version is nearly a quarter faster on the same hardware, reaching over 20 GCUPS (Giga Cell Updates Per Second).
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2012, 60, 3; 491-494
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article

Title:: A Novel GPU-Enabled Simulator for Large Scale Spiking Neural Networks
Authors:: Szynkiewicz, P.
Links:: https://bibliotekanauki.pl/articles/307680.pdf
Publication date:: 2016
Publisher:: Instytut Łączności - Państwowy Instytut Badawczy
Subjects:: GPU computing
OpenCL programming technology
parallel simulation
spiking neural networks
Description:: The understanding of the structural and dynamic complexity of neural networks is greatly facilitated by computer simulations. An ongoing challenge for simulating realistic models is, however, computational speed. This paper describes a framework for modeling and parallel simulation of biologically inspired large-scale spiking neural networks on high-performance graphics processors. The tool is implemented in the OpenCL programming technology. It supports simulation studies with three neuron models: integrate-and-fire, Hodgkin-Huxley and Izhikevich. The results of extensive simulations are provided to illustrate the operation and performance of the presented software framework. Particular attention is paid to the computational speed-up factor.
Source:: Journal of Telecommunications and Information Technology; 2016, 2; 34-42
1509-4553
1899-8852
Appears in:: Journal of Telecommunications and Information Technology
Content provider:: Biblioteka Nauki

Article
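
Of the neuron models named in the record above, the Izhikevich model is the most compact to write down. The sketch below simulates a single neuron with forward Euler integration in plain sequential Python. It is not the OpenCL framework the paper describes. The function name `simulate_izhikevich`, the time step and the input currents are illustrative assumptions. The default parameters a, b, c, d are the regular-spiking values from Izhikevich's original model description.

```python
def simulate_izhikevich(current, a=0.02, b=0.2, c=-65.0, d=8.0,
                        t_ms=1000.0, dt=0.5):
    """Simulate one Izhikevich neuron and return its spike times in ms.

    Model equations (v in mV, t in ms):
        dv/dt = 0.04*v^2 + 5*v + 140 - u + I
        du/dt = a*(b*v - u)
        if v >= 30: v <- c, u <- u + d
    """
    v = c          # membrane potential starts at the reset value
    u = b * v      # recovery variable starts on its nullcline
    spikes = []
    steps = int(t_ms / dt)
    for k in range(steps):
        # Forward Euler update of both state variables.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            # Spike: record the time, then apply the reset rule.
            spikes.append(k * dt)
            v = c
            u += d
    return spikes


# Assumed constant input current of 10 (model units) for one second.
print(len(simulate_izhikevich(10.0)), "spikes")
```

Every neuron in a network runs this same update each time step, so the loop body maps naturally onto one GPU work-item per neuron. That mapping is an observation about the model, not a description of the paper's implementation.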

Title:: Application of the Lattice Boltzmann Method to the flow past a sphere
Authors:: Kajzer, A.
Pozorski, J.
Links:: https://bibliotekanauki.pl/articles/281895.pdf
Publication date:: 2017
Publisher:: Polskie Towarzystwo Mechaniki Teoretycznej i Stosowanej
Subjects:: bluff-body flow
Lattice Boltzmann Method
Large Eddy Simulation
GPU computing
Description:: The results of fully resolved simulations and large eddy simulations of bluff-body flows obtained by means of the Lattice Boltzmann Method (LBM) are reported. A selection of Reynolds numbers has been investigated in unsteady laminar and transient flow regimes. Computed drag coefficients of a cube have been compared with the available data for validation purposes. Then, a more detailed analysis of the flow past a sphere is presented, including the determination of the vortex shedding frequency and the resulting Strouhal numbers. Advantages and drawbacks of the chosen geometry implementation technique, the so-called "staircase geometry", are discussed. In pursuit of maximum computational efficiency, all simulations have been carried out with in-house code executed on a GPU.
Source:: Journal of Theoretical and Applied Mechanics; 2017, 55, 3; 1091-1099
1429-2955
Appears in:: Journal of Theoretical and Applied Mechanics
Content provider:: Biblioteka Nauki

Article

Title:: A Hybrid CPU/GPU Cluster for Encryption and Decryption of Large Amounts of Data
Authors:: Niewiadomska-Szynkiewicz, E.
Marks, M.
Jantura, J.
Podbielski, M.
Links:: https://bibliotekanauki.pl/articles/309363.pdf
Publication date:: 2012
Publisher:: Instytut Łączności - Państwowy Instytut Badawczy
Subjects:: AES
computer clusters
cryptography
DES
GPU computing
parallel calculation
software systems
Description:: The main advantage of a distributed computing system over a standalone computer is the ability to share the workload between cores, processors and computers. In our paper we present a hybrid cluster system, a novel computing architecture with multi-core CPUs working together with many-core GPUs. It integrates two types of CPU, Intel and AMD, with advanced graphics processing units, Nvidia Tesla and AMD FirePro (formerly ATI) respectively. Our CPU/GPU cluster is dedicated to performing massively parallel computations, a common approach in cryptanalysis and cryptography. The efficiency of parallel implementations of selected data encryption and decryption algorithms is presented to illustrate the performance of our system.
Source:: Journal of Telecommunications and Information Technology; 2012, 3; 32-39
1509-4553
1899-8852
Appears in:: Journal of Telecommunications and Information Technology
Content provider:: Biblioteka Nauki

Article

Title:: Exploiting multi-core and many-core parallelism for subspace clustering
Authors:: Datta, Amitava
Kaur, Amardeep
Lauer, Tobias
Chabbouh, Sami
Links:: https://bibliotekanauki.pl/articles/331126.pdf
Publication date:: 2019
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: data mining
subspace clustering
multicore processor
many core processor
GPU computing
Description:: Finding clusters in high dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of dimensions of the data. But the exponential increase in the number of subspaces with the dimensionality of data renders most of the algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependency in the clustering process, which means that parallelization becomes difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm which is scalable with the dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.
Source:: International Journal of Applied Mathematics and Computer Science; 2019, 29, 1; 81-91
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Very Fast Non-Dominated Sorting
Authors:: Smutnicki, C.
Rudy, J.
Żelazny, D.
Links:: https://bibliotekanauki.pl/articles/375948.pdf
Publication date:: 2014
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: parallel algorithms
Pareto sorting
computational complexity
GPU computing
multiple criteria decision analysis
NSGA-II
Description:: A new and very efficient parallel algorithm for the Fast Non-dominated Sorting of Pareto fronts is proposed. By decreasing the computational complexity of the sorting, the proposed method speeds up the Fast and Elitist Multi-Objective Genetic Algorithm (NSGA-II), the best such algorithm known to date, by more than two orders of magnitude. Formal proofs of the time complexities of the basic as well as the improved versions of the procedure are presented. The provided experimental results fully confirm the theoretical findings.
Source:: Decision Making in Manufacturing and Services; 2014, 8, 1-2; 13-23
1896-8325
2300-7087
Appears in:: Decision Making in Manufacturing and Services
Content provider:: Biblioteka Nauki

Article
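
For context on the record above, the sketch below shows the classic sequential non-dominated sorting procedure used in NSGA-II. It is a baseline only and is not the parallel algorithm proposed in the paper. It assumes all objectives are minimized. The function names and the sample points are illustrative.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly
    better in at least one (all objectives minimized)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def non_dominated_sort(points):
    """Split points into Pareto fronts; returns lists of indices.

    Sequential baseline with O(M * N^2) dominance comparisons for
    N points and M objectives.
    """
    n = len(points)
    dominated = [[] for _ in range(n)]  # indices that point i dominates
    counts = [0] * n                    # how many points dominate point i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1

    fronts = []
    current = [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        following = []
        for i in current:
            for j in dominated[i]:
                # Removing front member i leaves j with one fewer dominator.
                counts[j] -= 1
                if counts[j] == 0:
                    following.append(j)
        current = following
    return fronts


sample = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
print(non_dominated_sort(sample))  # [[0, 1, 2], [3], [4]]
```

The pairwise dominance checks in the double loop are independent of one another, which is what makes this step a natural target for parallel hardware.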

Title:: The Java profiler based on byte code analysis and instrumentation for many-core hardware accelerators
Authors:: Pietroń, M.
Karwatowski, M.
Wiatr, K.
Links:: https://bibliotekanauki.pl/articles/114614.pdf
Publication date:: 2015
Publisher:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:: virtual machine
CUDA
GPU
profiling
parallel computing
Description:: One of the most challenging issues with many-core and multi-core architectures is how to exploit their potential computing power in legacy systems without a deep knowledge of their architecture. Analysis of the static and dynamic data dependences of a program run can help to identify independent paths that could be computed by individual parallel threads. Statistics on data reuse and data size are also crucial when adapting an application to a GPU many-core hardware architecture because of its specific memory hierarchies. The proposed profiling system performs static data analysis and computes dynamic dependencies for Java programs, and recommends the parts of the source code with the highest potential for parallelization on a GPU. Such an analysis can also provide a starting point for automatic parallelization.
Source:: Measurement Automation Monitoring; 2015, 61, 7; 385-387
2450-2855
Appears in:: Measurement Automation Monitoring
Content provider:: Biblioteka Nauki

Article

Title:: Równoległa realizacja przykładowego algorytmu genetycznego z wykorzystaniem akceleratorów GPU [Parallel implementation of an example genetic algorithm using GPU accelerators]
Authors:: Ratuszniak, P.
Stasiak, A.
Łańcucki, R.
Links:: https://bibliotekanauki.pl/articles/118416.pdf
Publication date:: 2018
Publisher:: Politechnika Koszalińska. Wydawnictwo Uczelniane
Subjects:: genetic algorithm
parallel programming
computing acceleration
GPU accelerators
GPU
CUDA
travelling salesman problem
Description:: The paper presents a practical implementation of a local desktop application that runs an example genetic algorithm on GPU accelerators. The genetic algorithm is used to solve a typical optimization problem, the travelling salesman problem. To exploit the computing power of the graphics card, the application was built with the Nvidia CUDA programming technology.
Source:: Zeszyty Naukowe Wydziału Elektroniki i Informatyki Politechniki Koszalińskiej; 2018, 13; 63-78
1897-7421
Appears in:: Zeszyty Naukowe Wydziału Elektroniki i Informatyki Politechniki Koszalińskiej
Content provider:: Biblioteka Nauki

Article

Title:: Akceleracja obliczeń komputerowych za pomocą układów graficznych z wykorzystaniem technologii CUDA
Computing acceleration based on application of the CUDA technology
Authors:: Stefanowicz, Ł.
Wiśniewski, R.
Wiśniewska, M.
Links:: https://bibliotekanauki.pl/articles/155246.pdf
Publication date:: 2011
Publisher:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:: processor
computations
parallelism
CPU
GPU
CUDA
multimedia
iteration
multithreading
computing acceleration
Description:: The paper deals with the application of graphics processing units (GPUs) to the acceleration of computer operations and computations. Traditional computation methods are based on the central processing unit (CPU), which has to handle all computer operations and tasks. Such a solution is especially ineffective in distributed systems, where some sub-tasks can be performed in parallel. Many parallel threads can accelerate computing, which results in a shorter execution time. The paper presents nVidia's CUDA technology and architecture, together with its basic extensions to the C language standard. CUDA is based on applying GPUs to computation in order to achieve better performance than traditional CPU-based methods. GPUs can perform multi-threaded calculations, so CUDA may greatly speed up computation, especially for tasks where concurrency can be applied. The effectiveness of CUDA technology was verified experimentally using the authors' own test modules and the research methodology discussed in the paper. The library of benchmarks consists of various algorithms, from simple iteration scripts to video processing methods. The results obtained from calculations performed on the CPU and on the GPU are compared and discussed.
Source:: Pomiary Automatyka Kontrola; 2011, 57, 8; 954-956
0032-4140
Appears in:: Pomiary Automatyka Kontrola
Content provider:: Biblioteka Nauki

Article

Title:: Performance enhancement of CUDA applications by overlapping data transfer and Kernel execution
Authors:: Raju, K.
Chiplunkar, Niranjan N.
Links:: https://bibliotekanauki.pl/articles/1956064.pdf
Publication date:: 2021
Publisher:: Polskie Towarzystwo Promocji Wiedzy
Subjects:: CPU-GPU
high-performance computing
kernel
data transfer
CUDA streams
Description:: The CPU-GPU combination is a widely used heterogeneous computing system in which the CPU and GPU have different address spaces. Since the GPU cannot directly access the CPU memory, the input data must be available in the GPU memory before the GPU function is invoked. On completion of the GPU function, the results of the computation are transferred to the CPU memory. The CPU-GPU data transfer happens over the PCI Express bus. The PCI-E bandwidth is much lower than that of the GPU memory, and it limits the speed at which the data is transferred. Hence, the PCI-E bus acts as a performance bottleneck. In this paper two approaches to minimizing the overhead of data transfer are discussed, namely, performing the data transfer while the GPU function is being executed and reducing the amount of data to be transferred to the GPU. The approaches are implemented with CUDA streams, and their effect on the execution time of a set of CUDA applications is evaluated. The results of our experiments show that the execution time of applications can be reduced with the proposed approaches.
Source:: Applied Computer Science; 2021, 17, 3; 5-18
1895-3735
Appears in:: Applied Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Adaptive partition-based logic simulation using GPGPU
Authors:: Zhang, M.
Zhang, Y.
Yang, W.
Kai, Y.
Wei, T.
Fan, X.
Links:: https://bibliotekanauki.pl/articles/398148.pdf
Publication date:: 2011
Publisher:: Politechnika Łódzka. Wydział Mikroelektroniki i Informatyki
Subjects:: logic simulation
stream computing
GPGPU
general-purpose computing on GPUs
CUDA
EDA
electronic design automation
Description:: As gate complexity grows, verification overhead becomes more decisive for VLSI design cost. In order to reduce the simulation time, this paper presents an adaptive partition-based parallel method of VLSI logic simulation with GPGPU. The numerous arithmetic blocks of the GPGPU are utilized simultaneously for disparate circuit macros. The proposed partition strategy is flexible enough to balance the different workloads in parallel threads and to fit the features of the GPU architecture. To further exploit the parallelism and locality of logic simulation, the circuit macros are organized as stream data. The data dependency between the input and output nets in one individual logical path is handled with the shared memory of the GPGPU. Dependencies between different logical paths are handled by thread synchronization. To illustrate the performance, a series of experiments was carried out on an Intel Core Duo workstation with an Nvidia GTX465 GPU board. Four typical digital circuits (LDPC, DES3, OpenRISC 1200 and OpenSPARC T1) are used as benchmarks. The experimental results demonstrate that the GPGPU parallel method achieves a significant speed-up over serial logic simulation on the CPU. In the best case (OpenSPARC T1), the GPGPU parallel implementation runs 21 times faster than the serial program.
Source:: International Journal of Microelectronics and Computer Science; 2011, 2, 4; 121-128
2080-8755
2353-9607
Appears in:: International Journal of Microelectronics and Computer Science
Content provider:: Biblioteka Nauki

Article

Information

You are searching for the phrase "GPU computing" by criterion: Subject
