Subject: GPU implementation - OPAC collection catalog

Title:: A new CUDA-based GPU implementation of the two-dimensional Athena code
Authors:: Wasilijew, A.
Murawski, K.
Link:: https://bibliotekanauki.pl/articles/201940.pdf
Publication date:: 2013
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: CUDA-based GPU implementation
two-dimensional Athena code
magnetohydrodynamic equations
Description:: We present a new version of the Athena code, which solves magnetohydrodynamic equations in two-dimensional space. This new implementation, which we have named Athena-GPU, uses the CUDA architecture to allow code execution on a Graphics Processing Unit (GPU). The Athena-GPU code is an unofficial, modified version of the Athena code, which was originally designed for the Central Processing Unit (CPU) architecture. We perform numerical tests based on the original Athena-CPU code and its GPU counterpart to carry out a performance analysis, which includes execution time, precision differences and accuracy. We restricted our tests and analysis to double-precision floating-point operations and two-dimensional test cases. Our comparison shows that the results are similar for both versions of the code, which confirms the correctness of our CUDA-based implementation. Our tests reveal that the Athena-GPU code can be 2 to 15 times faster than the Athena-CPU code, depending on the test case, the problem size and the hardware configuration.
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2013, 61, 1; 239-250
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article

Title:: An optimized parallel implementation of non-iteratively trained recurrent neural networks
Authors:: El Zini, Julia
Rizk, Yara
Awad, Mariette
Link:: https://bibliotekanauki.pl/articles/2031147.pdf
Publication date:: 2021
Publisher:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:: GPU implementation
parallelization
Recurrent Neural Network
RNN
Long-short Term Memory
LSTM
Gated Recurrent Unit
GRU
Extreme Learning Machines
ELM
non-iterative training
Description:: Recurrent neural networks (RNN) have been successfully applied to various sequential decision-making tasks, natural language processing applications, and time-series predictions. Such networks are usually trained through back-propagation through time (BPTT), which is prohibitively expensive, especially when the length of the time dependencies and the number of hidden neurons increase. To reduce the training time, extreme learning machines (ELMs) have recently been applied to RNN training, reaching a 99% speedup on some applications. Due to its non-iterative nature, ELM training, when parallelized, has the potential to reach higher speedups than BPTT. In this work, we present Opt-PR-ELM, an optimized parallel RNN training algorithm based on ELM that takes advantage of the GPU shared memory and of parallel QR factorization algorithms to efficiently reach optimal solutions. The theoretical analysis of the proposed algorithm is presented on six RNN architectures, including LSTM and GRU, and its performance is empirically tested on ten time-series prediction applications. Opt-PR-ELM is shown to reach a speedup of up to 461 times over its sequential counterpart and to require up to 20 times less training time than parallel BPTT. Such high speedups over new-generation CPUs are crucial in real-time applications and IoT environments.
Source:: Journal of Artificial Intelligence and Soft Computing Research; 2021, 11, 1; 33-50
2083-2567
2449-6499
Appears in:: Journal of Artificial Intelligence and Soft Computing Research
Content provider:: Biblioteka Nauki

Article

Information

You are searching for the phrase "GPU implementation" by criterion: Subject