Subject: High Performance Computing - OPAC collection catalogue

Title:: Robust Multiscale Modelling of Two-Phase Steels on Heterogeneous Hardware Infrastructures by Using Statistically Similar Representative Volume Element
Efektywne modelowanie wieloskalowe stali dwufazowych na heterogenicznych architekturach sprzętowych z wykorzystaniem statystycznie podobnych reprezentatywnych elementów objętościowych
Authors:: Rauch, Ł.
Bzowski, K.
Bachniak, D.
Pietrzyk, M.
Links:: https://bibliotekanauki.pl/articles/958192.pdf
Publication date:: 2015
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: multiscale modelling
high performance computing
AHSS
High Performance Computing
SSRVE
Description:: The coupled finite element multiscale simulations (FE2) require costly numerical procedures at both the macro and micro scales. Attempts to improve numerical efficiency focus mainly on two areas of development, i.e. parallelization/distribution of numerical procedures and simplification of the virtual material representation. One approach that represents both areas is the Statistically Similar Representative Volume Element (SSRVE). It aims to reduce the number of finite elements at the micro scale and to parallelize the micro-scale calculations, which can be performed without barriers. The computational domain is simplified by transforming sophisticated images of the material microstructure into artificially created simple objects characterized by features similar to those of their original equivalents. In existing solutions for two-phase steels, the SSRVE is created by analysing the shape coefficients of the hard phase in the real microstructure and searching for a representative simple structure with similar shape coefficients. Optimization techniques were used to solve this task. In the present paper, local strains and stresses are added to the cost function in the optimization. Various forms of the objective function, composed of different elements, were investigated and used in the optimization procedure for the creation of the final SSRVE. The results are compared with respect to the efficiency of the procedure and the uniqueness of the solution. The best objective function, composed of shape coefficients as well as strains and stresses, was proposed. Examples of SSRVEs determined for the investigated two-phase steel using that objective function are demonstrated in the paper. Each step of SSRVE creation is investigated from the point of view of computational efficiency.
A proposed implementation of the whole computational procedure on modern High Performance Computing (HPC) infrastructures is described. It includes the software architecture of the solution as well as a presentation of the middleware applied for data farming purposes.
Multiscale simulations using the coupled finite element method require numerically costly procedures at both the macro and micro scales. Attempts to improve numerical efficiency focus mainly on two areas of development, i.e. parallelization/distribution of numerical procedures and simplification of the virtual material representation. One method representing both areas is the Statistically Similar Representative Volume Element approach. The main aim of this method is to reduce the number of elements discretizing the computational domain, but also to enable parallelization of the micro-scale calculations, which can be carried out independently of one another. The simplification of the computational domain through creation of the SSRVE is realized with optimization methods that make it possible to create the element most similar to the real material on the basis of selected characteristic features. In the solution for two-phase steels, the features describing similarity are derived from an analysis of the shape coefficients of martensite grains in an image of the real microstructure. The approach presented in this paper has additionally been extended with local stress and strain values so as to fully reflect both visual and behavioural similarity. Various forms of the objective function were analysed in the optimization process, and the results were compared in terms of quality as well as efficiency and uniqueness of the solution. Finally, the best objective function, comprising shape coefficients as well as stress and strain values, was proposed. Examples of SSRVEs determined for the analysed two-phase steels are presented in the paper. Each step of the SSRVE creation procedure was also analysed for computational efficiency, on the basis of which an approach using modern high-performance hardware architectures was proposed.
The description of the approach covers both the architecture of the solution and a presentation of the middleware.
Source:: Archives of Metallurgy and Materials; 2015, 60, 3A; 1973-1979
1733-3490
Appears in:: Archives of Metallurgy and Materials
Content provider:: Biblioteka Nauki

Article

Title:: Design and performance evaluation of a Linux HPC cluster
Authors:: Pera, Donato
Links:: https://bibliotekanauki.pl/articles/1955283.pdf
Publication date:: 2018
Publisher:: Politechnika Gdańska
Subjects:: high performance computing
parallel computing
cluster design
HPL
Description:: In this paper, after a short theoretical introduction to modern techniques used in parallel computing, we report a case study on the design and development of the Caliban Linux High Performance Computing cluster, carried out by the author in the High Performance Computing Laboratory of the University of L’Aquila. Finally, we report performance evaluation tests of the Caliban cluster performed using HPL (High-Performance Linpack) benchmarks.
Source:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2018, 22, 2; 113-123
1428-6394
Appears in:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider:: Biblioteka Nauki

Article

Title:: Using Redis supported by NVRAM in HPC applications
Authors:: Malinowski, A.
Links:: https://bibliotekanauki.pl/articles/305650.pdf
Publication date:: 2017
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: high performance computing
storage systems
NoSQL
NVRAM
Description:: Nowadays, the efficiency of storage systems is a bottleneck in many modern HPC clusters. High performance in the traditional approach (processing using files) is often difficult to obtain because of model complexity and its read/write patterns. An alternative approach is to apply a key-value database, which usually has low latency and scales well. On the other hand, many key-value stores suffer from limited memory capacity and vulnerability to serious failures, which is caused by processing in RAM. Moreover, some research suggests that scientific data models are not applicable to the storage structures of key-value databases. In this paper, the author proposes resolving these issues by replacing RAM with NVRAM. A practical example is based on the Redis NoSQL database. The article also contains three domain-specific APIs that show the idea behind the transformation from an HPC data model to Redis structures, as well as the results of two micro-benchmarks.
Source:: Computer Science; 2017, 18 (3); 287-300
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: BeesyCluster as front-end for high performance computing services
Authors:: Czarnul, P.
Links:: https://bibliotekanauki.pl/articles/1941698.pdf
Publication date:: 2015
Publisher:: Politechnika Gdańska
Subjects:: BeesyCluster
high performance computing
services
Web Service interface
Description:: The paper presents the BeesyCluster system as middleware allowing invocation of services on high performance computing resources within the NIWA Centre of Competence project. Access is possible through both WWW and SOAP Web Service interfaces. The former allows inexperienced users to invoke both simple and complex services exposed through easy-to-use servlets. The latter is meant for integration of external applications with services made available from clusters or servers. Details of services, such as the APIs used for development (e.g. MPI, OpenMP, OpenCL) and queuing systems, are hidden from the user. The paper describes both the WWW and Web Service interfaces extended for use with large files. Mechanisms for selecting devices for the execution of services are described, along with experiments including remote invocations.
Source:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2015, 19, 4; 387-396
1428-6394
Appears in:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider:: Biblioteka Nauki

Article

Title:: The scheduling of task graphs in high performance computing as service clouds
Authors:: Deniziak, S.
Paduch, P.
Links:: https://bibliotekanauki.pl/articles/114424.pdf
Publication date:: 2016
Publisher:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:: high performance computing
cloud computing
scheduling
distributed processing
internet services
Description:: In this paper we propose a new method of scheduling distributed applications in a cloud environment according to the High Performance Computing as a Service concept. We assume that applications submitted for execution are specified as task graphs. Our method dynamically schedules all the tasks using resource sharing among the applications. The goal of scheduling is to minimize the cost of hiring resources and the execution time of all incoming applications. Experimental results showed that our method gives significantly better utilization of computational resources than existing management methods for clouds.
Source:: Measurement Automation Monitoring; 2016, 62, 6; 193-195
2450-2855
Appears in:: Measurement Automation Monitoring
Content provider:: Biblioteka Nauki

Article

Title:: The Analysis of OpenStack Cloud Computing Platform: Features and Performance
Authors:: Grzonka, D.
Links:: https://bibliotekanauki.pl/articles/307878.pdf
Publication date:: 2015
Publisher:: Instytut Łączności - Państwowy Instytut Badawczy
Subjects:: cloud computing
high performance computing
OpenStack
parallel environments
resource utilization analysis
virtualization
Description:: Over recent decades, rapid development of broadly defined computer technologies, both software and hardware, has been observed. Unfortunately, software solutions regularly lag behind the hardware. On the other hand, modern systems are characterized by a high demand for computing resources and the need for customization for end users. As a result, the traditional way of constructing systems is too expensive and inflexible, and it does not achieve high resource utilization. The present article focuses on the problem of effective use of available physical and virtual resources based on the OpenStack cloud computing platform. A number of experiments made it possible to evaluate the utility of computing resources and to analyze performance depending on the allocated resources. Additionally, the paper includes a structural and functional analysis of the OpenStack cloud platform.
Source:: Journal of Telecommunications and Information Technology; 2015, 3; 52-57
1509-4553
1899-8852
Appears in:: Journal of Telecommunications and Information Technology
Content provider:: Biblioteka Nauki

Article

Title:: A parallel genetic algorithm for creating virtual portraits of historical figures
Authors:: Krawczyk, H.
Proficz, J.
Ziółkowski, T.
Links:: https://bibliotekanauki.pl/articles/1933983.pdf
Publication date:: 2012
Publisher:: Politechnika Gdańska
Subjects:: genetic algorithms
fitness function
KASKADA platform
parallel processing
high performance computing
Description:: In this paper we present a genetic algorithm (GA) for creating hypothetical virtual portraits of historical figures and other individuals whose facial appearance is unknown. Our algorithm uses existing portraits of random people from a specific historical period and social background to evolve a set of face images potentially resembling the person whose image is to be found. We then use portraits of the person’s relatives to judge which of the evolved images are most likely to resemble his/her actual appearance. Unlike typical GAs, our algorithm uses a new supervised form of fitness function which itself is affected by the evolution process. An additional description of requested facial features can be provided to further influence the final solution (i.e. the virtual portrait). We present an example of a virtual portrait created by our algorithm. Finally, the performance of a parallel implementation developed for the KASKADA platform is presented and evaluated.
Source:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2012, 16, 1-2; 145-162
1428-6394
Appears in:: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider:: Biblioteka Nauki

Article

Title:: Application of Virtual Reality and High Performance Computing in Designing Rotary Forming Processes
Authors:: Hojny, Marcin
Marynowski, Przemysław
Lipski, Grzegorz
Gądek, Tomasz
Nowacki, Łukasz
Links:: https://bibliotekanauki.pl/articles/2134109.pdf
Publication date:: 2022
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: rotary forming
virtual reality
finite element
computer simulation
high performance computing
Description:: This paper presents an innovative solution in the form of a virtual reality (VR) and high performance computing (HPC) system dedicated to aiding the design of rotary forming processes with laser beam reheating of the formed material. The invented method, which allows a virtual copy of the machine to be coupled with its actual counterpart, and a computing engine that uses the GPUs of NVIDIA graphics cards to accelerate computation are discussed. The completed experiments and simulations of the spinning process for a 316L stainless steel semi-product showed that the developed VR-HPC system allows the manufacturing process to be effectively engineered and controlled in industrial conditions.
Source:: Archives of Metallurgy and Materials; 2022, 67, 3; 1099-1105
1733-3490
Appears in:: Archives of Metallurgy and Materials
Content provider:: Biblioteka Nauki

Article

Title:: Towards a grid infrastructure for hydro-meteorological research
Zastosowanie infrastruktury gridowej do badań hydrometeorologicznych
Authors:: Schiffers, M.
Kranzlmuller, D.
Clematis, A.
D'Agostino, D.
Galizia, A.
Quarati, A.
Parodi, A.
Morando, M.
Rebora, N.
Trasforini, E.
Molini, L.
Siccardi, F.
Craig, G.
Tafferner, A.
Links:: https://bibliotekanauki.pl/articles/305473.pdf
Publication date:: 2011
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: e-science
grid computing
high performance computing
hydrometeorology
Description:: The Distributed Research Infrastructure for Hydro-Meteorological Study (DRIHMS) is a co-ordinated action co-funded by the European Commission. DRIHMS analyzes the main issues that arise when designing and setting up a pan-European Grid-based e-Infrastructure for research activities in the hydrologic and meteorological fields. The main outcome of the project is represented first by a set of Grid usage patterns to support innovative hydro-meteorological research activities, and second by the implications that such patterns define for a dedicated Grid infrastructure and the respective Grid architecture.
The Distributed Research Infrastructure for Hydro-Meteorological Study (DRIHMS) is part of a co-ordinated action co-funded by the European Commission. The aim of DRIHMS is to analyse the main problems encountered in the fields of hydrology and meteorology. The main outcome of the project will be a set of usage patterns for Grid environments to support modern hydro-meteorological research, together with conclusions arising from this use that may influence the further development of dedicated Grid solutions.
Source:: Computer Science; 2011, 12; 45-62
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: Feasibility of FPGA to HPC computation migration of plasma impurities diagnostic algorithms
Authors:: Linczuk, P.
Krawczyk, R. D.
Zabolotny, W.
Wojenski, A.
Kolasinski, P.
Pozniak, K. T.
Kasprowicz, G.
Chernyshova, M.
Czarski, T.
Links:: https://bibliotekanauki.pl/articles/226512.pdf
Publication date:: 2017
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: plasma diagnostic
GEM system
feedback loops
Intel Xeon
Intel Xeon Phi
high performance computing HPC
Description:: We present a feasibility study of algorithms for estimating the parameters of fast events, with regard to their execution time. This is the first stage of the procedure applied to data gathered from a gas electron multiplier (GEM) detector for the diagnostics of plasma impurities. The measured execution times are estimates of the times achievable for future, more complex algorithms. The work covers the use of Intel Xeon and Intel Xeon Phi high-performance computing (HPC) devices as a possible replacement for FPGAs, with their advantages and disadvantages highlighted. The results show that a feedback loop of less than 10 ms can be obtained using 25% of the hardware resources of an Intel Xeon or 10% of the resources of an Intel Xeon Phi, which leaves room for a future increase in algorithm complexity. Moreover, this work contains a simplified overview of basic problems in current measurement systems for the diagnostics of plasma impurities, and of emerging trends in developed solutions.
Source:: International Journal of Electronics and Telecommunications; 2017, 63, 3; 323-328
2300-1933
Appears in:: International Journal of Electronics and Telecommunications
Content provider:: Biblioteka Nauki

Article

Title:: Computation acceleration on SGI RASC: FPGA based reconfigurable computing hardware
Akceleracja obliczeń na platformie SGI RASC: module obliczeń za pomocą logiki rekonfigurowalnej
Authors:: Jamro, E.
Janiszewski, M.
Machaczek, K.
Russek, P.
Wiatr, K.
Wielgosz, M.
Links:: https://bibliotekanauki.pl/articles/305339.pdf
Publication date:: 2008
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: hardware acceleration of computation
dedicated processors
FPGA
large-scale computing
SGI RASC
custom computing
single-purpose processors
high performance computing
Description:: In this paper a novel method of computation using FPGA technology is presented. In several cases this method provides a computational speedup with respect to General Purpose Processors (GPPs). The main concept of this approach is to design the computing hardware architecture to fit the algorithm dataflow and to make the best use of well-known computing techniques such as pipelining and parallelism. Configurable hardware is used as an implementation platform for the custom-designed hardware. The paper presents implementation results for algorithms that are used in areas such as cryptography, data analysis and scientific computation. Other promising areas for the new technology, bioinformatics for instance, are also mentioned. The algorithms were designed, tested and implemented on the SGI RASC platform. The RASC module is a part of Cyfronet's SGI Altix 4700 SMP system. We also present the modern RASC architecture. In principle it consists of FPGA chips and very fast, 128-bit wide local memory. Design tools available to designers are also presented.
The authors present a new method of large-scale computation based on FPGA devices. In particular cases its use shortens computation time. The method is based on performing computations with computing architectures designed for a given algorithm. Because the architecture is created specifically for the given algorithm, it makes better use of the possibilities of parallel and pipelined execution. Reconfigurable devices were used as the platform for implementing the dedicated architectures. The paper also presents the results of applying this technique in areas such as cryptography, data analysis and double-precision scientific computation. Other fields of science in which the described technique is successfully applied (e.g. bioinformatics) are also indicated. The implemented algorithms were run and tested on the SGI RASC module installed at ACK Cyfronet AGH, which is part of the Altix 4700 SMP system. The architecture of the RASC module used, as well as the design tools and methods available to programmers, are presented.
Source:: Computer Science; 2008, 9; 21-34
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: High-performance simulation-based algorithms for an alpine ski racer’s trajectory optimization in heterogeneous computer systems
Authors:: Dębski, R.
Links:: https://bibliotekanauki.pl/articles/330952.pdf
Publication date:: 2014
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: trajectory optimization
heterogeneous computing
GPGPU
high performance computing
alpine ski racing
Description:: Effective, simulation-based trajectory optimization algorithms adapted to heterogeneous computers are studied with reference to a problem taken from alpine ski racing (the presented solution is probably the most general one published so far). The key idea behind these algorithms is to use a grid-based discretization scheme to transform the continuous optimization problem into a search problem over a specially constructed finite graph, and then to apply dynamic programming to find an approximation of the global solution. In the analyzed example it is the minimum-time ski line, represented as a piecewise-linear function (a method for eliminating infeasible solutions is proposed). Serial and parallel versions of the basic optimization algorithm are presented in detail (pseudo-code, time and memory complexity). Possible extensions of the basic algorithm are also described. The implementation of these algorithms is based on OpenCL. The included experimental results show that contemporary heterogeneous computers can be treated as μ-HPC platforms: they offer high performance (the best speedup was equal to 128) while remaining energy and cost efficient (which is crucial in embedded systems, e.g., trajectory planners of autonomous robots). The presented algorithms can be applied to many trajectory optimization problems, including those whose performance measure is represented as a black box.
Source:: International Journal of Applied Mathematics and Computer Science; 2014, 24, 3; 551-566
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Title:: A High Performance Computing approach to the simulation of Fluid-Solid Interaction problems with rigid and flexible components
Zastosowanie wysokowydajnej techniki obliczeniowej (HPC) do symulacji problemów interakcji między płynem i ciałem stałym z elementami sztywnymi i elastycznymi
Authors:: Pazouki, A
Serban, R
Negrut, D
Links:: https://bibliotekanauki.pl/articles/950680.pdf
Publication date:: 2014
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: fluid-solid interaction
high performance computing
smoothed particle hydrodynamics
rigid
solid-state physics
heat dynamics
Description:: This work outlines a unified multi-threaded, multi-scale High Performance Computing (HPC) approach for the direct numerical simulation of Fluid-Solid Interaction (FSI) problems. The simulation algorithm relies on the extended Smoothed Particle Hydrodynamics (XSPH) method, which approaches the fluid flow in a Lagrangian framework consistent with the Lagrangian tracking of the solid phase. General 3D rigid body dynamics and an Absolute Nodal Coordinate Formulation (ANCF) are implemented to model rigid and flexible multibody dynamics. The two-way coupling of the fluid and solid phases is supported through the use of Boundary Condition Enforcing (BCE) markers that capture the fluid-solid coupling forces by enforcing a no-slip boundary condition. The solid-solid short-range interaction, which has a crucial impact on the small-scale behavior of fluid-solid mixtures, is resolved via a lubrication force model. The collective system states are integrated in time using an explicit, multi-rate scheme. To alleviate the heavy computational load, the overall algorithm leverages parallel computing on Graphics Processing Unit (GPU) cards. Performance and scaling analyses are provided for simulation scenarios involving one or multiple phases with up to tens of thousands of solid objects. The software implementation of the approach, called Chrono:Fluid, is part of the Chrono project and is available as open-source software.
The paper outlines a unified approach to the direct numerical simulation of fluid-solid interaction (FSI) problems using large-scale, multi-threaded high performance computing (HPC). The simulation algorithm is based on the extended Smoothed Particle Hydrodynamics (XSPH) method, which describes the fluid flow in a Lagrangian formalism consistent with the Lagrangian tracking of the solid phase. To model rigid and flexible multibody systems, general three-dimensional rigid body dynamics was implemented and the Absolute Nodal Coordinate Formulation (ANCF) was applied. The two-way coupling between the fluid and the solid phase is modelled with Boundary Condition Enforcing (BCE) markers, which capture the coupling forces between the fluid and the solid by enforcing a no-slip boundary condition. The problem of short-range interaction between the fluid and the solid, which has a decisive impact on the small-scale behaviour of fluid-solid mixtures, is resolved with a lubrication force model. The collective system states are integrated in time using an explicit, multi-rate scheme. To reduce the heavy computational load, the overall algorithm emphasizes parallel computation on graphics processing unit (GPU) cards. The paper presents a performance and scaling analysis for simulation scenarios involving one or multiple phases with up to tens of thousands of solid objects. The software implementation of the presented method, called Chrono:Fluid, is part of the Chrono project and is made available free of charge.
Source:: Archive of Mechanical Engineering; 2014, LXI, 2; 227-251
0004-0738
Appears in:: Archive of Mechanical Engineering
Content provider:: Biblioteka Nauki

Article

Title:: Performance enhancement of CUDA applications by overlapping data transfer and Kernel execution
Authors:: Raju, K.
Chiplunkar, Niranjan N
Links:: https://bibliotekanauki.pl/articles/1956064.pdf
Publication date:: 2021
Publisher:: Polskie Towarzystwo Promocji Wiedzy
Subjects:: CPU-GPU
high-performance computing
kernel
data transfer
CUDA streams
Description:: The CPU-GPU combination is a widely used heterogeneous computing system in which the CPU and GPU have different address spaces. Since the GPU cannot directly access the CPU memory, the input data must be available in the GPU memory before the GPU function is invoked. On completion of the GPU function, the results of the computation are transferred to the CPU memory. The CPU-GPU data transfer happens over the PCI Express bus. The PCI-E bandwidth is much lower than that of the GPU memory, and the speed at which the data is transferred is limited by it. Hence, PCI-E acts as a performance bottleneck. In this paper two approaches to minimizing the overhead of data transfer are discussed, namely performing the data transfer while the GPU function is being executed and reducing the amount of data to be transferred to the GPU. The effect of these approaches on the execution time of a set of CUDA applications is demonstrated using CUDA streams. The results of our experiments show that the execution time of applications can be minimized with the proposed approaches.
Source:: Applied Computer Science; 2021, 17, 3; 5-18
1895-3735
Appears in:: Applied Computer Science
Content provider:: Biblioteka Nauki

Article
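
Note on the record above: the benefit of overlapping data transfer with kernel execution can be estimated with a simple two-stage pipeline model. The sketch below is an illustration only, not the authors' method or measurements. It assumes the input is split into equal chunks, that one copy and one kernel can run at the same time, and it ignores the transfer of results back to the CPU. The function names and the parameters `n_chunks`, `t_copy` and `t_kernel` are hypothetical.

```python
def serial_time(n_chunks: int, t_copy: float, t_kernel: float) -> float:
    """Total time when each chunk is copied and then computed, with no overlap."""
    return n_chunks * (t_copy + t_kernel)


def overlapped_time(n_chunks: int, t_copy: float, t_kernel: float) -> float:
    """Total time when chunk i+1 is copied while chunk i is being computed."""
    if n_chunks <= 0:
        return 0.0
    # The first copy and the last kernel cannot be hidden; every step in
    # between costs as much as the slower of the two pipeline stages.
    return t_copy + (n_chunks - 1) * max(t_copy, t_kernel) + t_kernel


if __name__ == "__main__":
    n, t_copy, t_kernel = 4, 2.0, 3.0  # illustrative values, not measurements
    print(serial_time(n, t_copy, t_kernel))      # 20.0
    print(overlapped_time(n, t_copy, t_kernel))  # 14.0
```

In this model the saving grows with the number of chunks and is largest when the copy time and the kernel time are close to each other; with a single chunk there is nothing to overlap and both functions return the same value.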

Title:: The parallel tiled WZ factorization algorithm for multicore architectures
Authors:: Bylina, Beata
Bylina, Jarosław
Links:: https://bibliotekanauki.pl/articles/331092.pdf
Publication date:: 2019
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: tiled algorithm
WZ factorization
solution of linear system
Amdahl’s law
high performance computing
multicore architecture
linear system
Description:: The aim of this paper is to investigate dense linear algebra algorithms on shared memory multicore architectures. The design and implementation of a parallel tiled WZ factorization algorithm which can fully exploit such architectures are presented. Three parallel implementations of the algorithm are studied. The first one relies only on exploiting multithreaded BLAS (basic linear algebra subprograms) operations. The second implementation, in addition to BLAS operations, employs the OpenMP standard to use loop-level parallelism. The third implementation, in addition to BLAS operations, employs the OpenMP task directive with the depend clause. We report the computational performance and the speedup of the parallel tiled WZ factorization algorithm on shared memory multicore architectures for dense square diagonally dominant matrices. Then we compare our parallel implementations with the respective LU factorization from a vendor-implemented LAPACK library. We also analyze the numerical accuracy. Two of our implementations achieve nearly the maximal theoretical speedup implied by Amdahl’s law.
Source:: International Journal of Applied Mathematics and Computer Science; 2019, 29, 2; 407-419
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article
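
Note on the record above: the speedup limit from Amdahl's law that the description refers to can be written as follows. The symbols are assumptions introduced here and are not notation taken from the paper: $f$ is the fraction of the run time that can be parallelized and $p$ is the number of cores.

```latex
S(p) = \frac{1}{(1 - f) + \dfrac{f}{p}},
\qquad
\lim_{p \to \infty} S(p) = \frac{1}{1 - f}
```

As an illustration with made-up numbers, $f = 0.95$ and $p = 16$ give $S(16) = 1 / (0.05 + 0.059375) \approx 9.14$, and no number of cores can push the speedup beyond $1 / 0.05 = 20$.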

Information

Searching for the phrase "High Performance Computing" by criterion: Subject