Temat: akceleracja sprzętowa - Katalog OPAC zbiorów

Skocz do pozycji: 1.

Tytuł:: Komunikacja ze sprzętowym akceleratorem haszowania n-gramów dla procesora ARM z wykorzystaniem portu ACP
Communication with an n-gram hashing hardware accelerator for the ARM using ACP
Autorzy:: Barszczowski, M.
Koryciak, S.
Dąbrowska-Boruch, A.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/152364.pdf
Data publikacji:: 2014
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: ACP
akceleracja sprzętowa
ARM
Zynq
acceleration
Opis:: Artykuł opisuje uruchomienie portu ACP w układzie EPP firmy Xilinx przy użyciu CDMA zarządzającego transmisją pomiędzy akceleratorem, a rdzeniami procesora. Głównym celem badań było utworzenie modułu dokonującego tak zwanego haszowania zbiorów danych. Do wykonania tej operacji wykorzystany został układ Zynq 7000 posiadający zasoby logiki programowalnej oraz dwa rdzenie ARM A9. Powstały dwie koncepcje realizacji akceleratora. Pierwsza wersja zakładała bezpośredni przepływ danych ze źródła do akceleratora, a następnie do rdzeni ARM. Drugie rozwiązanie zakłada wykorzystanie portu ACP.
This paper introduces a new approach to hardware acceleration using the ACP(Acceleration Coherency Port) in Xilinx Zynq-7000 EPP XC7Z020. The first prototype allocated BRAM memory and transferred data through the ACP. The second one used a hardware hashing module to process data outside the CPU. The module received and returned data through the ACP port. The main task of the system is to replace a set of data with its shorter representative of constant length without interference of the processing unit. The main benefit of hashing data lies within the constant length of function outcome, which leads to data compression. Compression is highly desirable while comparing large subsets of data, especially in data mining. The execution of a hashing function requires high performance of the CPU due to the computational complexity of the algorithm. Two concepts where established. The first one assumed transferring data directly do the hardware accelerator and later to ARM cores. This solution is attractive due to its simplicity and relatively fast. Unfortunately, the data cannot be processed before hashing with the same CPU without significant speed reduction. The second approach used the ACP port which can transfer data very fast between L2/L3 cache memory without flushing of validating cache. The data can be processed by the software driven CPU, sent to the accelerator and then sent back to CPU for further processing. To accomplish the established task, the Zynq 7000 EPP with double ARM A9 core and programmable logic in one chip was used.
Źródło:: Pomiary Automatyka Kontrola; 2014, R. 60, nr 7, 7; 486-488
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 2.

Tytuł:: Sprzętowa akceleracja wybranych algorytmów kompresji obrazu nieruchomego w standardzie JPEG
Hardware acceleration of image compression algorithms in JPEG standard
Autorzy:: Koryciak, S.
Dąbrowska-Boruch, A.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/156717.pdf
Data publikacji:: 2012
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: akceleracja sprzętowa
JPEG
FPGA
IDCT
Huffman
custom computing
Opis:: Artykuł opisuje opracowanie akceleratora dla wybranych algorytmów kompresji obrazu nieruchomego. Do jego sprzętowej realizacji został wykorzystany język opisu sprzętu VHDL. Wynikiem pracy była skuteczna implementacja na układ programowalny dekompresora obrazów nieruchomych zapisanych w standardzie JPEG ISO/IEC 10918-1(1993), trybie Baseline będącym podstawowym i obowiązkowym trybem dla tego standardu. Szczególną uwagę poświęcono wyborowi i implementacji dwóch najważniejszych zdaniem autora algorytmów występujących w omawianym standardzie.
Image compression is one of the most important topics in the industry, commerce and scientific research. Image compression algorithms need to perform a large number of operations on a large number of data. In the case of compression and decompression of still images the time needed to process a single image is not critical. However, the assumption of this project was to build a solution which would be fully parallel, sequential and synchronous. The paper describes the development of an accelerator for selected still image compression algorithms. In its hardware implementation there was used the hardware description language VHDL. The result of this work was a successful implementation on a programmable system decompressor of still images saved in JPEG standard ISO / IEC 10918-1 (1993), Baseline mode, which is a primary, fundamental, and mandatory mode for this standard. The modular system and method of connection allows the continuous input data stream. Particular attention was paid to selection and implementation of two major, in the authors opinion, algorithms occuring in this standard. Executing the IDCT module uses an algorithm transformation IDCT-SQ modified by the authors of this paper. It provides a full pipelining by applying the same kind of arithmetic operations between each stage. The module used to decode Huffman's code proved to be a bottleneck
Źródło:: Pomiary Automatyka Kontrola; 2012, R. 58, nr 7, 7; 593-595
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 3.

Tytuł:: Wykorzystanie akceleracji sprzętowej przy implementacji metryk podobieństwa tekstów
The use of a hardware accelerator for implementation of text resemblance metrics
Autorzy:: Iwanecki, Ł.
Koryciak, S.
Dąbrowska-Boruch, A.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/157430.pdf
Data publikacji:: 2014
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: akceleracja sprzętowa
FPGA
ARM
klasyfikacja tekstu
hardware acceleration
text classification
Opis:: Artykuł opisuje badania na temat klasyfikatorów tekstów. Zadanie polegało na zaprojektowaniu akceleratora sprzętowego, który przyspieszyłby proces klasyfikacji tekstów pod względem znaczeniowym. Projekt został podzielony na dwie części. Celem części pierwszej było zaproponowanie sprzętowej implementacji algorytmu realizującego metrykę do obliczania podobieństwa dokumentów. W drugiej części zaprojektowany został cały systemem akceleratora sprzętowego. Kolejnym etapem projektowym jest integracja modelu metryki z system akceleracji.
The aim of this project is to propose a hardware accelerating system to improve the text categorization process. Text categorization is a task of categorizing electronic documents into the predefined groups, based on the content. This process is complex and requires a high performance computing system and a big number of comparisons. In this document, there is suggested a method to improve the text categorization using the FPGA technology. The main disadvantage of common processing systems is that they are single-threaded – it is possible to execute only one instruction per a single time unit. The FPGA technology improves concurrence. In this case, hundreds of big numbers may be compared in one clock cycle. The whole project is divided into two independent parts. Firstly, a hardware model of the required metrics is implemented. There are two useful metrics to compute a distance between two texts. Both of them are shown as equations (1) and (2). These formulas are similar to each other and the only difference is the denominator. This part results in two hardware models of the presented metrics. The main purpose of the second part of the project is to design a hardware accelerating system. The system is based on a Xilinx Zynq device. It consists of a Cortex-A9 ARM processor, a DMA controller and a dedicated IP Core with the accelerator. The block diagram of the system is presented in Fig.4. The DMA controller provides duplex transmission from the DDR3 memory to the accelerating unit omitting a CPU. The project is still in development. The last step is to integrate the hardware metrics model with the accelerating system.
Źródło:: Pomiary Automatyka Kontrola; 2014, R. 60, nr 7, 7; 426-428
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 4.

Tytuł:: Sprzętowa implementacja funkcji orbitalnej na potrzeby obliczeń kwantowo-chemicznych
Hardware implementation of the atom orbital calculation
Autorzy:: Wielgosz, M.
Jamro, E.
Russek, P.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/154619.pdf
Data publikacji:: 2010
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: akceleracja sprzętowa
komputery dużej mocy (HPC)
FPGA
obliczenia zmiennoprzecinkowe
funkcja exp()
High Performance Reconfigurable Computing
quantum chemistry
custom computing
HPC
Opis:: W niniejszym artykule przedstawione zostały wyniki implementacji modułu obliczającego wartość orbitalu atomowego w punkcie. Moduł ten stanowił cześć składową jednostki generującej wartość potencjału korelacyjno-wymiennego, wykorzystywaną w obliczeniach kwantowo-chemicznych. Prezentowana jednostka składa się z potokowych bloków zmiennoprzecinkowych. W pracy zaprezentowano również wyniki akceleracji obliczeń względem procesora ogólnego przeznaczenia Itanium2 1.6 GHz.
The paper presents FPGA acceleration and implementation results of the orbital function calculation employed in quantum-chemistry. The orbital function core is composed of the authors' customized floating-point hardware modules. These modules are scalable from single to double precision, capable of working at frequency ranging from 100 to 200 MHz. Besides hardware implementation, the design process also involved reformulation of the algorithm in order to adapt them to the platform profile. The computational procedure presented in this paper is part of the algorithm for generating exchange-correlation potential, and is also recognized as one of the most computationally intensive routines. This feature justifies the effort devoted to develop its hardware implementation. The precision of floating-point operations becomes a primary concern when dealing with low-level quantum chemistry procedures, thus the authors have taken various measures to optimize them, both in terms of resource consumption and processing speed.
Źródło:: Pomiary Automatyka Kontrola; 2010, R. 56, nr 7, 7; 705-707
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 5.

Tytuł:: Real-time implementation of moving object detection in video surveillance systems using FPGA
Implementacja detekcji obiektów ruchomych w czasie rzeczywistym w systemach nadzoru wizyjnego z wykorzystaniem układów FPGA
Autorzy:: Kryjak, T.
Gorgoń, M.
Powiązania:: https://bibliotekanauki.pl/articles/305415.pdf
Data publikacji:: 2011
Wydawca:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:: generacja tła
odejmowanie tła
przetwarzanie obrazów
akceleracja sprzętowa
układy FPGA
background generation
background subtraction
image processing
hardware acceleration
FPGA
Opis:: The article presents the concept of real-time implementation computing tasks in video surveillance systems. A pipeline implementation of a multimodal background generation algorithm for colour video stream and a moving objects segmentation based on brightness, colour and textural information in reconfigurable resources of FPGA device is described. System architecture, resource usage and segmentation results are presented.
W artykule zaprezentowano koncepcję implementacji zadań obliczeniowych wykorzystywanych w systemach nadzoru wizyjnego w czasie rzeczywistym. Opisano implementację wielomodalnej metody generacji tła dla sekwencji wideo zarejestrowanych w kolorze oraz segmentację obiektów ruchomych z wykorzystaniem informacji o jasności, kolorze i teksturze w zasobach rekonfigurowalnych układów FPGA. Zaprezentowano architekturę systemu, zużycie zasobów i przykładowe rezultaty segmentacji.
Źródło:: Computer Science; 2011, 12; 149-162
1508-2806
2300-7036
Pojawia się w:: Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 6.

Tytuł:: Akceleracja obliczeń zmiennoprzecinkowych na platformie RASC
Accelerating calculations on the RASC platform
Autorzy:: Wielgosz, M.
Jamro, E.
Wiatr, K.
Powiązania:: https://bibliotekanauki.pl/articles/154331.pdf
Data publikacji:: 2009
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: akceleracja sprzętowa
komputery dużej mocy (HPC)
FPGA
obliczenia zmiennoprzecinkowe
funkcja exp()
HPRC (High Performance Reconfigurable Computing)
elementary functions
exponential function
Opis:: W artykule zostały zaprezentowane wyniki testów przeprowadzonych w celu określenia maksymalnej szybkości wykonywania operacji zmiennoprzecinkowych na platformie rekonfigurowanej RASC. Zaimplementowano różne dostępne tryby konfiguracji jednostki Host oraz RASC w celu wyłonienia najbardziej efektywnego pod względem wydajności trybu pracy jednostki obliczeniowej. Uzyskane wyniki pomiarów ujawniały, że kombinacja Direct I/O oraz DMA zapewnia najwyższą przepustowość pomiędzy węzłami Host i RASC. Niemniej jednak dla niektórych aplikacji tryb multi-buffering może okazać się bardziej odpowiedni, ze względu na możliwość jednoczesnego przesyłania danych i wykonywania operacji. Funkcja exp() w standardzie zmiennoprzecinkowym o podwójnej precyzji została wykorzystana jako przykładowa aplikacja, która pozwoliła oszacowanie możliwej do uzyskania akceleracji obliczeń na platformie RASC.
This paper presents results of the tests performed to determine high speed calculations capabilities of the SGI RASC platform. Different data transfer modes and memory management approaches were examined to choose the most effective combination of the Host and RASC memory adjustments. That work may be regarded as a case study of the contemporary FPGA -based accelerator which, however, can characterize the whole branch of the devices. The paper is strongly focused on the floating point calculations potential of the FPGA accelerator. The RASC algorithm execution procedure, from the processor perspective, is composed of several functions which reserve resources, queue commands and perform other preparation steps. It is noteworthy (Fig. 3) that the time consumed by the functions remains roughly the same, independent of the algorithm being executed. The resource reservation procedure, once conducted, allows many executions of the algorithm -that amounts to huge time savings, since the procedure takes approximately 7.5 ms, which is roughly 99 % of the overall execution time of the algorithm. Rasclib algorithm commit and rasclib algorithm wait calls are considered to be the key (Fig. 3) part of the RASC software execution routine. The first one activates the FPGA between these two commands is the transfer and algorithm execution time. All curves (Fig. 4) reflect overall processing time of the same amount of data, but differ in size of the single data chunk which varies from 1024x64 bit = 8 kB to 1048576x64 bit = 8 MB. It has been observed that for the bigger chunk much better results are achieved in terms of the effective execution time. However, above 1 MB a decrease of the effective execution time seems to indicate saturation, therefore sending data in bigger portions may not improve the performance of the system so much. The most effective execution time of single exp() function for SRAM buffering mode is 12 ns, so 9,5 ns is transport overhead due to bus delays. The theoretical calculation time of single exp() function (data transfer is not taken into account) is 2,5 ns because two exp() are implemented on the RASC and clocked at 200 Mhz. The obtained measurement results show that Direct I/O mode together with DMA transfer provides the highest data throughput between the Host and RASC slice. Nevertheless, for some application multi-buffering can appear to be more suitable in terms of concurrent data transfer capabilities and FPGA algorithm execution. As a hardware acceleration example, there is considered an exponential function which allows estimating maximum achievable data processing speed.
Źródło:: Pomiary Automatyka Kontrola; 2009, R. 55, nr 7, 7; 485-487
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 7.

Tytuł:: Computation acceleration on SGI RASC: FPGA based reconfigurable computing hardware
Akceleracja obliczeń na platformie SGI RASC: module obliczeń za pomocą logiki rekonfigurowalnej
Autorzy:: Jamro, E.
Janiszewski, M.
Machaczek, K.
Russek, P.
Wiatr, K.
Wielgosz, M.
Powiązania:: https://bibliotekanauki.pl/articles/305339.pdf
Data publikacji:: 2008
Wydawca:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Tematy:: sprzętowa akceleracja obliczeń
procesory dedykowane
FPGA
obliczenia wielkiej skali
SGI RASC
custom computing
single-purpose processors
high performance computing
Opis:: In this paper a novel method of computation using FPGA technology is presented. In several cases this method provides a calculations speedup with respcct to the General Purpose Processors (GPP). The main concept of this approach is based on such a design of computing hardware architecture to fit algorithm dataflow and best utilize well known computing techniques as pipelining and parallelism. Configurable hardware is used as a implementation platform for custom designed hardware. Paper will present implementation results of algorithms those are used in such areas as cryptography, data analysis and scientific computation. The other promising areas of new technology utilization will also be mentioned, bioinformatics for instance. Mentioned algorithms were designed, tested and implemented on SGI RASC platform. RASC module is a part of Cyfronet's SGI Altix 4700 SMP system. We will also present RASC modern architecture. In principle it consists of FPGA chips and very fast, 128-bit wide local memory. Design tools avaliable for designers will also be presented.
Autorzy prezentują nową metodę prowadzenia obliczeń wielkiej skali, opartą na układach FPGA. W szczególnych przypadkach jej zastosowanie prowadzi do skrócenia czasu obliczeń. Podstawą metody jest prowadzenie obliczeń za pomocą architektur obliczeniowych projektowanych dla danego algorytmu. Ponieważ architektura stworzona została specjalnie dla zadanego algorytmu, lepiej wykorzystuje możliwości równoległej i potokowej realizacji obliczeń. Jako platformę realizacji architektur dedykowanych zastosowano układy rekonfigurowalne. Artykuł prezentuje także wyniki zastosowania wspomnianej techniki w takich obszarach, jak kryptografia, analiza danych i obliczenia naukowe podwójnej precyzji. Wskazano również na inne dziedziny nauki, gdzie opisywana technika jest z powodzeniem stosowana (np.: bioinformatyka). Zrealizowane algorytmy były uruchomione i przetestowane na zainstalowanym w ACK Cyfronet AGH module SGI RASC, będącym częścią systemu SMP Al-tix 4700. Przedstawiono architekturę zastosowanego modułu RASC oraz narzędzia i metody projektowania dostępne dla programistów.
Źródło:: Computer Science; 2008, 9; 21-34
1508-2806
2300-7036
Pojawia się w:: Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Informacja

Wyszukujesz frazę "akceleracja sprzętowa" wg kryterium: Temat

Źródło danych

Dostawca treści

Kolekcja

Rok wydania

Wydawca

Temat

Autor

Typ dokumentu

Język