Subject: reinforcement learning - OPAC collection catalog

Title:: Deep reinforcement learning overview of the state of the art
Authors:: Fenjiro, Y.
Benbrahim, H.
Links:: https://bibliotekanauki.pl/articles/384788.pdf
Publication date:: 2018
Publisher:: Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Subjects:: reinforcement learning
deep learning
convolutional network
recurrent network
deep reinforcement learning
Description:: Artificial intelligence made big steps forward with reinforcement learning (RL) in the last century and with the advent of deep learning (DL) in the 1990s, especially the breakthrough of convolutional networks in the field of computer vision. The adoption of DL neural networks in RL, in the first decade of the 21st century, led to an end-to-end framework, called deep reinforcement learning (DRL), that allowed great advances in human-level agents and autonomous systems. In this paper, we go through the development timeline of RL and DL technologies, describing the main improvements made in both fields. We then dive into DRL and give an overview of the state of the art of this new and promising field by browsing a set of algorithms (value optimization, policy optimization and actor-critic), and then outline current challenges and real-world applications, along with the hardware and frameworks used. Finally, we discuss some potential research directions in deep RL, which we expect will lead to a real human level of intelligence.
Source:: Journal of Automation Mobile Robotics and Intelligent Systems; 2018, 12, 3; 20-39
1897-8649
2080-2145
Appears in:: Journal of Automation Mobile Robotics and Intelligent Systems
Content provider:: Biblioteka Nauki

Article

Title:: A compact DQN model for mobile agents with collision avoidance
Authors:: Kamola, Mariusz
Links:: https://bibliotekanauki.pl/articles/27314243.pdf
Publication date:: 2023
Publisher:: Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Subjects:: Q-learning
DQN
reinforcement learning
Description:: This paper presents a complete simulation and reinforcement learning solution to train mobile agents’ strategy of route tracking and avoiding mutual collisions. The aim was to achieve such functionality with limited resources, with respect to both the model input and the model size. The designed models prove to keep agents safely on the track. The collision avoidance skills that agents develop in the course of model training are primitive but rational. The small size of the model allows fast training with limited computational resources.
Source:: Journal of Automation Mobile Robotics and Intelligent Systems; 2023, 17, 2; 28-35
1897-8649
2080-2145
Appears in:: Journal of Automation Mobile Robotics and Intelligent Systems
Content provider:: Biblioteka Nauki

Article

Title:: Prioritized epoch-incremental Q-learning algorithm
Authors:: Zajdel, R.
Links:: https://bibliotekanauki.pl/articles/375619.pdf
Publication date:: 2012
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: reinforcement learning
Q-learning
grid world
Description:: The basic reinforcement learning algorithms, such as Q-learning or Sarsa, are characterized by a short single learning step; however, the number of epochs necessary to achieve the optimal policy is unacceptably large. There are many methods that reduce the number of necessary epochs, such as TD(λ > 0), Dyna or prioritized sweeping, but their computational time is considerable. This paper proposes a combination of the Q-learning algorithm performed in the incremental mode with an acceleration method executed in the epoch mode. The acceleration is based on the distance to the terminal state. This approach keeps a single learning step short while achieving high efficiency comparable with Dyna or prioritized sweeping. The proposed algorithm is compared with Q(λ)-learning, Dyna-Q and prioritized sweeping in experiments on three grid worlds. The learning time and the number of epochs necessary to reach the terminal state are used to evaluate the efficiency of the compared algorithms.
The efficiency of the basic reinforcement learning algorithms Q-learning and Sarsa, measured by the number of trials needed to obtain the optimal policy, is relatively low. Hence the possibilities for practical application of these algorithms are limited. Their advantage, however, is low computational complexity, which makes the execution time of a single learning step short enough that they work very well in online control systems. The methods used to accelerate reinforcement learning, which make it possible to reach the absorbing state after far fewer trials than the basic algorithms, most often increase computational complexity and lengthen the execution time of a single learning step. The most commonly used acceleration, the temporal difference method TD(λ > 0), involves additional memory elements, namely eligibility traces. The execution time of a single learning step in such an algorithm is considerably longer because, unlike in the basic algorithm, where only the action-value function for the active state is updated, here the update is carried out for all states. More efficient acceleration methods, such as Dyna or prioritized sweeping, also belong to the class of memory-based algorithms, and their main idea is reinforcement learning based on an adaptive model of the environment. These methods make it possible to reach the absorbing state in far fewer trials; however, because of the increased computational complexity, the execution time of a single learning step becomes a significant factor limiting their use in systems with a large number of states.
The essence of these algorithms is to perform a fixed number of updates of the action-value function for states that were active in the past; in the Dyna algorithm these states are chosen at random, whereas in prioritized sweeping they are ordered by the magnitude of the update error. This paper proposes an epoch-incremental reinforcement learning algorithm whose main idea is to combine the basic incremental Q-learning algorithm with an acceleration algorithm executed in the epoch mode. The proposed epoch-mode learning method relies mainly on the actual value of the reinforcement signal observed on the transition to the absorbing state, which is then propagated backwards exponentially depending on the estimated distance from the absorbing state. This approach yields a short learning time for a single step in the incremental mode (Tab. 2) while retaining the efficiency typical of Dyna or prioritized sweeping (Tab. 1 and Fig. 5).
Source:: Theoretical and Applied Informatics; 2012, 24, 2; 159-171
1896-5334
Appears in:: Theoretical and Applied Informatics
Content provider:: Biblioteka Nauki

Article
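
The basic incremental Q-learning step that the description above refers to can be sketched as follows. This is only an illustration of plain tabular Q-learning, not the epoch-incremental algorithm proposed in the paper. The grid size, reward scheme, function names and all parameter values are assumptions made for the example.

```python
import random

# Moves as (row delta, column delta): right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(state, action, height, width):
    """Deterministic move in a hypothetical grid; walls clamp the position."""
    row = min(max(state[0] + ACTIONS[action][0], 0), height - 1)
    col = min(max(state[1] + ACTIONS[action][1], 0), width - 1)
    nxt = (row, col)
    done = nxt == (height - 1, width - 1)  # terminal state: bottom-right cell
    return nxt, (1.0 if done else 0.0), done


def q_learning(height=4, width=4, episodes=500, alpha=0.1, gamma=0.95,
               epsilon=0.1, seed=0):
    """Plain tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(height) for c in range(width) for a in range(4)}
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore
            else:
                best = max(q[(state, a)] for a in range(4))
                # exploit, breaking ties at random
                action = rng.choice(
                    [a for a in range(4) if q[(state, a)] == best])
            nxt, reward, done = step(state, action, height, width)
            future = 0.0 if done else max(q[(nxt, a)] for a in range(4))
            # One incremental learning step: only Q(state, action) changes.
            q[(state, action)] += alpha * (
                reward + gamma * future - q[(state, action)])
            state = nxt
    return q


def greedy_path_length(q, height=4, width=4, limit=100):
    """Follow the greedy policy from the start; return steps taken or None."""
    state, steps, done = (0, 0), 0, False
    while not done and steps < limit:
        action = max(range(4), key=lambda a: q[(state, a)])
        state, _, done = step(state, action, height, width)
        steps += 1
    return steps if done else None
```

Each step touches a single table entry, which is why the single learning step is cheap, while information about the terminal reward travels back only one state per visit, which is why many epochs are needed.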

Title:: Epokowo-inkrementacyjny algorytm uczenia się ze wzmocnieniem wykorzystujący kryterium średniego wzmocnienia
The epoch-incremental reinforcement learning algorithm based on the average reward
Authors:: Zajdel, R.
Links:: https://bibliotekanauki.pl/articles/152882.pdf
Publication date:: 2013
Publisher:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:: reinforcement learning
R-learning
epoch-incremental algorithm
average reward reinforcement learning
epoch-incremental reinforcement learning
Description:: The paper proposes a new epoch-incremental reinforcement learning algorithm. The main idea of this algorithm is to perform, in the epoch mode, additional policy updates based on the distances of states that were active in the past from the terminal state. The proposed algorithm and the R(0)-learning, R(λ)-learning, Dyna-R and prioritized sweeping-R algorithms were applied to control a mountain car model and a model of a ball placed on a balancing beam.
The application of average reward reinforcement learning algorithms to control is described in this paper. Moreover, a new epoch-incremental reinforcement learning algorithm (EIR(0)-learning for short) is proposed. In this algorithm, the basic R(0)-learning algorithm is implemented in the incremental mode and an environment model is created. In the epoch mode, the distances of past active states to the terminal state are determined on the basis of the model. These distances are then used to update the policy. The proposed algorithm was applied to the mountain car (Fig. 4) and ball-beam (Fig. 5) models. The proposed EIR(0)-learning was empirically compared to R(0)-learning [4, 6], R(λ)-learning and the model-based algorithms Dyna-R and prioritized sweeping-R [11]. For the ball-beam system, the EIR(0)-learning algorithm reached a stable control strategy after the smallest number of trials (Tab. 1, column 2). For the mountain car system, the number of trials was smaller than for the R(0)-learning and R(λ)-learning algorithms, but greater than for Dyna-R and prioritized sweeping-R. It is worth noting that the execution times of the Dyna-R and prioritized sweeping-R algorithms in the incremental mode were respectively 5 and 50 times longer than for the proposed EIR(0)-learning algorithm (Tab. 2, column 3). The main conclusion of this work is that the epoch-incremental learning algorithm provided a stable control strategy in a relatively small number of trials and with a short time per iteration.
Source:: Pomiary Automatyka Kontrola; 2013, Vol. 59, No. 7, 7; 700-703
0032-4140
Appears in:: Pomiary Automatyka Kontrola
Content provider:: Biblioteka Nauki

Article
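
For orientation, a single update of basic R-learning, the average-reward counterpart of Q-learning that the description calls R(0)-learning, might look like the sketch below. It follows the commonly cited textbook form of R-learning and is not taken from the paper; the function name, the dictionary layout and the step-size names `value_step` and `rho_step` are assumptions made for the example.

```python
def r_learning_step(r_values, rho, state, action, reward, next_state,
                    n_actions, value_step=0.1, rho_step=0.01):
    """Apply one R-learning update in place and return the new rho.

    r_values maps (state, action) to a relative action value and rho is
    the running estimate of the average reward per step.
    """
    max_next = max(r_values[(next_state, b)] for b in range(n_actions))
    # Relative value update: the reward is measured against the average rho.
    r_values[(state, action)] += value_step * (
        reward - rho + max_next - r_values[(state, action)])
    max_here = max(r_values[(state, b)] for b in range(n_actions))
    # The average reward is adjusted only when the taken action is greedy.
    if r_values[(state, action)] == max_here:
        max_next = max(r_values[(next_state, b)] for b in range(n_actions))
        rho += rho_step * (reward - rho + max_next - max_here)
    return rho
```

A call such as `rho = r_learning_step(r_values, rho, s, a, reward, s_next, n_actions=2)` performs one incremental step; the epoch-mode updates described in the abstract would be an additional stage that this sketch does not attempt to reproduce.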

Title:: Use of Modified Adaptive Heuristic Critic Algorithm for Novel Scheduling Mechanism in Packet-Switched Networks
Authors:: Jednoralski, M.
Kacprzak, T.
Links:: https://bibliotekanauki.pl/articles/92909.pdf
Publication date:: 2005
Publisher:: Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Subjects:: reinforcement learning
telecommunication networks
packet scheduling
Description:: This paper introduces a novel scheduling algorithm, based on Reinforcement Learning and a modified Adaptive Heuristic Critic, for selecting packets in a switch node for transmission over a network channel. A comparison of two well-known scheduling algorithms, Earliest Deadline First and Round Robin, shows that these algorithms perform well in some cases but cannot adapt their behavior to traffic changes. Simulation studies show that the novel scheduling algorithm outperforms Round Robin and Earliest Deadline First by adapting to changing network conditions.
Source:: Studia Informatica : systems and information technology; 2005, 2(6); 21-34
1731-2264
Appears in:: Studia Informatica : systems and information technology
Content provider:: Biblioteka Nauki

Article

Title:: Adaptive controller design for electric drive with variable parameters by Reinforcement Learning method
Authors:: Pajchrowski, T.
Siwek, P.
Wójcik, A.
Links:: https://bibliotekanauki.pl/articles/201068.pdf
Publication date:: 2020
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: Reinforcement Learning
adaptive control
electric drive
machine learning
Description:: The paper presents a method for designing a neural speed controller using the Reinforcement Learning method. The controlled object is an electric drive with a permanent magnet synchronous motor, having a complex mechanical structure and changeable parameters. Several research cases of the control system with a neural controller are presented, focusing on changes in the object parameters. The influence of the critic's behaviour is also studied, where the critic is a function of control error and energy cost. It ensures long-term performance stability without the need to switch off the adaptation algorithm. Numerous simulation tests were carried out and confirmed on a real test stand.
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2020, 68, 5; 1019-1030
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article

Title:: Multi agent deep learning with cooperative communication
Authors:: Simões, David
Lau, Nuno
Reis, Luís Paulo
Links:: https://bibliotekanauki.pl/articles/1837537.pdf
Publication date:: 2020
Publisher:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:: multi-agent systems
deep reinforcement learning
centralized learning
Description:: We consider the problem of multiple agents cooperating in a partially-observable environment. Agents must learn to coordinate and share relevant information to solve the tasks successfully. This article describes Asynchronous Advantage Actor-Critic with Communication (A3C2), an end-to-end differentiable approach where agents learn policies and communication protocols simultaneously. A3C2 uses a centralized learning, distributed execution paradigm and supports independent agents, dynamic team sizes, partially-observable environments, and noisy communications. We show that A3C2 outperforms other state-of-the-art proposals in multiple environments.
Source:: Journal of Artificial Intelligence and Soft Computing Research; 2020, 10, 3; 189-207
2083-2567
2449-6499
Appears in:: Journal of Artificial Intelligence and Soft Computing Research
Content provider:: Biblioteka Nauki

Article

Title:: Discrete uncertainty quantification for offline reinforcement learning
Authors:: Pérez Torres, Jose Luis
Corrochano Jiménez, Javier
García, Javier
Majadas, Rubén
Ibañez-Llano, Cristina
Pérez, Sergio
Fernández, Fernando
Links:: https://bibliotekanauki.pl/articles/23944835.pdf
Publication date:: 2023
Publisher:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:: off-line reinforcement learning
uncertainty quantification
machine learning
Description:: In many Reinforcement Learning (RL) tasks, the classical online interaction of the learning agent with the environment is impractical because such interaction is either expensive or dangerous. In these cases, previously gathered data can be used, giving rise to what is typically called Offline RL. However, this type of learning faces a large number of challenges, mostly derived from the fact that the exploration/exploitation trade-off is overshadowed. In addition, the historical data is usually biased by the way it was obtained, typically by a sub-optimal controller, producing a distributional shift between the historical data and the data required to learn the optimal policy. In this paper, we present a novel approach to deal with the uncertainty arising from the absence or sparse presence of some state-action pairs in the learning data. Our approach is based on shaping the reward perceived from the environment to ensure the task is solved. We present the approach and show that combining it with classic online RL methods makes them perform as well as state-of-the-art Offline RL algorithms such as CQL and BCQ. Finally, we show that using our method on top of established offline learning algorithms can improve them.
Source:: Journal of Artificial Intelligence and Soft Computing Research; 2023, 13, 4; 273-287
2083-2567
2449-6499
Appears in:: Journal of Artificial Intelligence and Soft Computing Research
Content provider:: Biblioteka Nauki

Article

Title:: Reinforcement-Based Learning for Process Classification Task
Authors:: Bashir, Lubna Zaghlul
Links:: https://bibliotekanauki.pl/articles/1192874.pdf
Publication date:: 2016
Publisher:: Przedsiębiorstwo Wydawnictw Naukowych Darwin / Scientific Publishing House DARWIN
Subjects:: Reinforcement Learning
Reward
Classification
Bucket Brigade Algorithm
Description:: In this work, we present a reinforcement-based learning algorithm that includes the automatic classification of both sensors and actions. The classification process precedes any application of reinforcement learning. If categories are not at an adequate level of abstraction, the problem may not be learnable. The classification process is usually done by the programmer and is not considered part of the learning process. However, in complex tasks, environments, or agents, this manual process can become extremely difficult. To solve this problem, we propose to include the classification in the learning process. We apply an algorithm that automatically learns to achieve a task through reinforcement learning and works without a prior classification process. The system, called Fish or Ship (FOS), is assigned the task of inducing classification rules for a classification task described in terms of 6 attributes. The task is to categorize an object that has one or more of the following features: Sail, Solid, Big, Swim, Eye, Fins, as either a fish or a ship. First results of applying this algorithm are shown. Reinforcement learning techniques were used to implement the classification task, with interesting properties such as providing guidance to the system and reducing the number of cycles required to learn.
Source:: World Scientific News; 2016, 36; 12-26
2392-2192
Appears in:: World Scientific News
Content provider:: Biblioteka Nauki

Article

Title:: An application of decision rules in reinforcement learning
Authors:: Michalski, A.
Links:: https://bibliotekanauki.pl/articles/206534.pdf
Publication date:: 2000
Publisher:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:: decision rules
Q-learning
reinforcement learning
rough set theory
Description:: This paper describes an application of decision rules to function representation in reinforcement learning. Rules are generated incrementally, by a method based on rough set theory, from instances recorded in a state-action-Q-value memory. A simulation experiment investigating the performance of the system is reported, together with the results achieved.
Source:: Control and Cybernetics; 2000, 29, 4; 989-996
0324-8569
Appears in:: Control and Cybernetics
Content provider:: Biblioteka Nauki

Article

Title:: Millimeter Wave Beamforming Training: A Reinforcement Learning Approach
Authors:: Mohamed, Ehab Mahmoud
Links:: https://bibliotekanauki.pl/articles/1844599.pdf
Publication date:: 2021
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: millimeter wave
beamforming training
multiarmed bandit
reinforcement learning
Description:: Beamforming training (BT) is considered an essential process for accomplishing communications in the millimeter wave (mmWave) band, i.e., 30 ~ 300 GHz. This process aims to find the best transmit/receive antenna beams to compensate for the impairments of the mmWave channel and successfully establish the mmWave link. Typically, the mmWave BT process is highly time-consuming, which affects the overall throughput and energy consumption of mmWave link establishment. In this paper, a machine learning (ML) approach, specifically reinforcement learning (RL), is used to enable the mmWave BT process by modeling it as a multi-armed bandit (MAB) problem with the aim of maximizing the long-term throughput of the constructed mmWave link. Based on this formulation, MAB algorithms such as upper confidence bound (UCB), Thompson sampling (TS) and epsilon-greedy (e-greedy) are used to address the problem and accomplish the mmWave BT process. Numerical simulations confirm the superior performance of the proposed MAB approach over existing mmWave BT techniques.
Source:: International Journal of Electronics and Telecommunications; 2021, 67, 1; 95-102
2300-1933
Appears in:: International Journal of Electronics and Telecommunications
Content provider:: Biblioteka Nauki

Article
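
To make the bandit formulation above concrete, the sketch below runs the standard UCB1 rule over a handful of arms, where each arm could stand for one candidate beam pair. It is a generic illustration, not the paper's method: the Bernoulli reward model, the success probabilities and the function name are assumptions made for the example, and a real system would use measured link throughput as the reward.

```python
import math
import random


def ucb1(arm_means, rounds=2000, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was played.

    arm_means holds hypothetical success probabilities, one per arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of times each arm was played
    totals = [0.0] * n_arms  # sum of rewards collected per arm
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise its estimate
        else:
            # Pick the arm with the largest mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts
```

Calling `ucb1([0.2, 0.5, 0.9])` returns the play counts per arm; the exploration bonus shrinks as an arm is sampled more often, so play concentrates on the arm with the highest estimated reward without a separate exhaustive search phase.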

Title:: Learning board evaluation function for Othello by hybridizing coevolution with temporal difference learning
Authors:: Szubert, M.
Jaśkowski, W.
Krawiec, K.
Links:: https://bibliotekanauki.pl/articles/206175.pdf
Publication date:: 2011
Publisher:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:: evolutionary computation
coevolutionary algorithms
reinforcement learning
memetic computing
game strategy learning
Description:: Hybridization of global and local search techniques has already produced promising results in the fields of optimization and machine learning. It is commonly presumed that approaches employing this idea, like memetic algorithms combining evolutionary algorithms and local search, benefit from the complementarity of the constituent methods and maintain the right balance between exploration and exploitation of the search space. While such extensions of evolutionary algorithms have been intensively studied, hybrids of local search with coevolutionary algorithms have not received much attention. In this paper we attempt to fill this gap by presenting Coevolutionary Temporal Difference Learning (CTDL), which works by interlacing global search provided by competitive coevolution with local search by means of temporal difference learning. We verify CTDL by applying it to the board game of Othello, where it learns board evaluation functions represented by the linear weighted piece counter architecture. The results of a computational experiment show the superiority of CTDL over the coevolutionary algorithm and temporal difference learning alone, both in terms of the performance of the elaborated strategies and computational cost. To further exploit the potential of CTDL, we extend it with an archive that keeps track of selected well-performing solutions found so far and uses them to improve search convergence. The overall conclusion is that the fusion of various forms of coevolution with a gradient-based local search can be highly beneficial and deserves further study.
Source:: Control and Cybernetics; 2011, 40, 3; 805-831
0324-8569
Appears in:: Control and Cybernetics
Content provider:: Biblioteka Nauki

Article
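
The local search component mentioned above, temporal difference learning on a weighted piece counter, can be illustrated with a single gradient-based TD(0) update. This is a minimal sketch under stated assumptions rather than the paper's implementation: the board is a flat list with +1, -1 and 0 for own, opponent and empty squares, the linear score is squashed with tanh, and the function names and learning rate are hypothetical.

```python
import math


def wpc_value(weights, board):
    """Weighted piece counter: squashed weighted sum over board squares."""
    return math.tanh(sum(w * x for w, x in zip(weights, board)))


def td0_update(weights, board, next_value, alpha=0.01):
    """Return weights after one TD(0) step towards next_value.

    next_value is the evaluation of the following position, or the final
    game outcome when the following position is terminal.
    """
    value = wpc_value(weights, board)
    td_error = next_value - value
    slope = 1.0 - value * value  # derivative of tanh at the current score
    return [w + alpha * td_error * slope * x
            for w, x in zip(weights, board)]
```

Only squares that are occupied in the current position change their weights, and the change is proportional to how much the next evaluation disagrees with the current one.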

Title:: Analiza możliwości wykorzystania algorytmów uczenia maszynowego w środowisku Unity
Analysis of the possibilities for using machine learning algorithms in the Unity environment
Authors:: Litwynenko, Karina
Plechawska-Wójcik, Małgorzata
Links:: https://bibliotekanauki.pl/articles/1837823.pdf
Publication date:: 2021
Publisher:: Politechnika Lubelska. Instytut Informatyki
Subjects:: reinforcement learning
imitation learning
Unity
Description:: Reinforcement learning algorithms are gaining popularity, and their advancement is made possible by the availability of tools for studying them. This paper concerns the applicability of machine learning algorithms on the Unity platform using the Unity ML-Agents Toolkit library. The purpose of the study was to compare two algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic. The possibility of improving the learning results by combining these algorithms with the imitation learning method Generative Adversarial Imitation Learning (GAIL) was also verified. The results of the study showed that the PPO algorithm can perform better in uncomplicated environments with non-immediate rewards, while the additional use of GAIL can improve learning performance.
Source:: Journal of Computer Sciences Institute; 2021, 20; 197-204
2544-0764
Appears in:: Journal of Computer Sciences Institute
Content provider:: Biblioteka Nauki

Article

Title:: Self-improving Q-learning based controller for a class of dynamical processes
Authors:: Musial, Jakub
Stebel, Krzysztof
Czeczot, Jacek
Links:: https://bibliotekanauki.pl/articles/1845515.pdf
Publication date:: 2021
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: process control
Q-learning algorithm
reinforcement learning
intelligent control
on-line learning
Description:: This paper presents how the Q-learning algorithm can be applied as a general-purpose self-improving controller for use in industrial automation as a substitute for a conventional PI controller implemented without proper tuning. The traditional Q-learning approach is redefined to better fit applications in practical control loops, including a new definition of the goal state by the closed-loop reference trajectory and discretization of the state space and accessible actions (manipulated variables). Properties of the Q-learning algorithm are investigated in terms of practical applicability, with special emphasis on initializing the Q-matrix based only on preliminary PI tunings to ensure bumpless switching between the existing controller and the Q-learning algorithm that replaces it. A general approach for the design of the Q-matrix and learning policy is suggested, and the concept is systematically validated by simulation in the control of two example processes exhibiting first-order dynamics and oscillatory second-order dynamics. Results show that online learning through interaction with the controlled process is possible and that it ensures significant improvement in control performance compared to an arbitrarily tuned PI controller.
Source:: Archives of Control Sciences; 2021, 31, 3; 527-551
1230-2384
Appears in:: Archives of Control Sciences
Content provider:: Biblioteka Nauki

Article

Title:: Self-improving Q-learning based controller for a class of dynamical processes
Authors:: Musial, Jakub
Stebel, Krzysztof
Czeczot, Jacek
Links:: https://bibliotekanauki.pl/articles/1845530.pdf
Publication date:: 2021
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: process control
Q-learning algorithm
reinforcement learning
intelligent control
on-line learning
Description:: This paper presents how the Q-learning algorithm can be applied as a general-purpose self-improving controller for use in industrial automation as a substitute for a conventional PI controller implemented without proper tuning. The traditional Q-learning approach is redefined to better fit applications in practical control loops, including a new definition of the goal state by the closed-loop reference trajectory and discretization of the state space and accessible actions (manipulated variables). Properties of the Q-learning algorithm are investigated in terms of practical applicability, with special emphasis on initializing the Q-matrix based only on preliminary PI tunings to ensure bumpless switching between the existing controller and the Q-learning algorithm that replaces it. A general approach for the design of the Q-matrix and learning policy is suggested, and the concept is systematically validated by simulation in the control of two example processes exhibiting first-order dynamics and oscillatory second-order dynamics. Results show that online learning through interaction with the controlled process is possible and that it ensures significant improvement in control performance compared to an arbitrarily tuned PI controller.
Source:: Archives of Control Sciences; 2021, 31, 3; 527-551
1230-2384
Appears in:: Archives of Control Sciences
Content provider:: Biblioteka Nauki

Article

Title:: Handling realistic noise in multi-agent systems with self-supervised learning and curiosity
Authors:: Szemenyei, Marton
Reizinger, Patrik
Links:: https://bibliotekanauki.pl/articles/2147129.pdf
Publication date:: 2022
Publisher:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:: deep reinforcement learning
multi-agent environment
autonomous driving
robot soccer
self-supervised learning
Description:: Most reinforcement learning benchmarks, especially in multi-agent tasks, do not go beyond observations with simple noise; nonetheless, real scenarios induce more elaborate vision pipeline failures: false sightings, misclassifications or occlusion. In this work, we propose a lightweight 2D environment for robot soccer and autonomous driving that can emulate the above discrepancies. Besides establishing a benchmark for accessible multi-agent reinforcement learning research, our work addresses the challenges the simulator imposes. For handling realistic noise, we use self-supervised learning to enhance scene reconstruction and extend curiosity-driven learning to model longer horizons. Our extensive experiments show that the proposed methods achieve state-of-the-art performance compared against actor-critic methods, ICM, and PPO.
Source:: Journal of Artificial Intelligence and Soft Computing Research; 2022, 12, 2; 135-148
2083-2567
2449-6499
Appears in:: Journal of Artificial Intelligence and Soft Computing Research
Content provider:: Biblioteka Nauki

Article

Title:: Stabilizer design of PSS3B based on the KH algorithm and Q-Learning for damping of low frequency oscillations in a single-machine power system
Authors:: Mohamadi, Farshid
Sedaghati, Alireza
Links:: https://bibliotekanauki.pl/articles/41190034.pdf
Publication date:: 2023
Publisher:: Politechnika Warszawska, Instytut Techniki Cieplnej
Subjects:: 3-band power system stabilizer
reinforcement learning
Q-learning
power system
Description:: The aim of this study is to use the reinforcement learning method to generate a complementary signal for enhancing the performance of the power system stabilizer. Reinforcement learning is one of the important branches of machine learning in the area of artificial intelligence and a general approach for solving Markov Decision Process (MDP) problems. In this paper, a reinforcement learning-based control method, named Q-learning, is presented and used to improve the performance of a 3-Band Power System Stabilizer (PSS3B) in a single-machine power system. To this end, we first set the parameters of the 3-band power system stabilizer by optimizing the eigenvalue-based objective function using the new KH optimization algorithm, and then its efficiency is improved in real time using the proposed reinforcement learning algorithm based on the Q-learning method. One of the fundamental features of the proposed reinforcement learning-based stabilizer is its simplicity and its independence of the system model and of changes in the operating points. To evaluate the efficiency of the proposed reinforcement learning-based 3-band power system stabilizer, its results are compared with the conventional power system stabilizer and with the 3-band power system stabilizer designed using the KH algorithm under different operating points. The simulation results based on the performance indicators show that the power system stabilizer proposed in this study outperforms the two other methods in terms of reduced settling time and damping of low frequency oscillations.
Source:: Journal of Power Technologies; 2023, 103, 4; 230-242
1425-1353
Appears in:: Journal of Power Technologies
Content provider:: Biblioteka Nauki

Article

Skocz do pozycji: 18.

Tytuł:: Markov Decision Process based Model for Performance Analysis an Intrusion Detection System in IoT Networks
Autorzy:: Kalnoor, Gauri
Gowrishankar, -
Powiązania:: https://bibliotekanauki.pl/articles/1839336.pdf
Data publikacji:: 2021
Wydawca:: Instytut Łączności - Państwowy Instytut Badawczy
Tematy:: DDoS
intrusion detection
IoT
machine learning
Markov decision process
MDP
Q-learning
NSL-KDD
reinforcement learning
Opis:: In this paper, a new reinforcement learning intrusion detection system is developed for IoT networks incorporated with WSNs. Research is carried out and a plot for the proposed RL-IDS model is shown, in which the detection rate is improved. The outcome shows a decrease in false alarm rates compared with current methodologies. Computational analysis is performed, and the results are compared with current methodologies for distributed denial of service (DDoS) attacks. The performance of the network is estimated based on security and other metrics.
Źródło:: Journal of Telecommunications and Information Technology; 2021, 3; 42-49
1509-4553
1899-8852
Pojawia się w:: Journal of Telecommunications and Information Technology
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 19.

Tytuł:: A strategy learning model for autonomous agents based on classification
Autorzy:: Śnieżyński, B.
Powiązania:: https://bibliotekanauki.pl/articles/330672.pdf
Data publikacji:: 2015
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: autonomous agents
strategy learning
supervised learning
classification
reinforcement learning
czynnik niezależny
uczenie nadzorowane
uczenie ze wzmocnieniem
Opis:: In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied for strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer–pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows tracking the learning process.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2015, 25, 3; 471-482
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 20.

Tytuł:: Optimal control of dynamic systems using a new adjoining cell mapping method with reinforcement learning
Autorzy:: Arribas-Navarro, T.
Prieto, S.
Plaza, M.
Powiązania:: https://bibliotekanauki.pl/articles/205725.pdf
Data publikacji:: 2015
Wydawca:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Tematy:: optimal control
cells mapping
state space
reinforcement learning
stability
nonlinear control
controllability
Opis:: This work aims to improve and simplify the procedure used in the Control Adjoining Cell Mapping with Reinforcement Learning (CACM-RL) technique for tuning an optimal controller during the pre-learning stage (controller design), easing the transition from a simulation environment to the real world. Common problems encountered when working with CACM-RL are the adjustment of the cell size and the long-term evolution error. The main goal of the new approach proposed in this work (CACM-RL*) is to address both problems, helping engineers define the control solution in terms of accuracy and stability criteria instead of cell sizes. The new approach improves the mathematical analysis techniques and reduces the engineering effort during the design phase. To demonstrate the behaviour of CACM-RL*, three examples are described that show its application to real problems. In all the examples, CACM-RL* improves on the considered alternatives. In some cases, CACM-RL* improves the average controllability by up to 100%.
Źródło:: Control and Cybernetics; 2015, 44, 3; 369-387
0324-8569
Pojawia się w:: Control and Cybernetics
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 21.

Tytuł:: Epoch-incremental reinforcement learning algorithms
Autorzy:: Zajdel, R.
Powiązania:: https://bibliotekanauki.pl/articles/330530.pdf
Data publikacji:: 2013
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Tematy:: reinforcement learning
epoch incremental algorithm
grid world
uczenie ze wzmocnieniem
algorytm przyrostowy
Opis:: In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, the distances of past-active states to the terminal state are computed on the basis of the environment model. These distances and the terminal state reinforcement signal are used to improve the agent policy.
Źródło:: International Journal of Applied Mathematics and Computer Science; 2013, 23, 3; 623-635
1641-876X
2083-8492
Pojawia się w:: International Journal of Applied Mathematics and Computer Science
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 22.

Tytuł:: Uczenie ze wzmocnieniem regulatora Takagi-Sugeno metodą elementów ASE/ACE
Reinforcement learning with use of neuronlike elements ASE/ACE of Takagi-Sugeno controller
Autorzy:: Zajdel, R.
Powiązania:: https://bibliotekanauki.pl/articles/156302.pdf
Data publikacji:: 2005
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: regulator rozmyty
uczenie ze wzmocnieniem
wahadło odwrócone
fuzzy controller
reinforcement learning
inverted pendulum
Opis:: W artykule opisano zastosowanie algorytmu uczenia ze wzmocnieniem metodą elementów ASE/ACE do uczenia następników reguł regulatora rozmytego Takagi-Sugeno. Poprawność proponowanych rozwiązań zweryfikowano symulacyjnie w sterowaniu układem wahadło odwrócone - wózek. Przeprowadzono również eksperymenty porównawcze z klasyczną siecią elementów ASE/ACE. Pokazano zalety i wady rozwiązania klasycznego i rozmytego.
An adaptation of the reinforcement learning algorithm using ASE/ACE elements for learning the rule consequents of the Takagi-Sugeno fuzzy logic controller is proposed. The solution is applied to the control of the cart-pole system and tested in computer simulations. The original neuronlike ASE/ACE elements are simulated as well. Advantages and disadvantages of both approaches (fuzzy and classical) are demonstrated.
Źródło:: Pomiary Automatyka Kontrola; 2005, R. 51, nr 1, 1; 47-49
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 23.

Tytuł:: Self-learning controller of active magnetic bearing based on CARLA method
Samouczący się sterownik aktywnego łożyska magnetycznego oparty na metodzie CARLA
Autorzy:: Brezina, T.
Turek, M.
Pulchart, J.
Powiązania:: https://bibliotekanauki.pl/articles/152983.pdf
Data publikacji:: 2007
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: sterowanie aktywnego łożyska magnetycznego
active magnetic bearing control
continuous action reinforcement learning automata
Opis:: This contribution describes active magnetic bearing control using an analytically designed linear PD regulator with parallel nonlinear compensation represented by an automatic approximator. The coefficient (parameter) values come from the actions of Continuous Action Reinforcement Learning Automata (CARLA). The influence of the CARLA parameters on learning is discussed and demonstrated in a simulation study. It is shown that learning can be improved by selecting appropriate learning parameters.
W artykule przedstawiono sterowanie aktywnego łożyska magnetycznego za pomocą analitycznie dobranego regulatora PD z nieliniową kompensacją równoległą. Współczynniki kompensacji są wyznaczane automatycznie z użyciem metody CARLA (Continuous Action Reinforcement Learning Automata). Zbadano wpływ parametrów metody na proces uczenia się kompensatora w oparciu o eksperymenty symulacyjne. Wykazano, że właściwy dobór parametrów metody prowadzi do poprawienia skuteczności procesu uczenia się.
Źródło:: Pomiary Automatyka Kontrola; 2007, R. 53, nr 1, 1; 6-9
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 24.

Tytuł:: Reinforcement Learning in Discrete and Continuous Domains Applied to Ship Trajectory Generation
Autorzy:: Rak, A.
Gierusz, W.
Powiązania:: https://bibliotekanauki.pl/articles/259073.pdf
Data publikacji:: 2012
Wydawca:: Politechnika Gdańska. Wydział Inżynierii Mechanicznej i Okrętownictwa
Tematy:: ship motion control
trajectory generation
autonomous navigation
reinforcement learning
least-squares policy iteration
Opis:: This paper presents the application of reinforcement learning algorithms to the task of autonomously determining the ship trajectory during in-harbour and harbour-approach manoeuvres. The authors use the Markov decision process formalism as the background for presenting the algorithms. Two versions of RL algorithms were tested in simulations: a discrete one (Q-learning) and a continuous one (Least-Squares Policy Iteration, LSPI). The results show that a ship trajectory can be found in both cases. However, the discrete Q-learning algorithm suffered from many limitations (mainly the curse of dimensionality) and is not applicable in practice to the examined task. LSPI, on the other hand, gave promising results. To be fully operational, the proposed solution should be extended to take into account ship heading and velocity and coupled with an advanced multi-variable controller.
Źródło:: Polish Maritime Research; 2012, S 1; 31-36
1233-2585
Pojawia się w:: Polish Maritime Research
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 25.

Tytuł:: Sumienie maszyny? Sztuczna inteligencja i problem odpowiedzialności moralnej
The Conscience of a Machine? Artificial Intelligence and the Problem of Moral Responsibility
Autorzy:: Wieczorek, Krzysztof Tomasz
Jędrzejko, Paweł
Powiązania:: https://bibliotekanauki.pl/articles/1912551.pdf
Data publikacji:: 2021-09-03
Wydawca:: Wydawnictwo Uniwersytetu Śląskiego
Tematy:: sztuczna inteligencja
etyka
reinforcement learning
autonomia decyzyjna
artificial intelligence
ethics
decision-making autonomy
Opis:: Przyspieszający postęp w dziedzinie inteligentnych technologii rodzi nowe wyzwania etyczne, z którymi w dłuższej lub krótszej perspektywie ludzkość będzie musiała się zmierzyć. Nieuniknionym elementem owego postępu jest rosnąca autonomia w zakresie podejmowania decyzji przez maszyny i systemy, nienadzorowane bezpośrednio przez człowieka. Co najmniej niektóre z tych decyzji będą rodzić konflikty i dylematy moralne. Już dziś warto się zastanowić nad tym, jakie środki są niezbędne, by przyszłe autonomiczne, samouczące i samoreplikujące się obiekty, wyposażone w sztuczną inteligencję i zdolne do samodzielnego działania w dużym zakresie zmienności warunków zewnętrznych, wyposażyć w specyficzny rodzaj inteligencji etycznej. Problem, z którym muszą się zmierzyć zarówno konstruktorzy, jak i użytkownicy tworów obdarzonych sztuczną inteligencją, polega na konieczności optymalnego wyważenia racji, potrzeb i interesów między obiema stronami ludzko-nieludzkiej interakcji. W sytuacji rosnącej autonomii maszyn przestaje bowiem wystarczać etyka antropocentryczna. Potrzebny jest nowy, poszerzony i zmodyfikowany model etyki, który pozwoli przewidzieć i objąć swoim zakresem dotychczas niewystępujący obszar równorzędnych relacji człowieka i maszyny. Niektórym aspektom tego zagadnienia poświęcony jest niniejszy artykuł.
The ever-accelerating progress in the area of smart technologies gives rise to new ethical challenges, which humankind will sooner or later have to face. An inevitable component of this progress is the increase in the autonomy of the decision-making processes carried out by machines and systems functioning without direct human control. At least some of these decisions will generate conflicts and moral dilemmas. It is therefore worthwhile to reflect today upon the measures that need to be taken in order to endow the autonomous, self-learning and self-replicating entities – products equipped with artificial intelligence and capable of independent operation in a wide variety of external conditions and circumstances – with a unique kind of ethical intelligence. At the core of the problem, which both the designers and the users of entities bestowed with artificial intelligence must eventually face, lies the question of how to attain the optimal balance between the goals, needs and interests of both sides of the human-non-human interaction. This is because, in the context of the expanding autonomy of machines, the anthropocentric model of ethics no longer suffices. It is therefore necessary to develop a new, extended and modified model of ethics: a model which would encompass the whole, thus far non-existent, area of equal relations between the human and the machine, and which would allow one to predict its dynamics. The present article addresses some of the aspects of this claim.
Źródło:: ER(R)GO: Teoria – Literatura – Kultura; 2021, 42; 15-34
1508-6305
2544-3186
Pojawia się w:: ER(R)GO: Teoria – Literatura – Kultura
Dostawca treści:: Biblioteka Nauki

Artykuł