Temat: reinforcement learning - Katalog OPAC zbiorów

Skocz do pozycji: 1.

Tytuł:: Deep reinforcement learning overview of the state of the art
Autorzy:: Fenjiro, Y.
Benbrahim, H.
Powiązania:: https://bibliotekanauki.pl/articles/384788.pdf
Data publikacji:: 2018
Wydawca:: Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Tematy:: reinforcement learning
deep learning
convolutional network
recurrent network
deep reinforcement learning
Opis:: Artificial intelligence has made big steps forward with reinforcement learning (RL) in the last century, and with the advent of deep learning (DL) in the 90s, especially, the breakthrough of convolutional networks in computer vision field. The adoption of DL neural networks in RL, in the first decade of the 21 century, led to an end-toend framework allowing a great advance in human-level agents and autonomous systems, called deep reinforcement learning (DRL). In this paper, we will go through the development Timeline of RL and DL technologies, describing the main improvements made in both fields. Then, we will dive into DRL and have an overview of the state-ofthe- art of this new and promising field, by browsing a set of algorithms (Value optimization, Policy optimization and Actor-Critic), then, giving an outline of current challenges and real-world applications, along with the hardware and frameworks used. In the end, we will discuss some potential research directions in the field of deep RL, for which we have great expectations that will lead to a real human level of intelligence.
Źródło:: Journal of Automation Mobile Robotics and Intelligent Systems; 2018, 12, 3; 20-39
1897-8649
2080-2145
Pojawia się w:: Journal of Automation Mobile Robotics and Intelligent Systems
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 2.

Tytuł:: A compact DQN model for mobile agents with collision avoidance
Autorzy:: Kamola, Mariusz
Powiązania:: https://bibliotekanauki.pl/articles/27314243.pdf
Data publikacji:: 2023
Wydawca:: Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Tematy:: Q‐learning
DQN
reinforcement learning
Opis:: This paper presents a complete simulation and reinforce‐ ment learning solution to train mobile agents’ strategy of route tracking and avoiding mutual collisions. The aim was to achieve such functionality with limited resources, w.r.t. model input and model size itself. The designed models prove to keep agents safely on the track. Colli‐ sion avoidance agent’s skills developed in the course of model training are primitive but rational. Small size of the model allows fast training with limited computational resources.
Źródło:: Journal of Automation Mobile Robotics and Intelligent Systems; 2023, 17, 2; 28--35
1897-8649
2080-2145
Pojawia się w:: Journal of Automation Mobile Robotics and Intelligent Systems
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 3.

Tytuł:: Prioritized epoch - incremental Q - learning algorithm
Autorzy:: Zajdel, R.
Powiązania:: https://bibliotekanauki.pl/articles/375619.pdf
Data publikacji:: 2012
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: reinforcement learning
Q-learning
grid world
Opis:: The basic reinforcement learning algorithms, such as Q-learning or Sarsa, are characterized by short time-consuming single learning step, however the number of epochs necessary to achieve the optimal policy is not acceptable. There are many methods that reduce the number of' necessary epochs, like TD(lambda greather than 0), Dyna or prioritized sweeping, but their computational time is considerable. This paper proposes a combination of Q-learning algorithm performed in the incremental mode with the method of acceleration executed in the epoch mode. This acceleration is based on the distance to the terminal state. This approach ensures the maintenance of short time of a single learning step and high efficiency comparable with Dyna or prioritized sweeping. Proposed algorithm is compared with Q(lambda)-learning, Dyna-Q and prioritized sweeping in the experiments of three grid worlds. The time-consuming learning process and number of epochs necessary to reach the terminal state is used to evaluate the efficiency of compared algorithms.
Efektywność podstawowych algorytmów uczenia ze wzmocnieniem Q-learning i Sarsa, mierzona liczbą prób niezbędnych do uzyskania strategii optymalnej jest stosunkowo niewielka. Stąd też możliwości praktycznego zastosowania tego algorytmu są niewielkie. Zaletą tych podstawowych algorytmów jest jednak niewielka złożoność obliczeniowa, sprawiająca, że czas wykonania pojedynczego kroku uczenia jest na tyle mały, że znakomicie sprawdzają się one w systemach sterowania online. Stosowane metody przyśpieszania procesu uczenia ze wzmocnieniem, które pozwalająna uzyskanie stanu absorbującego po znacznie mniejszej liczbie prób, niż algorytmy podstawowe powodują najczęściej zwiększenie złożoności obliczeniowej i wydłużenie czasu wykonania pojedynczego kroku uczenia. Najczęściej stosowane przyśpieszanie metodą różnic czasowych TD(lambda znak większości 0) wiąże się z zastosowaniem dodatkowych elementów pamięciowych, jakimi są ślady aktywności (eligibility traces). Czas wykonania pojedynczego kroku uczenia w takim algorytmie znacznie się wydłuża, gdyż w odróżnieniu od algorytmu podstawowego, gdzie aktualizacji podlegała wyłącznie funkcja wartości akcji tylko dla stanu aktywnego, tutaj aktualizację przeprowadza się dla wszystkich stanów. Bardziej wydajne metody przyśpieszania, takie jak Dyna, czy też prioritized sweeping również należą do klasy algorytmów pamięciowych, a ich główną ideą jest uczenie ze wzmocnieniem w oparciu o adaptacyjny model środowiska. Metody te pozwalają na uzyskanie stanu absorbującego w znacznie mniejszej liczbie prób, jednakże, na skutek zwiększonej złożoności obliczeniowej, czas wykonania pojedynczego kroku uczenia jest już istotnym czynnikiem ograniczającym zastosowanie tych metod w systemach o znacznej liczbie stanów. Istotą tych algorytmów jest dokonywanie ustalonej liczby aktualizacji funkcji wartości akcji stanów aktywnych w przeszłości, przy czym w przypadku algorytmu Dyna są to stany losowo wybrane, natomiast w przypadku prioritized sweeping stany uszeregowane wg wielkości błędu aktualizacji. W niniejszym artykule zaproponowano epokowo-inkrementacyjny algorytm uczenia ze wzmocnieniem, którego główną ideą jest połączenie podstawowego, inkrementacyjnego algorytmu uczenia ze wzmocnieniem Q-lerning z algorytmem przyśpieszania wykonywanym epokowo. Zaproponowana metoda uczenia epokowego w głównej mierze opiera się na rzeczywistej wartości sygnału wzmocnienia obserwowanego przy przejściu do stanu absorbującego, który jest następnie wykładniczo propagowany wstecz w zależności od estymowanej odległości od stanu absorbującego. Dzięki takiemu podej- ściu uzyskano niewielki czas uczenia pojedynczego kroku w trybie inkrementacyjnym (Tab. 2) przy zachowaniu efektywności typowej dla algorytmów Dyna, czy też prioritized sweeping (Tab. 1 i Fig. 5).
Źródło:: Theoretical and Applied Informatics; 2012, 24, 2; 159-171
1896-5334
Pojawia się w:: Theoretical and Applied Informatics
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 4.

Tytuł:: Use of Modified Adaptive Heuristic Critic Algorithm for Novel Scheduling Mechanism in Packet-Switched Networks
Autorzy:: Jednoralski, M.
Kacprzak, T.
Powiązania:: https://bibliotekanauki.pl/articles/92909.pdf
Data publikacji:: 2005
Wydawca:: Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Tematy:: reinforcement learning
telecommunication networks
packet scheduling
Opis:: In this paper a novel scheduling algorithm of packet selection in a switch node for transmission in a network channel, based on Reinforcement Learning and modified Adaptive Heuristic Critic is introduced. A comparison of two well known scheduling algorithms: Earliest Deadline First and Round Robin shows that these algorithms perform well in some cases, but they cannot adapt their behavior to traffic changes. Simulation studies show that novel scheduling algorithm outperforms Round Robin and Earliest Deadline First by adapting to changing of network conditions.
Źródło:: Studia Informatica : systems and information technology; 2005, 2(6); 21-34
1731-2264
Pojawia się w:: Studia Informatica : systems and information technology
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 5.

Tytuł:: Epokowo-inkrementacyjny algorytm uczenia się ze wzmocnieniem wykorzystujący kryterium średniego wzmocnienia
The epoch-incremental reinforcement learning algorithm based on the average reward
Autorzy:: Zajdel, R.
Powiązania:: https://bibliotekanauki.pl/articles/152882.pdf
Data publikacji:: 2013
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: uczenie się ze wzmocnieniem
R-learning
algorytm epokowo-inkrementacyjny
average reward reinforcement learning
epoch-incremental reinforcement learning
Opis:: W artykule zaproponowano nowy, epokowo – inkrementacyjny algorytm uczenia się ze wzmocnieniem. Główną ideą tego algorytmu jest przeprowadzenie w trybie epokowym dodatkowych aktualizacji strategii w oparciu o odległości aktywnych w przeszłości stanów od stanu terminalnego. Zaproponowany algorytm oraz algorytmy R(0)-learning, R(λ)-learning, Dyna-R oraz prioritized sweeping-R zastosowano do sterowania modelem samochodu górskiego oraz modelem kulki umieszczonej na balansującej belce.
The application of the average reward reinforcement learning algorithms in the control were described in this paper. Moreover, new epoch-incremental reinforcement learning algorithm (EIR(0)-learning for short) was proposed. In this algorithm, the basic R(0)-learning algorithm was implemented in the incremental mode and the environment model was created. In the epoch mode, on the basis of the model, the distances of past active states to the terminal state were determined. These distances were then used in the update strategy. The proposed algorithm was applied to mountain car (Fig. 4) and ball-beam (Fig. 5) models. The proposed EIR(0)-learning was empirically compared to R(0)-learning [4, 6], R(λ)-learning and model based algorithms: Dyna-R and prioritized sweeping-R [11]. In the case of ball-beam system, EIR(0)-learning algorithm reached the stable control strategy after the smallest number of trials (Tab. 1, column 2). For the mountain car system, the number of trials was smaller than in the case of R(0)-learning and R(λ)-learning algorithms, but greater than for Dyna-R and prioritized sweeping-R. It is worth to pay attention to the fact that the execution times of Dyna-R and prioritized sweeping-R algorithms in the incremental mode were respectively 5 and 50 times longer than for proposed EIR(0)-learning algorithm (Tab. 2, column 3). The main conclusion of this work is that the epoch – incremental learning algorithm provided the stable control strategy in relatively small number of trials and in short time of single iteration.
Źródło:: Pomiary Automatyka Kontrola; 2013, R. 59, nr 7, 7; 700-703
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 6.

Tytuł:: Adaptive controller design for electric drive with variable parameters by Reinforcement Learning method
Autorzy:: Pajchrowski, T.
Siwek, P.
Wójcik, A.
Powiązania:: https://bibliotekanauki.pl/articles/201068.pdf
Data publikacji:: 2020
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: Reinforcement Learning
adaptive control
electric drive
machine learning
Opis:: The paper presents a method for designing a neural speed controller with use of Reinforcement Learning method. The controlled object is an electric drive with a synchronous motor with permanent magnets, having a complex mechanical structure and changeable parameters. Several research cases of the control system with a neural controller are presented, focusing on the change of object parameters. Also, the influence of the system critic behaviour is researched, where the critic is a function of control error and energy cost. It ensures long term performance stability without the need of switching off the adaptation algorithm. Numerous simulation tests were carried out and confirmed on a real stand.
Źródło:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2020, 68, 5; 1019-1030
0239-7528
Pojawia się w:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 7.

Tytuł:: Multi agent deep learning with cooperative communication
Autorzy:: Simões, David
Lau, Nuno
Reis, Luís Paulo
Powiązania:: https://bibliotekanauki.pl/articles/1837537.pdf
Data publikacji:: 2020
Wydawca:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:: multi-agent systems
deep reinforcement learning
centralized learning
Opis:: We consider the problem of multi agents cooperating in a partially-observable environment. Agents must learn to coordinate and share relevant information to solve the tasks successfully. This article describes Asynchronous Advantage Actor-Critic with Communication (A3C2), an end-to-end differentiable approach where agents learn policies and communication protocols simultaneously. A3C2 uses a centralized learning, distributed execution paradigm, supports independent agents, dynamic team sizes, partiallyobservable environments, and noisy communications. We compare and show that A3C2 outperforms other state-of-the-art proposals in multiple environments.
Źródło:: Journal of Artificial Intelligence and Soft Computing Research; 2020, 10, 3; 189-207
2083-2567
2449-6499
Pojawia się w:: Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 8.

Tytuł:: Discrete uncertainty quantification for offline reinforcement learning
Autorzy:: Pérez Torres, Jose Luis
Corrochano Jiménez, Javier
García, Javier
Majadas, Rubén
Ibañez-Llano, Cristina
Pérez, Sergio
Fernández, Fernando
Powiązania:: https://bibliotekanauki.pl/articles/23944835.pdf
Data publikacji:: 2023
Wydawca:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:: off-line reinforcement learning
uncertainty quantification
machine learning
Opis:: In many Reinforcement Learning (RL) tasks, the classical online interaction of the learning agent with the environment is impractical, either because such interaction is expensive or dangerous. In these cases, previous gathered data can be used, arising what is typically called Offline RL. However, this type of learning faces a large number of challenges, mostly derived from the fact that exploration/exploitation trade-off is overshadowed. In addition, the historical data is usually biased by the way it was obtained, typically, a sub-optimal controller, producing a distributional shift from historical data and the one required to learn the optimal policy. In this paper, we present a novel approach to deal with the uncertainty risen by the absence or sparse presence of some state-action pairs in the learning data. Our approach is based on shaping the reward perceived from the environment to ensure the task is solved. We present the approach and show that combining it with classic online RL methods make them perform as good as state of the art Offline RL algorithms such as CQL and BCQ. Finally, we show that using our method on top of established offline learning algorithms can improve them.
Źródło:: Journal of Artificial Intelligence and Soft Computing Research; 2023, 13, 4; 273--287
2083-2567
2449-6499
Pojawia się w:: Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 9.

Tytuł:: Reinforcement-Based Learning for Process Classification Task
Autorzy:: Bashir, Lubna Zaghlul
Powiązania:: https://bibliotekanauki.pl/articles/1192874.pdf
Data publikacji:: 2016
Wydawca:: Przedsiębiorstwo Wydawnictw Naukowych Darwin / Scientific Publishing House DARWIN
Tematy:: Reinforcement Learning
Reward
Classification
Bucket Brigade Algorithm
Opis:: In this work, we present a reinforcement-based learning algorithm that includes the automatic classification of both sensors and actions. The classification process is prior to any application of reinforcement learning. If categories are not at the adequate abstraction level, the problem could be not learnable. The classification process is usually done by the programmer and is not considered as part of the learning process. However, in complex tasks, environments, or agents, this manual process could become extremely difficult. To solve this inconvenience, we propose to include the classification into the learning process. We apply an algorithm to automatically learn to achieve a task through reinforcement learning that works without needing a previous classification process. The system is called Fish or Ship (FOS) assigned the task of inducing classification rules for classification task described in terms of 6 attributes. The task is to categorize an object that has one or more of the following features: Sail, Solid, Big, Swim, Eye, Fins into one of the following: fish, or ship. First results of the application of this algorithm are shown Reinforcement learning techniques were used to implement classification task with interesting properties such as provides guidance to the system and shortening the number of cycles required to learn.
Źródło:: World Scientific News; 2016, 36; 12-26
2392-2192
Pojawia się w:: World Scientific News
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 10.

Tytuł:: Anapplication of decision rules in reinforcement learning
Autorzy:: Michalski, A.
Powiązania:: https://bibliotekanauki.pl/articles/206534.pdf
Data publikacji:: 2000
Wydawca:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Tematy:: decision rules
Q-learning
reinforcement learning
rough set theory
Opis:: In this paper an application of decision rules to function representation in reinforcement learning is described. Rules are generated incrementally by method based on rough set theory from instances recorded in state-action-Q-value memory. Simulation experiment investigating the performance of the system and results achieved are reported.
Źródło:: Control and Cybernetics; 2000, 29, 4; 989-996
0324-8569
Pojawia się w:: Control and Cybernetics
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 11.

Tytuł:: Millimeter Wave Beamforming Training : A Reinforcement Learning Approach
Autorzy:: Mohamed, Ehab Mahmoud
Powiązania:: https://bibliotekanauki.pl/articles/1844599.pdf
Data publikacji:: 2021
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: millimeter wave
beamforming training
multiarmed bandit
reinforcement learning
Opis:: Beamforming training (BT) is considered as an essential process to accomplish the communications in the millimeter wave (mmWave) band, i.e., 30 ~ 300 GHz. This process aims to find out the best transmit/receive antenna beams to compensate the impairments of the mmWave channel and successfully establish the mmWave link. Typically, the mmWave BT process is highly-time consuming affecting the overall throughput and energy consumption of the mmWave link establishment. In this paper, a machine learning (ML) approach, specifically reinforcement learning (RL), is utilized for enabling the mmWave BT process by modeling it as a multi-armed bandit (MAB) problem with the aim of maximizing the long-term throughput of the constructed mmWave link. Based on this formulation, MAB algorithms such as upper confidence bound (UCB), Thompson sampling (TS), epsilon-greedy (e-greedy), are utilized to address the problem and accomplish the mmWave BT process. Numerical simulations confirm the superior performance of the proposed MAB approach over the existing mmWave BT techniques.
Źródło:: International Journal of Electronics and Telecommunications; 2021, 67, 1; 95-102
2300-1933
Pojawia się w:: International Journal of Electronics and Telecommunications
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 12.

Tytuł:: Learning board evaluation function for Othello by hybridizing coevolution with temporal difference learning
Autorzy:: Szubert, M.
Jaśkowski, W.
Krawiec, K.
Powiązania:: https://bibliotekanauki.pl/articles/206175.pdf
Data publikacji:: 2011
Wydawca:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Tematy:: evolutionary computation
coevolutionary algorithms
reinforcement learning
memetic computing
game strategy learning
Opis:: Hybridization of global and local search techniques has already produced promising results in the fields of optimization and machine learning. It is commonly presumed that approaches employing this idea, like memetic algorithms combining evolutionary algorithms and local search, benefit from complementarity of constituent methods and maintain the right balance between exploration and exploitation of the search space. While such extensions of evolutionary algorithms have been intensively studied, hybrids of local search with coevolutionary algorithms have not received much attention. In this paper we attempt to fill this gap by presenting Coevolutionary Temporal Difference Learning (CTDL) that works by interlacing global search provided by competitive coevolution and local search by means of temporal difference learning. We verify CTDL by applying it to the board game of Othello, where it learns board evaluation functions represented by a linear architecture of weighted piece counter. The results of a computational experiment show CTDL superiority compared to coevolutionary algorithm and temporal difference learning alone, both in terms of performance of elaborated strategies and computational cost. To further exploit CTDL potential, we extend it by an archive that keeps track of selected well-performing solutions found so far and uses them to improve search convergence. The overall conclusion is that the fusion of various forms of coevolution with a gradient-based local search can be highly beneficial and deserves further study.
Źródło:: Control and Cybernetics; 2011, 40, 3; 805-831
0324-8569
Pojawia się w:: Control and Cybernetics
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 13.

Tytuł:: Analiza możliwości wykorzystania algorytmów uczenia maszynowego w środowisku Unity
Analysis of the possibilities for using machine learning algorithms in the Unity environment
Autorzy:: Litwynenko, Karina
Plechawska-Wójcik, Małgorzata
Powiązania:: https://bibliotekanauki.pl/articles/1837823.pdf
Data publikacji:: 2021
Wydawca:: Politechnika Lubelska. Instytut Informatyki
Tematy:: uczenie ze wzmocnieniem
uczenie przez naśladowanie
Unity
reinforcement learning
imitation learning
Opis:: Algorytmy uczenia ze wzmocnieniem zyskują coraz większą popularność, a ich rozwój jest możliwy dzięki istnieniu narzędzi umożliwiających ich badanie. Niniejszy artykuł dotyczy możliwości zastosowania algorytmów uczenia maszynowego na platformie Unity wykorzystującej bibliotekę Unity ML-Agents Toolkit. Celem badania było porównanie dwóch algorytmów: Proximal Policy Optimization oraz Soft Actor-Critic. Zweryfikowano również możliwość poprawy wyników uczenia poprzez łączenie tych algorytmów z metodą uczenia przez naśladowanie Generative Adversarial Imitation Learning. Wyniki badania wykazały, że algorytm PPO może sprawdzić się lepiej w nieskomplikowanych środowiskach o nienatychmiastowym charakterze nagród, zaś dodatkowe zastosowanie GAIL może wpłynąć na poprawę skuteczności uczenia.
Reinforcement learning algorithms are gaining popularity, and their advancement is made possible by the presence of tools to evaluate them. This paper concerns the applicability of machine learning algorithms on the Unity platform using the Unity ML-Agents Toolkit library. The purpose of the study was to compare two algorithms: Proximal Policy Optimization and Soft Actor-Critic. The possibility of improving the learning results by combining these algorithms with Generative Adversarial Imitation Learning was also verified. The results of the study showed that the PPO algorithm can perform better in uncomplicated environments with non-immediate rewards, while the additional use of GAIL can improve learning performance.
Źródło:: Journal of Computer Sciences Institute; 2021, 20; 197-204
2544-0764
Pojawia się w:: Journal of Computer Sciences Institute
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 14.

Tytuł:: Self-improving Q-learning based controller for a class of dynamical processes
Autorzy:: Musial, Jakub
Stebel, Krzysztof
Czeczot, Jacek
Powiązania:: https://bibliotekanauki.pl/articles/1845515.pdf
Data publikacji:: 2021
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: process control
Q-learning algorithm
reinforcement learning
intelligent control
on-line learning
Opis:: This paper presents how Q-learning algorithm can be applied as a general-purpose self-improving controller for use in industrial automation as a substitute for conventional PI controller implemented without proper tuning. Traditional Q-learning approach is redefined to better fit the applications in practical control loops, including new definition of the goal state by the closed loop reference trajectory and discretization of state space and accessible actions (manipulating variables). Properties of Q-learning algorithm are investigated in terms of practical applicability with a special emphasis on initializing of Q-matrix based only on preliminary PI tunings to ensure bumpless switching between existing controller and replacing Q-learning algorithm. A general approach for design of Q-matrix and learning policy is suggested and the concept is systematically validated by simulation in the application to control two examples of processes exhibiting first order dynamics and oscillatory second order dynamics. Results show that online learning using interaction with controlled process is possible and it ensures significant improvement in control performance compared to arbitrarily tuned PI controller.
Źródło:: Archives of Control Sciences; 2021, 31, 3; 527-551
1230-2384
Pojawia się w:: Archives of Control Sciences
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Skocz do pozycji: 15.

Tytuł:: Self-improving Q-learning based controller for a class of dynamical processes
Autorzy:: Musial, Jakub
Stebel, Krzysztof
Czeczot, Jacek
Powiązania:: https://bibliotekanauki.pl/articles/1845530.pdf
Data publikacji:: 2021
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: process control
Q-learning algorithm
reinforcement learning
intelligent control
on-line learning
Opis:: This paper presents how Q-learning algorithm can be applied as a general-purpose selfimproving controller for use in industrial automation as a substitute for conventional PI controller implemented without proper tuning. Traditional Q-learning approach is redefined to better fit the applications in practical control loops, including new definition of the goal state by the closed loop reference trajectory and discretization of state space and accessible actions (manipulating variables). Properties of Q-learning algorithm are investigated in terms of practical applicability with a special emphasis on initializing of Q-matrix based only on preliminary PI tunings to ensure bumpless switching between existing controller and replacing Q-learning algorithm. A general approach for design of Q-matrix and learning policy is suggested and the concept is systematically validated by simulation in the application to control two examples of processes exhibiting first order dynamics and oscillatory second order dynamics. Results show that online learning using interaction with controlled process is possible and it ensures significant improvement in control performance compared to arbitrarily tuned PI controller.
Źródło:: Archives of Control Sciences; 2021, 31, 3; 527-551
1230-2384
Pojawia się w:: Archives of Control Sciences
Dostawca treści:: Biblioteka Nauki

Artykuł

Zmień widok

na półce

Informacja

Wyszukujesz frazę "reinforcement learning" wg kryterium: Temat

Źródło danych

Dostawca treści

Kolekcja

Rok wydania

Wydawca

Temat

Autor

Typ dokumentu

Język