Subject: reinforcement learning - OPAC Collection Catalogue


Title: An active exploration method for data efficient reinforcement learning
Authors: Zhao, Dongfang
Liu, Jiafeng
Wu, Rui
Cheng, Dansong
Tang, Xianglong
Links: https://bibliotekanauki.pl/articles/331205.pdf
Publication date: 2019
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: reinforcement learning
information entropy
PILCO
data efficiency
Description: Reinforcement learning (RL) constitutes an effective method of controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is the improvement of data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process to model dynamic systems. However, it only focuses on optimizing cumulative rewards and does not consider the accuracy of the dynamic model, which is an important factor for controller learning. To further improve the data efficiency of PILCO, we propose its active exploration version (AEPILCO), which utilizes information entropy to describe samples. In the policy evaluation stage, we incorporate an information entropy criterion into long-term sample prediction. Through the informative policy evaluation function, our algorithm obtains informative policy parameters in the policy improvement stage. Using the policy parameters in the actual execution produces an informative sample set; this is helpful in learning an accurate dynamic model. Thus, the AEPILCO algorithm improves data efficiency by actively selecting informative samples based on the information entropy criterion and thereby learning an accurate dynamic model. We demonstrate the validity and efficiency of the proposed algorithm for several challenging controller problems involving a cart pole, a pendubot, a double pendulum, and a cart double pendulum. The AEPILCO algorithm can learn a controller using fewer trials compared to PILCO. This is verified through theoretical analysis and experimental results.
Source: International Journal of Applied Mathematics and Computer Science; 2019, 29, 2; 351-362
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article
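
Note on the entropy criterion in the record above: the abstract does not state the exact form of the information entropy measure used by AEPILCO. As a hedged illustration only, assume the Gaussian process prediction for a $D$-dimensional state is Gaussian with mean $\mu$ and covariance $\Sigma$ (these symbols are introduced here and do not come from the record). The standard differential entropy of such a prediction is:

```latex
H\bigl[\mathcal{N}(\mu, \Sigma)\bigr] = \frac{1}{2} \ln\bigl( (2\pi e)^{D} \, \lvert \Sigma \rvert \bigr)
```

Under this assumption, a larger predictive covariance gives a larger entropy, which is one way a candidate sample could be scored as more informative for model learning.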


Title: An automated driving strategy generating method based on WGAIL–DDPG
Authors: Zhang, Mingheng
Wan, Xing
Gang, Longhui
Lv, Xinfei
Wu, Zengwen
Liu, Zhaoyang
Links: https://bibliotekanauki.pl/articles/2055167.pdf
Publication date: 2021
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: automated driving system
deep learning
deep reinforcement learning
imitation learning
deep deterministic policy gradient
Description: Reliability, efficiency and generalization are basic evaluation criteria for an automated vehicle driving system. This paper proposes an automated driving decision-making method based on the Wasserstein generative adversarial imitation learning–deep deterministic policy gradient (WGAIL–DDPG(λ)). Here the exact reward function is designed based on the requirements of a vehicle’s driving performance, i.e., safety, dynamic and ride comfort performance. The model’s training efficiency is improved through the proposed imitation learning strategy, and a gain regulator is designed to smooth the transition from the imitation phase to the reinforcement phase. Test results show that the proposed decision-making model can generate actions quickly and accurately according to the surrounding environment. Meanwhile, the imitation learning strategy based on expert experience and the gain regulator can effectively improve the training efficiency of the reinforcement learning model. Additionally, an extended test also demonstrates its good adaptability to different driving conditions.
Source: International Journal of Applied Mathematics and Computer Science; 2021, 31, 3; 461-470
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article


Title: Epoch-incremental reinforcement learning algorithms
Authors: Zajdel, R.
Links: https://bibliotekanauki.pl/articles/330530.pdf
Publication date: 2013
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: reinforcement learning
epoch incremental algorithm
grid world
Description: In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, on the basis of the environment model, the distances of past-active states to the terminal state are computed. These distances and the terminal state reinforcement signal are used to improve the agent policy.
Source: International Journal of Applied Mathematics and Computer Science; 2013, 23, 3; 623-635
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article
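
For reference, the TD(0) update named in the description above has the standard textbook form shown below. The symbols (state value $V$, step size $\alpha$, discount factor $\gamma$, reward $r_{t+1}$) are conventional notation assumed here; the record itself gives no formulas, and the paper's epoch-mode update is not reproduced.

```latex
V(s_t) \leftarrow V(s_t) + \alpha \bigl[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \bigr]
```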


Title: Epokowo-inkrementacyjny algorytm uczenia się ze wzmocnieniem wykorzystujący kryterium średniego wzmocnienia
The epoch-incremental reinforcement learning algorithm based on the average reward
Authors: Zajdel, R.
Links: https://bibliotekanauki.pl/articles/152882.pdf
Publication date: 2013
Publisher: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects: reinforcement learning
R-learning
epoch-incremental algorithm
average reward reinforcement learning
epoch-incremental reinforcement learning
Description: The article proposes a new epoch-incremental reinforcement learning algorithm. The main idea of the algorithm is to perform, in the epoch mode, additional policy updates based on the distances of past-active states from the terminal state. The proposed algorithm, together with the R(0)-learning, R(λ)-learning, Dyna-R and prioritized sweeping-R algorithms, was applied to control a mountain car model and a model of a ball placed on a balancing beam.
The application of average reward reinforcement learning algorithms to control is described in this paper. Moreover, a new epoch-incremental reinforcement learning algorithm (EIR(0)-learning for short) is proposed. In this algorithm, the basic R(0)-learning algorithm is implemented in the incremental mode and the environment model is created. In the epoch mode, on the basis of the model, the distances of past-active states to the terminal state are determined. These distances are then used to update the policy. The proposed algorithm was applied to mountain car (Fig. 4) and ball-beam (Fig. 5) models. The proposed EIR(0)-learning was empirically compared to R(0)-learning [4, 6], R(λ)-learning and the model-based algorithms Dyna-R and prioritized sweeping-R [11]. For the ball-beam system, the EIR(0)-learning algorithm reached a stable control strategy after the smallest number of trials (Tab. 1, column 2). For the mountain car system, the number of trials was smaller than for the R(0)-learning and R(λ)-learning algorithms, but greater than for Dyna-R and prioritized sweeping-R. It is worth noting that the execution times of the Dyna-R and prioritized sweeping-R algorithms in the incremental mode were, respectively, 5 and 50 times longer than for the proposed EIR(0)-learning algorithm (Tab. 2, column 3). The main conclusion of this work is that the epoch-incremental learning algorithm provided a stable control strategy in a relatively small number of trials and with a short single-iteration time.
Source: Pomiary Automatyka Kontrola; 2013, R. 59, nr 7, 7; 700-703
0032-4140
Appears in: Pomiary Automatyka Kontrola
Content provider: Biblioteka Nauki

Article


Title: Prioritized epoch-incremental Q-learning algorithm
Authors: Zajdel, R.
Links: https://bibliotekanauki.pl/articles/375619.pdf
Publication date: 2012
Publisher: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects: reinforcement learning
Q-learning
grid world
Description: The basic reinforcement learning algorithms, such as Q-learning or Sarsa, are characterized by a short single learning step; however, the number of epochs necessary to achieve the optimal policy is not acceptable. There are many methods that reduce the number of necessary epochs, like TD(λ > 0), Dyna or prioritized sweeping, but their computational time is considerable. This paper proposes a combination of the Q-learning algorithm performed in the incremental mode with an acceleration method executed in the epoch mode. This acceleration is based on the distance to the terminal state. This approach maintains the short time of a single learning step while achieving high efficiency comparable with Dyna or prioritized sweeping. The proposed algorithm is compared with Q(λ)-learning, Dyna-Q and prioritized sweeping in experiments on three grid worlds. The learning time and the number of epochs necessary to reach the terminal state are used to evaluate the efficiency of the compared algorithms.
The efficiency of the basic reinforcement learning algorithms Q-learning and Sarsa, measured by the number of trials needed to obtain the optimal policy, is relatively low. Hence, the possibilities for practical application of these algorithms are limited. Their advantage, however, is low computational complexity, which makes the execution time of a single learning step so short that they work very well in online control systems. The methods used to accelerate reinforcement learning, which allow the absorbing state to be reached after far fewer trials than the basic algorithms, usually increase the computational complexity and lengthen the execution time of a single learning step. The most commonly used acceleration, the temporal difference method TD(λ > 0), involves additional memory elements, namely eligibility traces. The execution time of a single learning step in such an algorithm increases considerably because, unlike in the basic algorithm, where only the action-value function for the active state was updated, here the update is performed for all states. More efficient acceleration methods, such as Dyna or prioritized sweeping, also belong to the class of memory-based algorithms, and their main idea is reinforcement learning based on an adaptive model of the environment. These methods allow the absorbing state to be reached in far fewer trials; however, because of the increased computational complexity, the execution time of a single learning step becomes a significant factor limiting their use in systems with a large number of states.
The essence of these algorithms is to perform a fixed number of updates of the action-value function for states active in the past; in Dyna these states are chosen at random, whereas in prioritized sweeping they are ordered by the magnitude of the update error. This article proposes an epoch-incremental reinforcement learning algorithm whose main idea is to combine the basic incremental Q-learning algorithm with an acceleration algorithm executed in the epoch mode. The proposed epoch learning method relies mainly on the actual value of the reinforcement signal observed on the transition to the absorbing state, which is then propagated backward exponentially, depending on the estimated distance from the absorbing state. This approach yields a short learning time for a single step in the incremental mode (Tab. 2) while retaining the efficiency typical of Dyna or prioritized sweeping (Tab. 1 and Fig. 5).
Source: Theoretical and Applied Informatics; 2012, 24, 2; 159-171
1896-5334
Appears in: Theoretical and Applied Informatics
Content provider: Biblioteka Nauki

Article
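
The two modes described in the record above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's algorithm: the corridor environment, the function names `step`, `backward_distances` and `train`, the parameter values, and the specific epoch rule (raise Q(s, a) to at least the terminal reward discounted by the model-estimated distance) are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict, deque


def step(state, action, size):
    """Hypothetical corridor: states 0..size-1, reward 1 on entering the last state."""
    nxt = min(max(state + action, 0), size - 1)
    done = nxt == size - 1
    return nxt, (1.0 if done else 0.0), done


def backward_distances(model, terminal):
    """Breadth-first search over the learned model, backward from the terminal state."""
    preds = defaultdict(set)
    for (s, _a), s2 in model.items():
        preds[s2].add(s)
    dist = {terminal: 0}
    queue = deque([terminal])
    while queue:
        cur = queue.popleft()
        for p in preds[cur]:
            if p not in dist:
                dist[p] = dist[cur] + 1
                queue.append(p)
    return dist


def train(size=8, episodes=30, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    actions = (-1, 1)
    q = defaultdict(float)  # Q[(state, action)]
    model = {}              # learned model: (state, action) -> next state
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Incremental mode: one ordinary Q-learning step per transition.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                best = max(q[(state, a)] for a in actions)
                action = rng.choice([a for a in actions if q[(state, a)] == best])
            nxt, reward, done = step(state, action, size)
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in actions)
            q[(state, action)] += alpha * (target - q[(state, action)])
            model[(state, action)] = nxt
            state = nxt
        # Epoch mode (assumed rule): propagate the terminal reward backward,
        # decayed exponentially with the estimated distance to the terminal state.
        terminal_state, terminal_reward = state, reward
        dist = backward_distances(model, terminal_state)
        for (s, a), s2 in model.items():
            if s2 in dist:
                boosted = terminal_reward * gamma ** dist[s2]
                q[(s, a)] = max(q[(s, a)], boosted)
    return q


if __name__ == "__main__":
    q = train()
    for s in range(7):
        print(s, "right" if q[(s, 1)] > q[(s, -1)] else "left")
```

The design point this sketch tries to capture is that the per-step cost in the incremental mode stays that of plain Q-learning, while the more expensive distance computation runs only once per episode.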


Title: Uczenie ze wzmocnieniem regulatora Takagi-Sugeno metodą elementów ASE/ACE
Reinforcement learning with use of neuronlike elements ASE/ACE of Takagi-Sugeno controller
Authors: Zajdel, R.
Links: https://bibliotekanauki.pl/articles/156302.pdf
Publication date: 2005
Publisher: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects: fuzzy controller
reinforcement learning
inverted pendulum
Description: The article describes the application of a reinforcement learning algorithm based on ASE/ACE elements to learning the rule consequents of a Takagi-Sugeno fuzzy controller. The correctness of the proposed solutions was verified by simulation in the control of an inverted pendulum-cart system. Comparative experiments with the classical network of ASE/ACE elements were also carried out. The advantages and disadvantages of the classical and fuzzy solutions are shown.
An adaptation of the reinforcement learning algorithm using ASE/ACE elements to learning the rule consequents of the Takagi-Sugeno fuzzy logic controller is proposed. The solution is applied to the control of the cart-pole system and tested by computer simulations. The original neuronlike ASE/ACE elements are simulated as well. Advantages and disadvantages of both approaches (fuzzy and classical) are demonstrated.
Source: Pomiary Automatyka Kontrola; 2005, R. 51, nr 1, 1; 47-49
0032-4140
Appears in: Pomiary Automatyka Kontrola
Content provider: Biblioteka Nauki

Article


Title: Utilization of Deep Reinforcement Learning for Discrete Resource Allocation Problem in Project Management – a Simulation Experiment
Wykorzystanie uczenia ze wzmocnieniem w problemach dyskretnej alokacji zasobów w zarządzaniu projektami – eksperyment symulacyjny
Authors: Wójcik, Filip
Links: https://bibliotekanauki.pl/articles/2179629.pdf
Publication date: 2022
Publisher: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Subjects: reinforcement learning (RL)
operations research
management
optimisation
Description: This paper tests the applicability of deep reinforcement learning (DRL) algorithms to simulated problems of constrained discrete and online resource allocation in project management. DRL is an extensively researched method in various domains, although no similar case study was found when writing this paper. The hypothesis was that a carefully tuned RL agent could outperform an optimisation-based solution. The RL agents VPG, AC, and PPO were compared against a classic constrained optimisation algorithm in three trials: “easy”/“moderate”/“hard” (70/50/30% average project success rate). Each trial consisted of 500 independent, stochastic simulations. The significance of the differences was checked using a Welch ANOVA at a significance level of α = 0.01, followed by post hoc comparisons with false-discovery control. The experiment revealed that the PPO agent performed significantly better in moderate and hard simulations than the optimisation approach and the other RL methods.
The article examines the applicability of deep reinforcement learning (DRL) methods to simulated problems of discrete allocation of constrained resources in project management. DRL is currently a widely studied field; however, at the time of this research no similar case study was found. The research hypothesis assumed that a properly constructed RL agent would be able to achieve better results than the classic optimisation-based approach. The RL agents VPG, AC and PPO were compared with an optimisation algorithm in three simulations: “easy”/“moderate”/“hard” (70/50/30% average chance of project success). Each simulation comprised 500 independent, stochastic experiments. The significance of the differences was compared using a Welch ANOVA test at the significance level α = 0.01, followed by post hoc comparisons with error-rate control. The experiments showed that the PPO agent achieved significantly better results in the hardest simulations than the optimisation method and the other RL algorithms.
Source: Informatyka Ekonomiczna; 2022, 1; 56-74
1507-3858
Appears in: Informatyka Ekonomiczna
Content provider: Biblioteka Nauki

Article
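
As a rough illustration of the Welch ANOVA step mentioned in the record above, the sketch below computes Welch's F statistic and its degrees of freedom from scratch. The function name `welch_anova` and the sample numbers are hypothetical and are not data from the paper; the post hoc comparisons are not shown.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA with unequal variances."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # Each group is weighted by its size divided by its sample variance.
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    w_sum = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / w_sum
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)


if __name__ == "__main__":
    # Made-up success scores for four hypothetical methods.
    scores = [
        [0.52, 0.48, 0.55, 0.50, 0.47],
        [0.49, 0.51, 0.46, 0.53, 0.50],
        [0.61, 0.58, 0.64, 0.60, 0.57],
        [0.50, 0.45, 0.54, 0.49, 0.52],
    ]
    print(welch_anova(scores))
```

The p-value would then come from the F distribution with (df1, df2) degrees of freedom, for instance via `scipy.stats.f.sf`, and be compared against the chosen significance level.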


Title: O doborze reguł sterowania dla regulatora rozmytego
About collecting of control for a fuzzy logic controller
Authors: Wiktorowicz, K.
Zajdel, R.
Links: https://bibliotekanauki.pl/articles/156306.pdf
Publication date: 2005
Publisher: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects: fuzzy control
neural networks
reinforcement learning
stability
quality
Description: The paper characterises the problem of selecting control rules for a fuzzy controller. Methods of acquiring rules by means of a neural network trained with supervised learning and with reinforcement learning are discussed. The issue of examining the stability and quality of the designed system is presented. The problems discussed are illustrated with sample research results.
In the paper, the problem of collecting control rules for a fuzzy logic controller is characterised. Two methods of generating rules using a neural network are described: supervised learning and reinforcement learning. The problem of stability and quality analysis is presented. The considerations are illustrated by examples.
Source: Pomiary Automatyka Kontrola; 2005, R. 51, nr 1, 1; 44-46
0032-4140
Appears in: Pomiary Automatyka Kontrola
Content provider: Biblioteka Nauki

Article


Title: Sumienie maszyny? Sztuczna inteligencja i problem odpowiedzialności moralnej
The Conscience of a Machine? Artificial Intelligence and the Problem of Moral Responsibility
Authors: Wieczorek, Krzysztof Tomasz
Jędrzejko, Paweł
Links: https://bibliotekanauki.pl/articles/1912551.pdf
Publication date: 2021-09-03
Publisher: Wydawnictwo Uniwersytetu Śląskiego
Subjects: artificial intelligence
ethics
reinforcement learning
decision-making autonomy
Description: Accelerating progress in intelligent technologies raises new ethical challenges that humanity will have to face sooner or later. An unavoidable element of this progress is the growing decision-making autonomy of machines and systems not directly supervised by humans. At least some of these decisions will give rise to moral conflicts and dilemmas. It is worth considering today what measures are needed to equip future autonomous, self-learning and self-replicating objects, endowed with artificial intelligence and capable of acting independently across a wide range of external conditions, with a specific kind of ethical intelligence. The problem that both the designers and the users of artificially intelligent creations must face is the need to strike an optimal balance between the arguments, needs and interests of both sides of the human-non-human interaction. As the autonomy of machines grows, anthropocentric ethics is no longer sufficient. A new, extended and modified model of ethics is needed, one that makes it possible to anticipate and cover the hitherto non-existent area of equal relations between human and machine. This article is devoted to some aspects of this issue.
The ever-accelerating progress in the area of smart technologies gives rise to new ethical challenges, which humankind will sooner or later have to face. An inevitable component of this progress is the increase in the autonomy of the decision-making processes carried out by machines and systems functioning without direct human control. At least some of these decisions will generate conflicts and moral dilemmas. It is therefore worthwhile to reflect today upon the measures that need to be taken in order to endow the autonomous, self-learning and self-replicating entities – products equipped with artificial intelligence and capable of independent operation in a wide variety of external conditions and circumstances – with a unique kind of ethical intelligence. At the core of the problem, which both the designers and the users of entities bestowed with artificial intelligence must eventually face, lies the question of how to attain the optimal balance between the goals, needs and interests of both sides of the human-non-human interaction. This is because, in the context of the expanding autonomy of machines, the anthropocentric model of ethics no longer suffices. It is therefore necessary to develop a new, extended and modified model of ethics: a model which would encompass the whole, thus far non-existent, area of equal relations between the human and the machine, and which would allow one to predict its dynamics. The present article addresses some of the aspects of this claim.
Source: ER(R)GO: Teoria – Literatura – Kultura; 2021, 42; 15-34
1508-6305
2544-3186
Appears in: ER(R)GO: Teoria – Literatura – Kultura
Content provider: Biblioteka Nauki

Article


Title: Learning board evaluation function for Othello by hybridizing coevolution with temporal difference learning
Authors: Szubert, M.
Jaśkowski, W.
Krawiec, K.
Links: https://bibliotekanauki.pl/articles/206175.pdf
Publication date: 2011
Publisher: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects: evolutionary computation
coevolutionary algorithms
reinforcement learning
memetic computing
game strategy learning
Description: Hybridization of global and local search techniques has already produced promising results in the fields of optimization and machine learning. It is commonly presumed that approaches employing this idea, like memetic algorithms combining evolutionary algorithms and local search, benefit from the complementarity of the constituent methods and maintain the right balance between exploration and exploitation of the search space. While such extensions of evolutionary algorithms have been intensively studied, hybrids of local search with coevolutionary algorithms have not received much attention. In this paper we attempt to fill this gap by presenting Coevolutionary Temporal Difference Learning (CTDL), which works by interlacing global search provided by competitive coevolution with local search by means of temporal difference learning. We verify CTDL by applying it to the board game of Othello, where it learns board evaluation functions represented by the linear architecture of a weighted piece counter. The results of a computational experiment show the superiority of CTDL over the coevolutionary algorithm and temporal difference learning alone, both in terms of the performance of the elaborated strategies and in terms of computational cost. To further exploit the potential of CTDL, we extend it with an archive that keeps track of selected well-performing solutions found so far and uses them to improve search convergence. The overall conclusion is that the fusion of various forms of coevolution with a gradient-based local search can be highly beneficial and deserves further study.
Source: Control and Cybernetics; 2011, 40, 3; 805-831
0324-8569
Appears in: Control and Cybernetics
Content provider: Biblioteka Nauki

Article


Title: Handling realistic noise in multi-agent systems with self-supervised learning and curiosity
Authors: Szemenyei, Marton
Reizinger, Patrik
Links: https://bibliotekanauki.pl/articles/2147129.pdf
Publication date: 2022
Publisher: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects: deep reinforcement learning
multi-agent environment
autonomous driving
robot soccer
self-supervised learning
Description: Most reinforcement learning benchmarks – especially in multi-agent tasks – do not go beyond observations with simple noise; nonetheless, real scenarios induce more elaborate vision pipeline failures: false sightings, misclassifications or occlusion. In this work, we propose a lightweight, 2D environment for robot soccer and autonomous driving that can emulate the above discrepancies. Besides establishing a benchmark for accessible multi-agent reinforcement learning research, our work addresses the challenges the simulator imposes. For handling realistic noise, we use self-supervised learning to enhance scene reconstruction and extend curiosity-driven learning to model longer horizons. Our extensive experiments show that the proposed methods achieve state-of-the-art performance, compared against actor-critic methods, ICM, and PPO.
Source: Journal of Artificial Intelligence and Soft Computing Research; 2022, 12, 2; 135-148
2083-2567
2449-6499
Appears in: Journal of Artificial Intelligence and Soft Computing Research
Content provider: Biblioteka Nauki

Article


Title: A strategy learning model for autonomous agents based on classification
Authors: Śnieżyński, B.
Links: https://bibliotekanauki.pl/articles/330672.pdf
Publication date: 2015
Publisher: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects: autonomous agents
strategy learning
supervised learning
classification
reinforcement learning
Description: In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied for strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer–pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows tracking the learning process.
Source: International Journal of Applied Mathematics and Computer Science; 2015, 25, 3; 471-482
1641-876X
2083-8492
Appears in: International Journal of Applied Mathematics and Computer Science
Content provider: Biblioteka Nauki

Article


Title: Simplification of deep reinforcement learning in traffic control using the Bonsai Platform
Uproszczenie uczenia się przez głębokie wzmocnienie w zarządzaniu ruchem z wykorzystaniem Platformy Bonsai
Authors: Skuba, Michal
Janota, Aleš
Links: https://bibliotekanauki.pl/articles/2013058.pdf
Publication date: 2020-12-31
Publisher: Uniwersytet Technologiczno-Humanistyczny im. Kazimierza Pułaskiego w Radomiu
Subjects: control
deep reinforcement learning
model
simulation
traffic
Description: The paper deals with the problem of traffic light control at a road intersection. The authors use a model of a real road junction created in the AnyLogic modelling tool. For two scenarios, three simulation experiments are performed: fixed time control, fixed time control after AnyLogic-based optimization, and dynamic control obtained through the cooperation of the AnyLogic tool and the Bonsai platform, utilizing the benefits of deep reinforcement learning. At present, there is a trend to simplify machine learning processes as much as possible to make them accessible to practitioners with no artificial intelligence background and without the need to become data scientists. Project Bonsai represents an easy-to-use connector that allows AnyLogic models to be connected to the Bonsai platform, a novel approach to machine learning without the need to set any hyper-parameters. Due to the unavailability of real operational data, the model uses simulation data only, covering the presence and movement of vehicles only (no pedestrians). The optimization problem consists in minimizing the average time that agents (vehicles) must spend in the model while passing the modelled intersection. Another observed parameter is the maximum time individual vehicles spend in the model. The authors share their practical, mainly methodological, experiences with the simulation process and also indicate the economic cost of training.
The article addresses the problem of controlling traffic lights at road intersections. The authors use a model of a real road junction created in the AnyLogic modelling tool. For two scenarios, three simulation experiments are performed: fixed-time traffic light control, fixed-time traffic light control after AnyLogic-based optimisation, and dynamic control achieved through cooperation between AnyLogic and the Bonsai platform, drawing on the benefits of deep reinforcement learning. There is currently a tendency to simplify machine learning processes as far as possible so that they are accessible to practitioners without experience in artificial intelligence and without the need to become data scientists. Project Bonsai is an easy-to-use connector that allows AnyLogic models connected to the Bonsai platform to be used, a novel approach to machine learning without the need to set hyperparameters. Because real operational data are unavailable, the model uses simulation data only, covering only the presence and movement of vehicles (no pedestrians). The optimisation problem consists in minimising the average time that agents (vehicles) must spend in the model while passing through the modelled intersection. Another observed parameter is the maximum time individual vehicles spend in the model. The authors share their practical, mainly methodological, experience with the simulation process and indicate the economic costs of training.
Source: Journal of Civil Engineering and Transport; 2020, 2, 4; 191-202
2658-1698
2658-2120
Appears in: Journal of Civil Engineering and Transport
Content provider: Biblioteka Nauki

Article


Title: Multi agent deep learning with cooperative communication
Authors: Simões, David
Lau, Nuno
Reis, Luís Paulo
Links: https://bibliotekanauki.pl/articles/1837537.pdf
Publication date: 2020
Publisher: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects: multi-agent systems
deep reinforcement learning
centralized learning
Description: We consider the problem of multiple agents cooperating in a partially-observable environment. Agents must learn to coordinate and share relevant information to solve the tasks successfully. This article describes Asynchronous Advantage Actor-Critic with Communication (A3C2), an end-to-end differentiable approach where agents learn policies and communication protocols simultaneously. A3C2 uses a centralized learning, distributed execution paradigm and supports independent agents, dynamic team sizes, partially-observable environments, and noisy communications. We show that A3C2 outperforms other state-of-the-art proposals in multiple environments.
Source: Journal of Artificial Intelligence and Soft Computing Research; 2020, 10, 3; 189-207
2083-2567
2449-6499
Appears in: Journal of Artificial Intelligence and Soft Computing Research
Content provider: Biblioteka Nauki

Article


Title: Reinforcement Learning in Discrete and Continuous Domains Applied to Ship Trajectory Generation
Authors: Rak, A.
Gierusz, W.
Links: https://bibliotekanauki.pl/articles/259073.pdf
Publication date: 2012
Publisher: Politechnika Gdańska. Wydział Inżynierii Mechanicznej i Okrętownictwa
Subjects: ship motion control
trajectory generation
autonomous navigation
reinforcement learning
least-squares policy iteration
Description: This paper presents the application of reinforcement learning algorithms to the task of autonomous determination of the ship trajectory during the in-harbour and harbour approaching manoeuvres. The authors used the Markov decision process formalism as the background for presenting the algorithms. Two versions of RL algorithms were tested in the simulations: a discrete one (Q-learning) and a continuous one (Least-Squares Policy Iteration, LSPI). The results show that in both cases a ship trajectory can be found. However, the discrete Q-learning algorithm suffered from many limitations (mainly the curse of dimensionality) and is practically not applicable to the examined task. On the other hand, LSPI gave promising results. To be fully operational, the proposed solution should be extended by taking into account ship heading and velocity and by coupling it with an advanced multi-variable controller.
Source: Polish Maritime Research; 2012, S 1; 31-36
1233-2585
Appears in: Polish Maritime Research
Content provider: Biblioteka Nauki

Article


Information

You are searching for the phrase "reinforcement learning" by the criterion: Subject
