All fields: reinforcement learning - OPAC collection catalog

Item 1.

Title:: Reinforcement Learning in Ship Handling
Authors:: Łącki, M.
Links:: https://bibliotekanauki.pl/articles/117361.pdf
Publication date:: 2008
Publisher:: Uniwersytet Morski w Gdyni. Wydział Nawigacyjny
Subjects:: Ship Handling
Reinforcement Learning
Machine Learning Techniques
Manoeuvring
Restricted Waters
Markov Decision Process (MDP)
Artificial Neural Network (ANN)
multi-agent environment
Description:: This paper presents the idea of using machine learning techniques to simulate and demonstrate learning behaviour in ship manoeuvring. A simulated ship model is treated as an agent that, through environmental sensing, learns by itself to navigate through restricted waters, selecting an optimal trajectory. In the learning phase, the agent observes the current state and chooses one of the available actions. The agent receives a positive reward for reaching the destination and a negative reward for hitting an obstacle. A few reinforcement learning algorithms are considered. Experimental results from a simulation program are presented for different layouts of possible routes within a restricted area.
Source:: TransNav : International Journal on Marine Navigation and Safety of Sea Transportation; 2008, 2, 2; 157-160
2083-6473
2083-6481
Appears in:: TransNav : International Journal on Marine Navigation and Safety of Sea Transportation
Content provider:: Biblioteka Nauki

Article
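The learning loop summarised in this abstract (observe the state, choose an action, receive a positive reward at the destination and a negative one on collision) can be illustrated with a minimal tabular Q-learning sketch. This is not the author's implementation: Q-learning is only one possible choice among the "few reinforcement learning algorithms" the abstract mentions, and the grid layout `GRID`, the rewards of +1 and -1, and all hyperparameters are hypothetical assumptions.

```python
import random

# Hypothetical "restricted waters" layout (an assumption, not from the paper):
# S = start, G = destination, # = obstacle, . = navigable water.
GRID = [
    "S..#.",
    ".#...",
    "...#.",
    "#...G",
]
ACTIONS = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}


def find(symbol):
    """Return the (row, col) of the first cell containing `symbol`."""
    for r, row in enumerate(GRID):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(symbol)


START, GOAL = find("S"), find("G")


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == "#":
        return state, -1.0, True  # leaving the area or hitting an obstacle: negative reward
    if (r, c) == GOAL:
        return (r, c), 1.0, True  # reaching the destination: positive reward
    return (r, c), 0.0, False


def q_learning(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return; missing entries count as 0.0

    def value(s, a):
        return q.get((s, a), 0.0)

    for _ in range(episodes):
        s, done, steps = START, False, 0
        while not done and steps < 100:
            if rng.random() < epsilon:  # explore
                a = rng.choice(list(ACTIONS))
            else:  # exploit, breaking ties at random
                a = max(ACTIONS, key=lambda b: (value(s, b), rng.random()))
            s2, reward, done = step(s, a)
            target = reward if done else reward + gamma * max(value(s2, b) for b in ACTIONS)
            q[(s, a)] = value(s, a) + alpha * (target - value(s, a))
            s, steps = s2, steps + 1
    return q


def greedy_route(q, max_steps=50):
    """Follow the learned greedy policy from START; return (cells visited, final reward)."""
    s, route = START, [START]
    for _ in range(max_steps):
        a = max(ACTIONS, key=lambda b: q.get((s, b), 0.0))
        s, reward, done = step(s, a)
        route.append(s)
        if done:
            return route, reward
    return route, 0.0


q = q_learning()
route, reward = greedy_route(q)
print(route, reward)
```

A tabular representation like this only suits small, discretised layouts; larger areas typically call for function approximation, such as the neural networks listed among this record's subjects.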

Item 2.

Title:: Epoch-incremental reinforcement learning algorithms
Authors:: Zajdel, R.
Links:: https://bibliotekanauki.pl/articles/330530.pdf
Publication date:: 2013
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: reinforcement learning
epoch incremental algorithm
grid world
incremental algorithm
Description:: This article proposes a new class of epoch-incremental reinforcement learning algorithms. In the incremental mode, the basic TD(0) or TD(λ) algorithm is run and a model of the environment is built. In the epoch mode, the distances from previously active states to the terminal state are computed on the basis of the environment model. These distances and the terminal-state reinforcement signal are used to improve the agent's policy.
Source:: International Journal of Applied Mathematics and Computer Science; 2013, 23, 3; 623-635
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Item 3.

Title:: An application of decision rules in reinforcement learning
Authors:: Michalski, A.
Links:: https://bibliotekanauki.pl/articles/206534.pdf
Publication date:: 2000
Publisher:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:: decision rules
Q-learning
reinforcement learning
rough set theory
Description:: This paper describes an application of decision rules to function representation in reinforcement learning. The rules are generated incrementally, by a method based on rough set theory, from instances recorded in a state-action-Q-value memory. A simulation experiment investigating the performance of the system is reported, together with the results achieved.
Source:: Control and Cybernetics; 2000, 29, 4; 989-996
0324-8569
Appears in:: Control and Cybernetics
Content provider:: Biblioteka Nauki

Article

Item 4.

Title:: Discrete uncertainty quantification for offline reinforcement learning
Authors:: Pérez Torres, Jose Luis
Corrochano Jiménez, Javier
García, Javier
Majadas, Rubén
Ibañez-Llano, Cristina
Pérez, Sergio
Fernández, Fernando
Links:: https://bibliotekanauki.pl/articles/23944835.pdf
Publication date:: 2023
Publisher:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:: off-line reinforcement learning
uncertainty quantification
machine learning
Description:: In many Reinforcement Learning (RL) tasks, the classical online interaction of the learning agent with the environment is impractical, because such interaction is either expensive or dangerous. In these cases, previously gathered data can be used instead, giving rise to what is typically called Offline RL. However, this type of learning faces a large number of challenges, mostly stemming from the fact that the exploration/exploitation trade-off is overshadowed. In addition, the historical data is usually biased by the way it was obtained (typically by a sub-optimal controller), producing a distributional shift between the historical data and the data required to learn the optimal policy. In this paper, we present a novel approach to deal with the uncertainty arising from the absence or sparse presence of some state-action pairs in the learning data. Our approach is based on shaping the reward perceived from the environment to ensure the task is solved. We present the approach and show that combining it with classic online RL methods makes them perform as well as state-of-the-art Offline RL algorithms such as CQL and BCQ. Finally, we show that using our method on top of established offline learning algorithms can improve them.
Source:: Journal of Artificial Intelligence and Soft Computing Research; 2023, 13, 4; 273-287
2083-2567
2449-6499
Appears in:: Journal of Artificial Intelligence and Soft Computing Research
Content provider:: Biblioteka Nauki

Article
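The abstract does not state the authors' shaping rule, so the sketch below only illustrates the general idea it describes: lowering the reward of state-action pairs that are rare in the offline data. The penalty `beta / sqrt(N(s, a))`, the function name `shape_offline_rewards`, and the `(state, action, reward, next_state)` transition format are hypothetical assumptions, not the paper's method.

```python
import math
from collections import Counter


def shape_offline_rewards(dataset, beta=1.0):
    """Return a copy of `dataset` with rewards lowered for rarely observed (state, action) pairs.

    `dataset` is a list of (state, action, reward, next_state) tuples. The count-based
    penalty beta / sqrt(N(s, a)) is an illustrative assumption: well-covered pairs are
    penalised little, while pairs seen only once are penalised by the full `beta`.
    """
    counts = Counter((s, a) for s, a, _r, _s2 in dataset)
    return [(s, a, r - beta / math.sqrt(counts[(s, a)]), s2) for s, a, r, s2 in dataset]


# Hypothetical logged data: action "a0" was taken 4 times in s0, action "a1" only once.
dataset = [("s0", "a0", 1.0, "s1")] * 4 + [("s0", "a1", 1.0, "s2")]
shaped = shape_offline_rewards(dataset)
print(shaped[0][2], shaped[-1][2])  # 0.5 0.0
```

Well-covered pairs keep most of their reward, so a standard online learner replaying the shaped transitions is steered away from actions the logging controller rarely tried.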

Item 5.

Title:: Millimeter Wave Beamforming Training : A Reinforcement Learning Approach
Authors:: Mohamed, Ehab Mahmoud
Links:: https://bibliotekanauki.pl/articles/1844599.pdf
Publication date:: 2021
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: millimeter wave
beamforming training
multiarmed bandit
reinforcement learning
Description:: Beamforming training (BT) is considered an essential process for accomplishing communication in the millimeter wave (mmWave) band, i.e., 30-300 GHz. This process aims to find the best transmit/receive antenna beams to compensate for the impairments of the mmWave channel and successfully establish the mmWave link. Typically, the mmWave BT process is highly time-consuming, which affects the overall throughput and energy consumption of mmWave link establishment. In this paper, a machine learning (ML) approach, specifically reinforcement learning (RL), is used to enable the mmWave BT process by modeling it as a multi-armed bandit (MAB) problem with the aim of maximizing the long-term throughput of the constructed mmWave link. Based on this formulation, MAB algorithms such as upper confidence bound (UCB), Thompson sampling (TS), and epsilon-greedy (ε-greedy) are used to address the problem and accomplish the mmWave BT process. Numerical simulations confirm the superior performance of the proposed MAB approach over existing mmWave BT techniques.
Source:: International Journal of Electronics and Telecommunications; 2021, 67, 1; 95-102
2300-1933
Appears in:: International Journal of Electronics and Telecommunications
Content provider:: Biblioteka Nauki

Article
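Of the bandit algorithms this abstract lists, UCB is the simplest to sketch. Below, each candidate beam is one arm and each trial returns a Bernoulli "link established" reward; the beam success probabilities, the UCB1 exploration bonus `sqrt(2 ln t / n_i)`, and the function name `ucb1_beam_training` are illustrative assumptions rather than details from the paper, which targets long-term throughput.

```python
import math
import random


def ucb1_beam_training(success_prob, rounds=5000, seed=1):
    """UCB1 over candidate beams; returns (pull counts, empirical mean rewards)."""
    rng = random.Random(seed)
    k = len(success_prob)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            beam = t - 1  # try every beam once before using confidence bounds
        else:
            beam = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated channel (assumption): reward 1 if the link succeeds on this beam.
        reward = 1.0 if rng.random() < success_prob[beam] else 0.0
        counts[beam] += 1
        means[beam] += (reward - means[beam]) / counts[beam]  # incremental mean
    return counts, means


counts, means = ucb1_beam_training([0.2, 0.5, 0.8, 0.4])
print(counts)
```

Replacing the confidence bonus with a sample from a Beta posterior would turn this into Thompson sampling, and a fixed exploration probability would give ε-greedy, the other two strategies listed in the abstract.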

Item 6.

Title:: An active exploration method for data efficient reinforcement learning
Authors:: Zhao, Dongfang
Liu, Jiafeng
Wu, Rui
Cheng, Dansong
Tang, Xianglong
Links:: https://bibliotekanauki.pl/articles/331205.pdf
Publication date:: 2019
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: reinforcement learning
information entropy
PILCO
data efficiency
Description:: Reinforcement learning (RL) constitutes an effective method of controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is the improvement of data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process to model dynamic systems. However, it only focuses on optimizing cumulative rewards and does not consider the accuracy of the dynamic model, which is an important factor for controller learning. To further improve the data efficiency of PILCO, we propose its active exploration version (AEPILCO), which uses information entropy to describe samples. In the policy evaluation stage, we incorporate an information entropy criterion into long-term sample prediction. Through the informative policy evaluation function, our algorithm obtains informative policy parameters in the policy improvement stage. Using these policy parameters in the actual execution produces an informative sample set, which is helpful in learning an accurate dynamic model. Thus, the AEPILCO algorithm improves data efficiency by learning an accurate dynamic model through actively selecting informative samples based on the information entropy criterion. We demonstrate the validity and efficiency of the proposed algorithm on several challenging control problems involving a cart pole, a pendubot, a double pendulum, and a cart double pendulum. The AEPILCO algorithm can learn a controller using fewer trials than PILCO. This is verified through theoretical analysis and experimental results.
Source:: International Journal of Applied Mathematics and Computer Science; 2019, 29, 2; 351-362
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Item 7.

Title:: Deep reinforcement learning overview of the state of the art
Authors:: Fenjiro, Y.
Benbrahim, H.
Links:: https://bibliotekanauki.pl/articles/384788.pdf
Publication date:: 2018
Publisher:: Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Subjects:: reinforcement learning
deep learning
convolutional network
recurrent network
deep reinforcement learning
Description:: Artificial intelligence has made big steps forward with reinforcement learning (RL) in the last century, and with the advent of deep learning (DL) in the 1990s, especially the breakthrough of convolutional networks in the computer vision field. The adoption of DL neural networks in RL, in the first decade of the 21st century, led to an end-to-end framework, called deep reinforcement learning (DRL), allowing great advances in human-level agents and autonomous systems. In this paper, we go through the development timeline of RL and DL technologies, describing the main improvements made in both fields. Then, we dive into DRL and give an overview of the state of the art of this new and promising field by browsing a set of algorithms (value optimization, policy optimization, and actor-critic), followed by an outline of current challenges and real-world applications, along with the hardware and frameworks used. Finally, we discuss some potential research directions in the field of deep RL, which we expect will lead to real human-level intelligence.
Source:: Journal of Automation Mobile Robotics and Intelligent Systems; 2018, 12, 3; 20-39
1897-8649
2080-2145
Appears in:: Journal of Automation Mobile Robotics and Intelligent Systems
Content provider:: Biblioteka Nauki

Article

Item 8.

Title:: Selective maintenance optimization with stochastic break duration based on reinforcement learning
Authors:: Liu, Yilai
Qian, Xinbo
Links:: https://bibliotekanauki.pl/articles/2200939.pdf
Publication date:: 2022
Publisher:: Polska Akademia Nauk. Polskie Naukowo-Techniczne Towarzystwo Eksploatacyjne PAN
Subjects:: selective maintenance
stochastic break duration
imperfect maintenance
reinforcement learning
Description:: In industrial and military applications, a sequence of missions is performed with a limited break between two adjacent missions. To improve system reliability, selective maintenance may be performed on components during the break. Most studies on selective maintenance use minimal repair and replacement as maintenance actions and assume that the break duration is deterministic. However, in practical engineering, many maintenance actions are imperfect, and the break duration is stochastic due to environmental and other factors. Therefore, a selective maintenance optimization model with imperfect maintenance and stochastic break duration is proposed. The model aims to maximize the reliability of the system in successfully completing the next mission. Reinforcement learning (RL) is applied to optimally select maintenance actions for the selected components. The proposed model and the advantages of RL are verified by three case studies.
Source:: Eksploatacja i Niezawodność; 2022, 24, 4; 771-784
1507-2711
Appears in:: Eksploatacja i Niezawodność
Content provider:: Biblioteka Nauki

Article

Item 9.

Title:: Online learning algorithm for zero-sum games with integral reinforcement learning
Authors:: Vamvoudakis, K. G.
Vrabie, D.
Lewis, F. L.
Links:: https://bibliotekanauki.pl/articles/91780.pdf
Publication date:: 2011
Publisher:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Subjects:: learning
online algorithm
zero-sum game
game
infinite horizon
Hamilton-Jacobi-Isaacs equation
approximation network
optimal value function
adaptive control tuning algorithm
Nash solution
Description:: In this paper we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time zero-sum game solution for nonlinear systems with infinite-horizon costs and partial knowledge of the system dynamics. This algorithm is a data-based approach to the solution of the Hamilton-Jacobi-Isaacs equation, and it does not require explicit knowledge of the system’s drift dynamics. A novel adaptive control algorithm is given that is based on policy iteration and implemented using an actor/disturbance/critic structure with three adaptive approximator structures. All three approximation networks are adapted simultaneously. A persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel adaptive control tuning algorithms are given for the critic, disturbance, and actor networks. Convergence to the Nash solution of the game is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result.
Source:: Journal of Artificial Intelligence and Soft Computing Research; 2011, 1, 4; 315-332
2083-2567
2449-6499
Appears in:: Journal of Artificial Intelligence and Soft Computing Research
Content provider:: Biblioteka Nauki

Article

Item 10.

Title:: Reinforcement Learning in Discrete and Continuous Domains Applied to Ship Trajectory Generation
Authors:: Rak, A.
Gierusz, W.
Links:: https://bibliotekanauki.pl/articles/259073.pdf
Publication date:: 2012
Publisher:: Politechnika Gdańska. Wydział Inżynierii Mechanicznej i Okrętownictwa
Subjects:: ship motion control
trajectory generation
autonomous navigation
reinforcement learning
least-squares policy iteration
Description:: This paper presents the application of reinforcement learning algorithms to the task of autonomously determining the ship trajectory during in-harbour and harbour-approach manoeuvres. The authors used the Markov decision process formalism as the background for presenting the algorithms. Two versions of RL algorithms were tested in the simulations: a discrete form (Q-learning) and a continuous form (Least-Squares Policy Iteration, LSPI). The results show that in both cases the ship trajectory can be found. However, the discrete Q-learning algorithm suffered from many limitations (mainly the curse of dimensionality) and is practically not applicable to the examined task. On the other hand, LSPI gave promising results. To be fully operational, the proposed solution should be extended to take into account the ship's heading and velocity, and coupled with an advanced multivariable controller.
Source:: Polish Maritime Research; 2012, S 1; 31-36
1233-2585
Appears in:: Polish Maritime Research
Content provider:: Biblioteka Nauki

Article

Item 11.

Title:: Adaptive controller design for electric drive with variable parameters by Reinforcement Learning method
Authors:: Pajchrowski, T.
Siwek, P.
Wójcik, A.
Links:: https://bibliotekanauki.pl/articles/201068.pdf
Publication date:: 2020
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: Reinforcement Learning
adaptive control
electric drive
machine learning
Description:: The paper presents a method for designing a neural speed controller using the Reinforcement Learning method. The controlled object is an electric drive with a permanent magnet synchronous motor, having a complex mechanical structure and changeable parameters. Several research cases of the control system with a neural controller are presented, focusing on changes in the object parameters. The influence of the critic's behaviour on the system is also investigated, where the critic is a function of the control error and the energy cost. This ensures long-term performance stability without the need to switch off the adaptation algorithm. Numerous simulation tests were carried out, and the results were confirmed on a real test stand.
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2020, 68, 5; 1019-1030
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article

Item 12.

Title:: Optimal control of dynamic systems using a new adjoining cell mapping method with reinforcement learning
Authors:: Arribas-Navarro, T.
Prieto, S.
Plaza, M.
Links:: https://bibliotekanauki.pl/articles/205725.pdf
Publication date:: 2015
Publisher:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Subjects:: optimal control
cells mapping
state space
reinforcement learning
stability
nonlinear control
controllability
Description:: This work aims to improve and simplify the procedure used in the Control Adjoining Cell Mapping with Reinforcement Learning (CACM-RL) technique for tuning an optimal controller during the pre-learning stage (controller design), making the transition from a simulation environment to the real world easier. Common problems encountered when working with CACM-RL are the adjustment of the cell size and the long-term evolution error. The main goal of the new approach for CACM-RL proposed in this work (CACM-RL*) is therefore to address both problems, helping engineers define the control solution with accuracy and stability criteria instead of cell sizes. The new approach improves the mathematical analysis techniques and reduces the engineering effort during the design phase. To demonstrate the behaviour of CACM-RL*, three examples are described that show its application to real problems. In all the examples, CACM-RL* improves on the considered alternatives. In some cases, CACM-RL* improves the average controllability by up to 100%.
Source:: Control and Cybernetics; 2015, 44, 3; 369-387
0324-8569
Appears in:: Control and Cybernetics
Content provider:: Biblioteka Nauki

Article

Item 13.

Title:: Three-dimensional path-following control of an autonomous underwater vehicle based on deep reinforcement learning
Authors:: Liang, Zhenyu
Qu, Xingru
Zhang, Zhao
Chen, Cong
Links:: https://bibliotekanauki.pl/articles/32898215.pdf
Publication date:: 2022
Publisher:: Politechnika Gdańska. Wydział Inżynierii Mechanicznej i Okrętownictwa
Subjects:: autonomous underwater vehicle (AUV)
three-dimensional path following
deep reinforcement learning-based control
line-of-sight guidance
controller chattering
Description:: In this article, a deep-reinforcement-learning-based three-dimensional path-following control approach is proposed for an underactuated autonomous underwater vehicle (AUV). Specifically, the kinematic control laws use three-dimensional line-of-sight guidance, and the dynamic control laws use the twin delayed deep deterministic policy gradient algorithm (TD3), providing surge velocity, pitch angle, and heading angle control of the underactuated AUV. To suppress controller chattering, an action filter and a punishment function are constructed, which stabilize the control signals. Simulations are carried out to evaluate the performance of the proposed control approach, and the results show that the AUV can complete the control mission successfully.
Source:: Polish Maritime Research; 2022, 4; 36-44
1233-2585
Appears in:: Polish Maritime Research
Content provider:: Biblioteka Nauki

Article

Item 14.

Title:: Combination of Advanced Reservation and Resource Periodic Arrangement for RMSA in EON with Deep Reinforcement Learning
Authors:: Silaban, R. J.
Alaydrus, Mudrik
Umaisaroh, U.
Links:: https://bibliotekanauki.pl/articles/27311944.pdf
Publication date:: 2023
Publisher:: Polska Akademia Nauk. Czasopisma i Monografie PAN
Subjects:: Elastic Optical Networks (EON)
Routing Modulation and Spectrum Assignment (RMSA)
Advanced Reservation (AR)
Resource Periodic Arrangement (RPA)
Description:: Elastic Optical Networks (EON) provide a solution to the massive demand for connections and extremely high data traffic, with Routing Modulation and Spectrum Assignment (RMSA) posing a challenge. In previous RMSA research, the blocking probability was high because the route selected by the K-SP method in a deep neural network approach used the First Fit policy, and the modulation problem was solved with Modulation Format Identification (MFI) or BPSK using Deep Reinforcement Learning. The issue may become apparent in spectrum assignment because of the influence of Advanced Reservation (AR) and Resource Periodic Arrangement (RPA), which is a decision block on a connection request path with both idle and active data traffic. The study is limited to the modulations m = 1 and m = 4, followed by the placement of frequencies, namely 13, with a combination of standard block frequencies 41224–24412, so that the simulation results are below 0.0199 owing to the combination of block frequency slices with spectrum allocation rule techniques.
Source:: International Journal of Electronics and Telecommunications; 2023, 69, 3; 515-522
2300-1933
Appears in:: International Journal of Electronics and Telecommunications
Content provider:: Biblioteka Nauki

Article

Item 15.

Title:: Some improvements in the reinforcement learning of a mobile robot
Reinforcement learning of mobile robots - proposed improvements
Authors:: Pluciński, M.
Links:: https://bibliotekanauki.pl/articles/153411.pdf
Publication date:: 2010
Publisher:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:: reinforcement learning
RBF neural networks
mobile robots
probabilistic RBF neural network
Description:: The paper presents an application of reinforcement learning to learning the motion of an autonomous mobile robot in an unknown, stationary environment. The robot's movement policy was represented by a probabilistic RBF neural network. Because the learning process was very slow, or even impossible, in complicated environments, several improvements are presented, which were found to be very effective in most cases.
The article presents the use of reinforcement learning to find a motion strategy for an autonomous mobile robot in an unknown, stationary environment. The robot's task is to reach a given target point, known to it, by the shortest possible path and without colliding with obstacles. The robot's state is its position in a fixed (environment-bound) coordinate system, while the action is the commanded direction of motion. The robot's strategy is defined indirectly by a value function represented by an RBF artificial neural network. Networks of this type are easy to train, and their parameters additionally allow a convenient interpretation of the mapping they implement. Because training the robot is generally very difficult, and practically impossible in complicated environments, the article presents several proposals for improving it. Experiments are described that use negative reinforcements generated by obstacles, heuristic ways of hinting the correct behaviour to the robot in "difficult" situations, and gradual learning. The study showed that the combination of the last two techniques gave the best learning results.
Source:: Pomiary Automatyka Kontrola; 2010, vol. 56, no. 12, 12; 1470-1473
0032-4140
Appears in:: Pomiary Automatyka Kontrola
Content provider:: Biblioteka Nauki

Article

Item 16.

Title:: Reinforcement-Based Learning for Process Classification Task
Authors:: Bashir, Lubna Zaghlul
Links:: https://bibliotekanauki.pl/articles/1192874.pdf
Publication date:: 2016
Publisher:: Przedsiębiorstwo Wydawnictw Naukowych Darwin / Scientific Publishing House DARWIN
Subjects:: Reinforcement Learning
Reward
Classification
Bucket Brigade Algorithm
Description:: In this work, we present a reinforcement-based learning algorithm that includes the automatic classification of both sensors and actions. The classification process precedes any application of reinforcement learning. If the categories are not at an adequate level of abstraction, the problem may not be learnable. The classification process is usually done by the programmer and is not considered part of the learning process. However, in complex tasks, environments, or agents, this manual process could become extremely difficult. To overcome this, we propose including the classification in the learning process. We apply an algorithm that automatically learns to achieve a task through reinforcement learning without requiring a prior classification process. The system, called Fish or Ship (FOS), is assigned the task of inducing classification rules for a classification task described in terms of 6 attributes. The task is to assign an object that has one or more of the following features: Sail, Solid, Big, Swim, Eye, Fins, to one of two classes: fish or ship. First results of applying this algorithm are shown. Reinforcement learning techniques were used to implement the classification task, with interesting properties such as providing guidance to the system and reducing the number of cycles required to learn.
Source:: World Scientific News; 2016, 36; 12-26
2392-2192
Appears in:: World Scientific News
Content provider:: Biblioteka Nauki

Article

Item 17.

Title:: Modeling of passengers’ choice using intelligent agents with reinforcement learning in shared interests systems; A basic approach
Authors:: Vikharev, Sergey
Lyapustin, Maxim
Mironov, Danil
Nizovtseva, Irina
Sinitsyn, Vladimir
Links:: https://bibliotekanauki.pl/articles/961509.pdf
Publication date:: 2019
Publisher:: Politechnika Śląska. Wydawnictwo Politechniki Śląskiej
Subjects:: intelligent agents
transport choosing model
passenger satisfaction index
transport quality
Description:: The purpose of this paper is to build a model for assessing passengers’ satisfaction with the service provided by the public transport system. The system is constructed using intelligent agents whose actions are based on self-learning principles. The agents are passengers who depend on transport and can choose between two modes, a car or a bus; their choice of transport mode for the next day is based on their own level of satisfaction and their neighbors’ satisfaction with the mode they used the day before. The paper considers several algorithms of agent behavior, one of which is based on reinforcement learning. Overall, the algorithms take into account the history of the agents’ previous trips and the quality of transport services. The outcomes could be applied in assessing the quality of the transport system from the passengers’ point of view.
Source:: Transport Problems; 2019, 14, 2; 43-53
1896-0596
2300-861X
Appears in:: Transport Problems
Content provider:: Biblioteka Nauki

Article

Item 18.

Title:: A compact DQN model for mobile agents with collision avoidance
Authors:: Kamola, Mariusz
Links:: https://bibliotekanauki.pl/articles/27314243.pdf
Publication date:: 2023
Publisher:: Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Subjects:: Q-learning
DQN
reinforcement learning
Description:: This paper presents a complete simulation and reinforcement learning solution for training mobile agents’ strategies of route tracking and mutual collision avoidance. The aim was to achieve such functionality with limited resources, with respect to both the model input and the model size itself. The designed models prove to keep agents safely on the track. The collision-avoidance skills that agents develop in the course of model training are primitive but rational. The small size of the model allows fast training with limited computational resources.
Source:: Journal of Automation Mobile Robotics and Intelligent Systems; 2023, 17, 2; 28-35
1897-8649
2080-2145
Appears in:: Journal of Automation Mobile Robotics and Intelligent Systems
Content provider:: Biblioteka Nauki

Article

Item 19.

Title:: Prioritized epoch-incremental Q-learning algorithm
Authors:: Zajdel, R.
Links:: https://bibliotekanauki.pl/articles/375619.pdf
Publication date:: 2012
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: reinforcement learning
Q-learning
grid world
Description:: The basic reinforcement learning algorithms, such as Q-learning or Sarsa, are characterized by a short execution time of a single learning step; however, the number of epochs needed to achieve the optimal policy is unacceptably large. There are many methods that reduce the number of necessary epochs, such as TD(λ > 0), Dyna, or prioritized sweeping, but their computational time is considerable. This paper proposes combining the Q-learning algorithm, performed in incremental mode, with an acceleration method executed in epoch mode. This acceleration is based on the distance to the terminal state. The approach keeps a single learning step short while achieving high efficiency comparable to that of Dyna or prioritized sweeping. The proposed algorithm is compared with Q(λ)-learning, Dyna-Q, and prioritized sweeping in experiments on three grid worlds. The learning time and the number of epochs necessary to reach the terminal state are used to evaluate the efficiency of the compared algorithms.
The efficiency of the basic reinforcement learning algorithms Q-learning and Sarsa, measured by the number of trials needed to obtain the optimal strategy, is relatively low. Hence the scope for practical application of these algorithms is limited. Their advantage, however, is low computational complexity: a single learning step executes quickly enough that they perform very well in online control systems. The methods used to accelerate reinforcement learning, which reach the absorbing state after far fewer trials than the basic algorithms, usually increase computational complexity and lengthen the execution time of a single learning step. The most commonly used acceleration, the temporal-difference method TD(λ > 0), requires additional memory elements, namely eligibility traces. The execution time of a single learning step in such an algorithm increases considerably because, unlike the basic algorithm, in which only the action-value function of the active state is updated, here the update is performed for all states. More efficient acceleration methods, such as Dyna or prioritized sweeping, also belong to the class of memory-based algorithms; their main idea is reinforcement learning based on an adaptive model of the environment. These methods reach the absorbing state in far fewer trials; however, owing to their increased computational complexity, the execution time of a single learning step becomes a significant factor limiting their use in systems with a large number of states.
The essence of these algorithms is to perform a fixed number of updates of the action-value function for previously active states; in Dyna these states are chosen at random, whereas in prioritized sweeping they are ordered by the magnitude of the update error. This article proposes an epoch-incremental reinforcement learning algorithm whose main idea is to combine the basic incremental Q-learning algorithm with an acceleration algorithm executed in epoch mode. The proposed epoch-mode learning method relies mainly on the actual value of the reinforcement signal observed on the transition to the absorbing state, which is then propagated backward exponentially according to the estimated distance from the absorbing state. This approach yields a short learning time per step in incremental mode (Tab. 2) while maintaining the efficiency typical of Dyna or prioritized sweeping (Tab. 1 and Fig. 5).
Źródło:: Theoretical and Applied Informatics; 2012, 24, 2; 159-171
1896-5334
Pojawia się w:: Theoretical and Applied Informatics
Dostawca treści:: Biblioteka Nauki

Artykuł
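
The epoch-incremental idea described in this record can be illustrated with a minimal Python sketch. It is not the author's published algorithm: it assumes a deterministic environment whose only reward is received on entering the terminal state, stores each observed transition in a simple model, and in the epoch mode uses breadth-first search over that model to find each past-active state's distance d to the terminal state. An action leading to a state at distance d then receives the value γ^d times the terminal reward. The class and method names are hypothetical.

```python
from collections import defaultdict, deque


class EpochIncrementalQ:
    """Hypothetical sketch: incremental Q-learning plus an epoch-mode backward
    propagation of the terminal reward by distance. Assumes a deterministic
    environment with a positive reward only on reaching the terminal state."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)          # Q[(state, action)]
        self.model = {}                      # (state, action) -> observed next state

    def incremental_step(self, s, a, r, s_next, terminal):
        # Cheap per-step part: the standard one-step Q-learning update.
        if terminal:
            target = r
        else:
            target = r + self.gamma * max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        self.model[(s, a)] = s_next          # remember the observed transition

    def epoch_update(self, terminal_state, terminal_reward):
        # Breadth-first search over reversed model edges gives each
        # past-active state its distance (in steps) to the terminal state.
        predecessors = defaultdict(set)
        for (s, _a), s_next in self.model.items():
            predecessors[s_next].add(s)
        dist = {terminal_state: 0}
        frontier = deque([terminal_state])
        while frontier:
            node = frontier.popleft()
            for prev in predecessors[node]:
                if prev not in dist:
                    dist[prev] = dist[node] + 1
                    frontier.append(prev)
        # Propagate the terminal reward backward, discounted exponentially by
        # the successor's distance; max() keeps any better existing estimate.
        for (s, a), s_next in self.model.items():
            if s_next in dist:
                self.q[(s, a)] = max(self.q[(s, a)],
                                     self.gamma ** dist[s_next] * terminal_reward)
        return dist
```

In a one-dimensional corridor with the terminal state at 4, a single episode of rightward moves gives incremental Q-learning only one non-zero value, while the epoch pass assigns 1.0, 0.9, 0.81 and 0.729 to the moves from states 3, 2, 1 and 0.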

Skocz do pozycji: 20.

Tytuł:: Epokowo-inkrementacyjny algorytm uczenia się ze wzmocnieniem wykorzystujący kryterium średniego wzmocnienia
The epoch-incremental reinforcement learning algorithm based on the average reward
Autorzy:: Zajdel, R.
Powiązania:: https://bibliotekanauki.pl/articles/152882.pdf
Data publikacji:: 2013
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: uczenie się ze wzmocnieniem
R-learning
algorytm epokowo-inkrementacyjny
average reward reinforcement learning
epoch-incremental reinforcement learning
Opis:: W artykule zaproponowano nowy, epokowo-inkrementacyjny algorytm uczenia się ze wzmocnieniem. Główną ideą tego algorytmu jest przeprowadzenie w trybie epokowym dodatkowych aktualizacji strategii w oparciu o odległości aktywnych w przeszłości stanów od stanu terminalnego. Zaproponowany algorytm oraz algorytmy R(0)-learning, R(λ)-learning, Dyna-R oraz prioritized sweeping-R zastosowano do sterowania modelem samochodu górskiego oraz modelem kulki umieszczonej na balansującej belce.
This paper describes the application of average-reward reinforcement learning algorithms to control. Moreover, a new epoch-incremental reinforcement learning algorithm (EIR(0)-learning for short) is proposed. In this algorithm, the basic R(0)-learning algorithm runs in the incremental mode and a model of the environment is built. In the epoch mode, the distances of past active states to the terminal state are determined from this model, and these distances are then used to update the strategy. The proposed algorithm was applied to the mountain car (Fig. 4) and ball-beam (Fig. 5) models. EIR(0)-learning was compared empirically with R(0)-learning [4, 6], R(λ)-learning and the model-based algorithms Dyna-R and prioritized sweeping-R [11]. For the ball-beam system, EIR(0)-learning reached a stable control strategy after the smallest number of trials (Tab. 1, column 2). For the mountain car system, it needed fewer trials than R(0)-learning and R(λ)-learning, but more than Dyna-R and prioritized sweeping-R. Notably, the execution times of Dyna-R and prioritized sweeping-R in the incremental mode were 5 and 50 times longer, respectively, than that of the proposed EIR(0)-learning algorithm (Tab. 2, column 3). The main conclusion of this work is that the epoch-incremental learning algorithm provides a stable control strategy after a relatively small number of trials and with a short single-iteration time.
Źródło:: Pomiary Automatyka Kontrola; 2013, R. 59, nr 7, 7; 700-703
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł
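
EIR(0)-learning builds on R(0)-learning, which optimizes the average reward per step instead of a discounted return. Below is a minimal sketch of one tabular R-learning step in the form commonly attributed to Schwartz: relative action values, plus a running average-reward estimate ρ that is updated only after greedy actions. The function name, the step sizes α and β, and the choice to test greediness after the Q update are assumptions; published variants differ in that detail.

```python
from collections import defaultdict


def r_learning_step(q, rho, s, a, r, s_next, actions, alpha=0.1, beta=0.01):
    """One tabular R(0)-learning step (average-reward criterion).

    q:   defaultdict(float) keyed by (state, action), holding relative values
    rho: current estimate of the average reward per step
    Returns the (possibly) updated rho; q is modified in place.
    """
    best_next = max(q[(s_next, b)] for b in actions)
    # Rewards are measured relative to the average reward rho; no discounting.
    q[(s, a)] += alpha * (r - rho + best_next - q[(s, a)])
    best_here = max(q[(s, b)] for b in actions)
    if q[(s, a)] == best_here:
        # Only greedy actions are used to refine the average-reward estimate.
        rho += beta * (r - rho + best_next - best_here)
    return rho
```

Because ρ is subtracted from every reward, the Q-values stay bounded in continuing tasks such as the ball-beam system, where no discount factor is used.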

Skocz do pozycji: 21.

Tytuł:: Use of Modified Adaptive Heuristic Critic Algorithm for Novel Scheduling Mechanism in Packet-Switched Networks
Autorzy:: Jednoralski, M.
Kacprzak, T.
Powiązania:: https://bibliotekanauki.pl/articles/92909.pdf
Data publikacji:: 2005
Wydawca:: Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Tematy:: reinforcement learning
telecommunication networks
packet scheduling
Opis:: This paper introduces a novel algorithm for selecting packets in a switch node for transmission over a network channel, based on reinforcement learning and a modified Adaptive Heuristic Critic. A comparison of two well-known scheduling algorithms, Earliest Deadline First and Round Robin, shows that they perform well in some cases but cannot adapt their behavior to traffic changes. Simulation studies show that the novel scheduling algorithm outperforms Round Robin and Earliest Deadline First by adapting to changing network conditions.
Źródło:: Studia Informatica : systems and information technology; 2005, 2(6); 21-34
1731-2264
Pojawia się w:: Studia Informatica : systems and information technology
Dostawca treści:: Biblioteka Nauki

Artykuł
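
The two baseline schedulers named in the abstract can be sketched in a few lines of Python. This covers only Earliest Deadline First and Round Robin, not the authors' Adaptive Heuristic Critic scheduler, and the packet and queue representations (deadline/ID tuples, one FIFO queue per flow) are hypothetical. Neither policy takes into account how traffic has behaved so far, which is the lack of adaptivity the abstract points out.

```python
from collections import deque


def edf_pick(queue):
    """Earliest Deadline First: remove and return the queued packet whose
    deadline is soonest. queue is a list of (deadline, packet_id) tuples."""
    choice = min(queue, key=lambda item: item[0])
    queue.remove(choice)
    return choice


def round_robin(queues):
    """Round Robin over per-flow FIFO queues (dict: flow -> deque of packets).
    Flows are visited in a fixed cyclic order and each non-empty queue sends
    one packet per turn. Returns the transmission order."""
    order = deque(sorted(queues))
    sent = []
    while any(queues[f] for f in order):
        flow = order[0]
        order.rotate(-1)                     # move on to the next flow
        if queues[flow]:
            sent.append(queues[flow].popleft())
    return sent
```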

Skocz do pozycji: 22.

Tytuł:: Multi agent deep learning with cooperative communication
Autorzy:: Simões, David
Lau, Nuno
Reis, Luís Paulo
Powiązania:: https://bibliotekanauki.pl/articles/1837537.pdf
Data publikacji:: 2020
Wydawca:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:: multi-agent systems
deep reinforcement learning
centralized learning
Opis:: We consider the problem of multiple agents cooperating in a partially observable environment. Agents must learn to coordinate and share relevant information to solve tasks successfully. This article describes Asynchronous Advantage Actor-Critic with Communication (A3C2), an end-to-end differentiable approach in which agents learn policies and communication protocols simultaneously. A3C2 follows a centralized-learning, distributed-execution paradigm and supports independent agents, dynamic team sizes, partially observable environments and noisy communications. Our comparisons show that A3C2 outperforms other state-of-the-art proposals in multiple environments.
Źródło:: Journal of Artificial Intelligence and Soft Computing Research; 2020, 10, 3; 189-207
2083-2567
2449-6499
Pojawia się w:: Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 23.

Tytuł:: Uczenie ze wzmocnieniem regulatora Takagi-Sugeno metodą elementów ASE/ACE
Reinforcement learning with use of neuronlike elements ASE/ACE of Takagi-Sugeno controller
Autorzy:: Zajdel, R.
Powiązania:: https://bibliotekanauki.pl/articles/156302.pdf
Data publikacji:: 2005
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: regulator rozmyty
uczenie ze wzmocnieniem
wahadło odwrócone
fuzzy controller
reinforcement learning
inverted pendulum
Opis:: W artykule opisano zastosowanie algorytmu uczenia ze wzmocnieniem metodą elementów ASE/ACE do uczenia następników reguł regulatora rozmytego Takagi-Sugeno. Poprawność proponowanych rozwiązań zweryfikowano symulacyjnie w sterowaniu układem wahadło odwrócone - wózek. Przeprowadzono również eksperymenty porównawcze z klasyczną siecią elementów ASE/ACE. Pokazano zalety i wady rozwiązania klasycznego i rozmytego.
An adaptation of the reinforcement learning algorithm based on ASE/ACE elements is proposed for learning the rule consequents of a Takagi-Sugeno fuzzy logic controller. The solution is applied to control of the cart-pole system and tested by computer simulations. The original neuronlike ASE/ACE elements are simulated as well. Advantages and disadvantages of both approaches (fuzzy and classical) are demonstrated.
Źródło:: Pomiary Automatyka Kontrola; 2005, R. 51, nr 1, 1; 47-49
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 24.

Tytuł:: Simplification of deep reinforcement learning in traffic control using the Bonsai Platform
Uproszczenie uczenia się przez głębokie wzmocnienie w zarządzaniu ruchem z wykorzystaniem Platformy Bonsai
Autorzy:: Skuba, Michal
Janota, Aleš
Powiązania:: https://bibliotekanauki.pl/articles/2013058.pdf
Data publikacji:: 2020-12-31
Wydawca:: Uniwersytet Technologiczno-Humanistyczny im. Kazimierza Pułaskiego w Radomiu
Tematy:: control
deep reinforcement learning
model
simulation
traffic
sterowanie
uczenie w głębokim uczeniu przez wzmacnianie
symulacja
ruch drogowy
Opis:: The paper deals with the problem of traffic light control at a road intersection. The authors use a model of a real road junction created in the AnyLogic modelling tool. For two scenarios, three simulation experiments are performed: fixed-time control, fixed-time control after AnyLogic-based optimizations, and dynamic control obtained through the cooperation of the AnyLogic tool and the Bonsai platform, which exploits deep reinforcement learning. There is currently a trend to simplify machine learning processes as much as possible, making them accessible to practitioners with no artificial intelligence background and without the need to become data scientists. Project Bonsai provides an easy-to-use connector that links AnyLogic models to the Bonsai platform, a novel approach to machine learning that does not require setting any hyper-parameters. Because real operational data were unavailable, the model uses simulation data only and covers only the presence and movement of vehicles (no pedestrians). The optimization problem consists of minimizing the average time that agents (vehicles) spend in the model while passing through the modelled intersection. Another observed parameter is the maximum time that individual vehicles spend in the model. The authors share their practical, mainly methodological, experience with the simulation process and also indicate the economic cost of training.
Artykuł dotyczy problemu sterowania sygnalizacją świetlną na skrzyżowaniach dróg. Autorzy wykorzystują model rzeczywistego węzła drogowego utworzony w narzędziu do modelowania AnyLogic. Dla dwóch scenariuszy wykonywane są trzy eksperymenty symulacyjne - sterowanie światłami sygnalizacyjnymi o stałym czasie działania, sterowanie światłami sygnalizacyjnymi o stałym czasie działania po optymalizacji w oparciu o AnyLogic, i sterowanie dynamiczne dzięki współpracy między AnyLogic i platformą Bonsai, wykorzystując korzyści płynące z uczenia się przez głębokie wzmocnienie. Obecnie istnieją tendencje do maksymalnego upraszczania procesów uczenia maszynowego, aby były dostępne dla praktyków bez doświadczenia w zakresie sztucznej inteligencji i bez konieczności zostania naukowcami danych. Project Bonsai to łatwe w obsłudze złącze, które pozwala na korzystanie z modeli AnyLogic podłączonych do platformy Bonsai - nowatorskie podejście do uczenia maszynowego bez konieczności ustawiania hiperparametrów. Ze względu na niedostępność rzeczywistych danych eksploatacyjnych model wykorzystuje tylko dane symulacyjne, tylko z obecnością i ruchem pojazdów (bez pieszych). Problem optymalizacji polega na zminimalizowaniu średniego czasu, jaki agenci (pojazdy) muszą spędzać w modelu, mijając modelowane skrzyżowanie. Kolejnym obserwowanym parametrem jest maksymalny czas przebywania poszczególnych pojazdów w modelu. Autorzy dzielą się praktycznymi, głównie metodologicznymi, doświadczeniami związanymi z procesem symulacji oraz wskazują koszty ekonomiczne potrzebne do uczenia.
Źródło:: Journal of Civil Engineering and Transport; 2020, 2, 4; 191-202
2658-1698
2658-2120
Pojawia się w:: Journal of Civil Engineering and Transport
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 25.

Tytuł:: Learning board evaluation function for Othello by hybridizing coevolution with temporal difference learning
Autorzy:: Szubert, M.
Jaśkowski, W.
Krawiec, K.
Powiązania:: https://bibliotekanauki.pl/articles/206175.pdf
Data publikacji:: 2011
Wydawca:: Polska Akademia Nauk. Instytut Badań Systemowych PAN
Tematy:: evolutionary computation
coevolutionary algorithms
reinforcement learning
memetic computing
game strategy learning
Opis:: Hybridization of global and local search techniques has already produced promising results in optimization and machine learning. It is commonly presumed that approaches employing this idea, such as memetic algorithms that combine evolutionary algorithms with local search, benefit from the complementarity of their constituent methods and maintain the right balance between exploration and exploitation of the search space. While such extensions of evolutionary algorithms have been studied intensively, hybrids of local search with coevolutionary algorithms have received little attention. In this paper we attempt to fill this gap by presenting Coevolutionary Temporal Difference Learning (CTDL), which interlaces global search provided by competitive coevolution with local search by means of temporal difference learning. We verify CTDL on the board game of Othello, where it learns board evaluation functions represented by a linear weighted piece counter. The results of a computational experiment show that CTDL is superior to both the coevolutionary algorithm and temporal difference learning alone, in terms of both the performance of the developed strategies and computational cost. To further exploit CTDL's potential, we extend it with an archive that keeps track of selected well-performing solutions found so far and uses them to improve search convergence. The overall conclusion is that fusing various forms of coevolution with gradient-based local search can be highly beneficial and deserves further study.
Źródło:: Control and Cybernetics; 2011, 40, 3; 805-831
0324-8569
Pojawia się w:: Control and Cybernetics
Dostawca treści:: Biblioteka Nauki

Artykuł
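
The local-search half of CTDL, temporal difference learning on a linear weighted piece counter (WPC), can be sketched as follows. The sketch assumes board squares encoded as +1 (own piece), -1 (opponent) or 0 (empty) and a tanh squashing of the WPC output, a common choice in TD learning for Othello. It omits the Othello move rules and the coevolutionary half entirely, and the function names are hypothetical.

```python
import math


def wpc_value(weights, board):
    """Weighted piece counter: a weighted sum over squares, squashed into
    (-1, 1) with tanh. board[i] is +1 (own), -1 (opponent) or 0 (empty)."""
    return math.tanh(sum(w * x for w, x in zip(weights, board)))


def td0_update(weights, board, next_board, alpha=0.01, final_reward=None):
    """One TD(0) step on a linear WPC: move V(board) toward V(next_board),
    or toward final_reward (+1 win, -1 loss, 0 draw) when the game has ended.
    Returns a new weight list."""
    v = wpc_value(weights, board)
    if final_reward is not None:
        target = final_reward
    else:
        target = wpc_value(weights, next_board)
    # Gradient of tanh(w . x) with respect to w_i is (1 - tanh^2) * x_i.
    step = alpha * (target - v) * (1.0 - v * v)
    return [w + step * x for w, x in zip(weights, board)]
```

In CTDL, coevolution would supply the population of such weight vectors, and updates of this kind would refine individuals locally between generations.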

Skocz do pozycji: 26.

Tytuł:: Analiza możliwości wykorzystania algorytmów uczenia maszynowego w środowisku Unity
Analysis of the possibilities for using machine learning algorithms in the Unity environment
Autorzy:: Litwynenko, Karina
Plechawska-Wójcik, Małgorzata
Powiązania:: https://bibliotekanauki.pl/articles/1837823.pdf
Data publikacji:: 2021
Wydawca:: Politechnika Lubelska. Instytut Informatyki
Tematy:: uczenie ze wzmocnieniem
uczenie przez naśladowanie
Unity
reinforcement learning
imitation learning
Opis:: Algorytmy uczenia ze wzmocnieniem zyskują coraz większą popularność, a ich rozwój jest możliwy dzięki istnieniu narzędzi umożliwiających ich badanie. Niniejszy artykuł dotyczy możliwości zastosowania algorytmów uczenia maszynowego na platformie Unity wykorzystującej bibliotekę Unity ML-Agents Toolkit. Celem badania było porównanie dwóch algorytmów: Proximal Policy Optimization oraz Soft Actor-Critic. Zweryfikowano również możliwość poprawy wyników uczenia poprzez łączenie tych algorytmów z metodą uczenia przez naśladowanie Generative Adversarial Imitation Learning. Wyniki badania wykazały, że algorytm PPO może sprawdzić się lepiej w nieskomplikowanych środowiskach o nienatychmiastowym charakterze nagród, zaś dodatkowe zastosowanie GAIL może wpłynąć na poprawę skuteczności uczenia.
Reinforcement learning algorithms are gaining popularity, and their advancement is made possible by the presence of tools to evaluate them. This paper concerns the applicability of machine learning algorithms on the Unity platform using the Unity ML-Agents Toolkit library. The purpose of the study was to compare two algorithms: Proximal Policy Optimization and Soft Actor-Critic. The possibility of improving the learning results by combining these algorithms with Generative Adversarial Imitation Learning was also verified. The results of the study showed that the PPO algorithm can perform better in uncomplicated environments with non-immediate rewards, while the additional use of GAIL can improve learning performance.
Źródło:: Journal of Computer Sciences Institute; 2021, 20; 197-204
2544-0764
Pojawia się w:: Journal of Computer Sciences Institute
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 27.

Tytuł:: Self-improving Q-learning based controller for a class of dynamical processes
Autorzy:: Musial, Jakub
Stebel, Krzysztof
Czeczot, Jacek
Powiązania:: https://bibliotekanauki.pl/articles/1845515.pdf
Data publikacji:: 2021
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: process control
Q-learning algorithm
reinforcement learning
intelligent control
on-line learning
Opis:: This paper shows how the Q-learning algorithm can be applied as a general-purpose self-improving controller in industrial automation, as a substitute for a conventional PI controller implemented without proper tuning. The traditional Q-learning approach is redefined to better fit practical control loops, including a new definition of the goal state by the closed-loop reference trajectory and discretization of the state space and of the accessible actions (manipulated variables). Properties of the Q-learning algorithm are investigated in terms of practical applicability, with special emphasis on initializing the Q-matrix based only on preliminary PI tunings to ensure bumpless switching between the existing controller and the Q-learning algorithm that replaces it. A general approach to designing the Q-matrix and the learning policy is suggested, and the concept is systematically validated by simulation on two example processes, one exhibiting first-order dynamics and one exhibiting oscillatory second-order dynamics. Results show that online learning through interaction with the controlled process is possible and significantly improves control performance compared with an arbitrarily tuned PI controller.
Źródło:: Archives of Control Sciences; 2021, 31, 3; 527-551
1230-2384
Pojawia się w:: Archives of Control Sciences
Dostawca treści:: Biblioteka Nauki

Artykuł
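
The ingredients listed in this abstract (a discretized control error as the state, discrete control increments as actions, and a Q-matrix initialized from an existing tuning) can be sketched as below. All bin edges, increment values and the initialization rule are hypothetical illustrations, not the authors' design. Here the initial Q-matrix simply makes the greedy increment in each state the one closest to a proportional move k_p·e, as a stand-in for using preliminary PI tunings to obtain bumpless switching.

```python
import bisect

# Hypothetical discretization: bin edges for the control error e = setpoint - output.
ERROR_EDGES = [-1.0, -0.1, 0.1, 1.0]      # gives len(ERROR_EDGES) + 1 = 5 states
# Hypothetical action set: increments du applied to the manipulated variable.
ACTIONS = [-0.5, -0.1, 0.0, 0.1, 0.5]


def error_state(e):
    """Map a continuous control error to a discrete state index 0..4."""
    return bisect.bisect(ERROR_EDGES, e)


def init_q(kp, representative_errors):
    """Initial Q-matrix q[state][action]: the greedy increment in each state is
    the one closest to the proportional move kp * e for that state's error."""
    return [[-abs(du - kp * e) for du in ACTIONS] for e in representative_errors]


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on the Q-matrix, in place."""
    q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
```

Starting from such a Q-matrix, the controller initially behaves like the proportional rule and then adjusts its values online through `q_update` as rewards from the closed loop arrive.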

Skocz do pozycji: 28.

Tytuł:: Self-improving Q-learning based controller for a class of dynamical processes
Autorzy:: Musial, Jakub
Stebel, Krzysztof
Czeczot, Jacek
Powiązania:: https://bibliotekanauki.pl/articles/1845530.pdf
Data publikacji:: 2021
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Tematy:: process control
Q-learning algorithm
reinforcement learning
intelligent control
on-line learning
Opis:: This paper shows how the Q-learning algorithm can be applied as a general-purpose self-improving controller in industrial automation, as a substitute for a conventional PI controller implemented without proper tuning. The traditional Q-learning approach is redefined to better fit practical control loops, including a new definition of the goal state by the closed-loop reference trajectory and discretization of the state space and of the accessible actions (manipulated variables). Properties of the Q-learning algorithm are investigated in terms of practical applicability, with special emphasis on initializing the Q-matrix based only on preliminary PI tunings to ensure bumpless switching between the existing controller and the Q-learning algorithm that replaces it. A general approach to designing the Q-matrix and the learning policy is suggested, and the concept is systematically validated by simulation on two example processes, one exhibiting first-order dynamics and one exhibiting oscillatory second-order dynamics. Results show that online learning through interaction with the controlled process is possible and significantly improves control performance compared with an arbitrarily tuned PI controller.
Źródło:: Archives of Control Sciences; 2021, 31, 3; 527-551
1230-2384
Pojawia się w:: Archives of Control Sciences
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 29.

Tytuł:: Wykorzystywanie programów uczenia w głębokim uczeniu przez wzmacnianie. O istocie rozpoczynania od rzeczy małych
Using Training Curriculum with Deep Reinforcement Learning. On the Importance of Starting Small
Autorzy:: KOZIARSKI, MICHAŁ
KWATER, KRZYSZTOF
WOŹNIAK, MICHAŁ
Powiązania:: https://bibliotekanauki.pl/articles/456567.pdf
Data publikacji:: 2018
Wydawca:: Uniwersytet Rzeszowski
Tematy:: głębokie uczenie przez wzmacnianie
uczenie przez transfer
uczenie się przez całe życie
proces uczenia
deep reinforcement learning
transfer learning
lifelong learning
curriculum learning
Opis:: Algorytmy uczenia się przez wzmacnianie są wykorzystywane do rozwiązywania problemów o stale rosnącym poziomie złożoności. W wyniku tego proces uczenia zyskuje na złożoności i wymaga większej mocy obliczeniowej. Wykorzystanie uczenia z przeniesieniem wiedzy może częściowo ograniczyć ten problem. W artykule wprowadzamy oryginalne środowisko testowe i eksperymentalnie oceniamy wpływ wykorzystania programów uczenia na głęboką odmianę metody Q-learning.
Reinforcement learning algorithms are being used to solve problems of ever-increasing complexity. As a consequence, the training process becomes harder and more computationally demanding. Transfer learning can partially alleviate this issue by taking advantage of previously acquired knowledge. In this paper we propose a novel test environment and experimentally evaluate the impact of using a training curriculum with the deep Q-learning algorithm.
Źródło:: Edukacja-Technika-Informatyka; 2018, 9, 2; 220-226
2080-9069
Pojawia się w:: Edukacja-Technika-Informatyka
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 30.

Tytuł:: Handling realistic noise in multi-agent systems with self-supervised learning and curiosity
Autorzy:: Szemenyei, Marton
Reizinger, Patrik
Powiązania:: https://bibliotekanauki.pl/articles/2147129.pdf
Data publikacji:: 2022
Wydawca:: Społeczna Akademia Nauk w Łodzi. Polskie Towarzystwo Sieci Neuronowych
Tematy:: deep reinforcement learning
multi-agent environment
autonomous driving
robot soccer
self-supervised learning
Opis:: Most reinforcement learning benchmarks, especially for multi-agent tasks, do not go beyond observations with simple noise; real scenarios, however, induce more elaborate vision pipeline failures: false sightings, misclassifications or occlusion. In this work, we propose a lightweight 2D environment for robot soccer and autonomous driving that can emulate these discrepancies. Besides establishing a benchmark for accessible multi-agent reinforcement learning research, our work addresses the challenges the simulator imposes. To handle realistic noise, we use self-supervised learning to enhance scene reconstruction and extend curiosity-driven learning to model longer horizons. Our extensive experiments show that the proposed methods achieve state-of-the-art performance compared with actor-critic methods, ICM and PPO.
Źródło:: Journal of Artificial Intelligence and Soft Computing Research; 2022, 12, 2; 135-148
2083-2567
2449-6499
Pojawia się w:: Journal of Artificial Intelligence and Soft Computing Research
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 31.

Tytuł:: Self-learning controller of active magnetic bearing based on CARLA method
Samouczący się sterownik aktywnego łożyska magnetycznego oparty na metodzie CARLA
Autorzy:: Brezina, T.
Turek, M.
Pulchart, J.
Powiązania:: https://bibliotekanauki.pl/articles/152983.pdf
Data publikacji:: 2007
Wydawca:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Tematy:: sterowanie aktywnego łożyska magnetycznego
active magnetic bearing control
continuous action reinforcement learning automata
Opis:: This contribution describes control of an active magnetic bearing by an analytically designed linear PD regulator with parallel nonlinear compensation represented by an automatic approximator. The coefficient (parameter) values are obtained from the actions of Continuous Action Reinforcement Learning Automata (CARLAs). The influence of the CARLA parameters on learning is discussed and demonstrated in a simulation study. It is shown that learning can be improved by selecting appropriate learning parameters.
W artykule przedstawiono sterowanie aktywnego łożyska magnetycznego za pomocą analitycznie dobranego regulatora PD z nieliniową kompensacją równoległą. Współczynniki kompensacji są wyznaczane automatycznie z użyciem metody CARLA (Continuous Action Reinforcement Learning Automata). Zbadano wpływ parametrów metody na proces uczenia się kompensatora w oparciu o eksperymenty symulacyjne. Wykazano, że właściwy dobór parametrów metody prowadzi do poprawienia skuteczności procesu uczenia się.
Źródło:: Pomiary Automatyka Kontrola; 2007, R. 53, nr 1, 1; 6-9
0032-4140
Pojawia się w:: Pomiary Automatyka Kontrola
Dostawca treści:: Biblioteka Nauki

Artykuł
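
A CARLA keeps a probability density over a continuous action range, samples actions from it, and reinforces the density around actions that performed well. The sketch below discretizes the density on a grid and uses the commonly cited update f_{n+1}(x) = α[f_n(x) + β·H(x, x_sel)], where H is a Gaussian kernel centred on the selected action and α renormalizes the density. The grid size and the relative kernel width g_w and height g_h are hypothetical parameters. Since the abstract says the compensator coefficients come from CARLA actions, one such automaton could plausibly supply each coefficient.

```python
import math
import random


class DiscretizedCarla:
    """Hypothetical sketch of one continuous action reinforcement learning
    automaton (CARLA) whose action density over [lo, hi] is kept on a grid."""

    def __init__(self, lo, hi, n=101, g_w=0.02, g_h=0.3, rng=None):
        self.xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
        self.dx = (hi - lo) / (n - 1)
        self.sigma = g_w * (hi - lo)         # kernel width relative to the action range
        self.height = g_h / (hi - lo)        # kernel height relative to the action range
        # Uniform start, normalized so that sum(density) * dx == 1.
        self.density = [1.0 / (n * self.dx)] * n
        self.rng = rng or random.Random(0)

    def select(self):
        # Sample a grid point with probability proportional to its density.
        return self.rng.choices(self.xs, weights=self.density, k=1)[0]

    def reinforce(self, chosen, beta):
        # beta in [0, 1] scores how good the chosen action turned out to be.
        # Add a Gaussian bump H(x, chosen) scaled by beta, then renormalize
        # (the role of the alpha factor in the CARLA update).
        bumped = [
            f + beta * self.height * math.exp(-((x - chosen) ** 2) / (2 * self.sigma ** 2))
            for f, x in zip(self.density, self.xs)
        ]
        total = sum(bumped) * self.dx
        self.density = [f / total for f in bumped]
```

Repeated calls to `select` and `reinforce` concentrate the density around well-rewarded actions, while the kernel width controls how far each reinforcement spreads to neighbouring actions.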

Skocz do pozycji: 32.

Tytuł:: Utilization of Deep Reinforcement Learning for Discrete Resource Allocation Problem in Project Management – a Simulation Experiment
Wykorzystanie uczenia ze wzmocnieniem w problemach dyskretnej alokacji zasobów w zarządzaniu projektami – eksperyment symulacyjny
Autorzy:: Wójcik, Filip
Powiązania:: https://bibliotekanauki.pl/articles/2179629.pdf
Data publikacji:: 2022
Wydawca:: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Tematy:: reinforcement learning (RL)
operations research
management
optimisation
uczenie ze wzmocnieniem
badania operacyjne
zarządzanie
optymalizacja
Opis:: This paper tests the applicability of deep reinforcement learning (DRL) algorithms to simulated problems of constrained, discrete, online resource allocation in project management. DRL is extensively researched in various domains, but no similar case study was found at the time of writing. The hypothesis was that a carefully tuned RL agent could outperform an optimisation-based solution. The RL agents (VPG, AC and PPO) were compared against a classic constrained optimisation algorithm in three trials, “easy”, “moderate” and “hard” (70%, 50% and 30% average project success rate, respectively). Each trial consisted of 500 independent, stochastic simulations. The significance of the differences was checked using Welch's ANOVA at significance level α = 0.01, followed by post hoc comparisons with false-discovery-rate control. The experiment revealed that the PPO agent performed significantly better than the optimisation approach and the other RL methods in the moderate and hard simulations.
W artykule zbadano stosowalność metod głębokiego uczenia ze wzmocnieniem (DRL) do symulowanych problemów dyskretnej alokacji ograniczonych zasobów w zarządzaniu projektami. DRL jest obecnie szeroko badaną dziedziną, jednak w chwili przeprowadzania niniejszych badań nie natrafiono na zbliżone studium przypadku. Hipoteza badawcza zakładała, że prawidłowo skonstruowany agent RL będzie w stanie uzyskać lepsze wyniki niż klasyczne podejście wykorzystujące optymalizację. Dokonano porównania agentów RL: VPG, AC i PPO z algorytmem optymalizacji w trzech symulacjach: „łatwej”/„średniej”/„trudnej” (70/50/30% średnich szans na sukces projektu). Każda symulacja obejmowała 500 niezależnych, stochastycznych eksperymentów. Istotność różnic porównano testem ANOVA Welcha na poziomie istotności α = 0.01, z następującymi po nim porównaniami post hoc z kontrolą poziomu błędu. Eksperymenty wykazały, że agent PPO uzyskał w najtrudniejszych symulacjach znacznie lepsze wyniki niż metoda optymalizacji i inne algorytmy RL.
Źródło:: Informatyka Ekonomiczna; 2022, 1; 56-74
1507-3858
Pojawia się w:: Informatyka Ekonomiczna
Dostawca treści:: Biblioteka Nauki

Artykuł
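
Welch's ANOVA, used in this study, compares group means without assuming equal variances. A self-contained Python sketch of the test statistic and its degrees of freedom is given below, using only the standard library. Turning F into a p-value requires the F distribution, for example `scipy.stats.f.sf(F, df1, df2)` if SciPy is available. The function name is hypothetical.

```python
from statistics import mean, variance


def welch_anova(*groups):
    """Welch's one-way ANOVA for k >= 2 groups with unequal variances.
    Returns (F, df1, df2); the p-value is the upper tail of F(df1, df2)."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]   # weights n_i / s_i^2
    total_w = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / total_w  # variance-weighted mean
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / total_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    correction = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return between / correction, k - 1, (k ** 2 - 1) / (3 * lam)
```

For two groups, the Welch F equals the square of Welch's t statistic, which gives a quick sanity check: for [1, 2, 3, 4, 5] against [2, 4, 6, 8, 10] both give 3.6, with about 5.88 denominator degrees of freedom.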

Skocz do pozycji: 33.

Tytuł:: Sumienie maszyny? Sztuczna inteligencja i problem odpowiedzialności moralnej
The Conscience of a Machine? Artificial Intelligence and the Problem of Moral Responsibility
Autorzy:: Wieczorek, Krzysztof Tomasz
Jędrzejko, Paweł
Powiązania:: https://bibliotekanauki.pl/articles/1912551.pdf
Data publikacji:: 2021-09-03
Wydawca:: Wydawnictwo Uniwersytetu Śląskiego
Tematy:: sztuczna inteligencja
etyka
reinforcement learning
autonomia decyzyjna
artificial intelligence
ethics
decision-making autonomy
Opis:: Przyspieszający postęp w dziedzinie inteligentnych technologii rodzi nowe wyzwania etyczne, z którymi w dłuższej lub krótszej perspektywie ludzkość będzie musiała się zmierzyć. Nieuniknionym elementem owego postępu jest rosnąca autonomia w zakresie podejmowania decyzji przez maszyny i systemy, nienadzorowane bezpośrednio przez człowieka. Co najmniej niektóre z tych decyzji będą rodzić konflikty i dylematy moralne. Już dziś warto się zastanowić nad tym, jakie środki są niezbędne, by przyszłe autonomiczne, samouczące i samoreplikujące się obiekty, wyposażone w sztuczną inteligencję i zdolne do samodzielnego działania w dużym zakresie zmienności warunków zewnętrznych, wyposażyć w specyficzny rodzaj inteligencji etycznej. Problem, z którym muszą się zmierzyć zarówno konstruktorzy, jak i użytkownicy tworów obdarzonych sztuczną inteligencją, polega na konieczności optymalnego wyważenia racji, potrzeb i interesów między obiema stronami ludzko-nieludzkiej interakcji. W sytuacji rosnącej autonomii maszyn przestaje bowiem wystarczać etyka antropocentryczna. Potrzebny jest nowy, poszerzony i zmodyfikowany model etyki, który pozwoli przewidzieć i objąć swoim zakresem dotychczas niewystępujący obszar równorzędnych relacji człowieka i maszyny. Niektórym aspektom tego zagadnienia poświęcony jest niniejszy artykuł.
The ever-accelerating progress in the area of smart technologies gives rise to new ethical challenges, which humankind will sooner or later have to face. An inevitable component of this progress is the increasing autonomy of the decision-making processes carried out by machines and systems functioning without direct human control. At least some of these decisions will generate conflicts and moral dilemmas. It is therefore worthwhile to reflect today upon the measures needed to endow autonomous, self-learning and self-replicating entities (products equipped with artificial intelligence and capable of independent operation in a wide variety of external conditions and circumstances) with a unique kind of ethical intelligence. At the core of the problem, which both the designers and the users of entities endowed with artificial intelligence must eventually face, lies the question of how to attain the optimal balance between the goals, needs and interests of both sides of the human-non-human interaction. This is because, as the autonomy of machines expands, the anthropocentric model of ethics no longer suffices. It is therefore necessary to develop a new, extended and modified model of ethics: one that would encompass the whole, thus far non-existent, area of equal relations between humans and machines, and would allow one to predict its dynamics. The present article addresses some aspects of this claim.
Źródło:: ER(R)GO: Teoria – Literatura – Kultura; 2021, 42; 15-34
1508-6305
2544-3186
Pojawia się w:: ER(R)GO: Teoria – Literatura – Kultura
Dostawca treści:: Biblioteka Nauki

Artykuł

Skocz do pozycji: 34.

Tytuł:: Stabilizer design of PSS3B based on the KH algorithm and Q-Learning for damping of low frequency oscillations in a single-machine power system
Autorzy:: Mohamadi, Farshid
Sedaghati, Alireza
Powiązania:: https://bibliotekanauki.pl/articles/41190034.pdf
Data publikacji:: 2023
Wydawca:: Politechnika Warszawska, Instytut Techniki Cieplnej
Tematy:: 3-band power system stabilize
reinforcement learning
Q-learning
system zasilania
uczenie przez wzmacnianie
Opis:: The aim of this study is to use reinforcement learning to generate a complementary signal that enhances the performance of the power system stabilizer. Reinforcement learning is one of the important branches of machine learning in the area of artificial intelligence and a general approach to solving Markov Decision Process (MDP) problems. In this paper, a reinforcement learning-based control method, Q-learning, is presented and used to improve the performance of a 3-Band Power System Stabilizer (PSS3B) in a single-machine power system. To this end, the parameters of the 3-band power system stabilizer are first set by optimizing an eigenvalue-based objective function with the new KH optimization algorithm, and its efficiency is then improved in real time by the proposed Q-learning-based reinforcement learning algorithm. A fundamental feature of the proposed reinforcement learning-based stabilizer is its simplicity and its independence from the system model and from changes in the operating points. To evaluate the efficiency of the proposed stabilizer, its results are compared with those of a conventional power system stabilizer and of the 3-band power system stabilizer designed with the KH algorithm under different operating points. The simulation results, based on the performance indicators, show that the power system stabilizer proposed in this study outperforms the two other methods in terms of reduced settling time and damping of low-frequency oscillations.
Źródło:: Journal of Power Technologies; 2023, 103, 4; 230-242
1425-1353
Pojawia się w:: Journal of Power Technologies
Dostawca treści:: Biblioteka Nauki

Artykuł
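The Q-learning method named in this record is built on the standard tabular update Q(s, a) ← Q(s, a) + α·[r + γ·max Q(s′, a′) − Q(s, a)], where the maximum runs over the actions a′ available in the next state s′. As a rough illustration only, the sketch below applies that update to a hypothetical five-state chain; the environment, the reward of +1 at the end of the chain and the values of alpha, gamma and epsilon are assumptions and have nothing to do with the PSS3B model used in the paper.

```python
import random

# Hypothetical toy environment: a chain of states 0..N_STATES-1, the last one terminal.
# Action 0 moves left, action 1 moves right; reaching the end yields reward +1.
N_STATES = 5
ACTIONS = (0, 1)


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Return (next_state, reward, done) for the toy chain."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes: int = 200, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # A terminal state has no future value, so the target is the reward alone.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


if __name__ == "__main__":
    table = q_learning()
    policy = [max(ACTIONS, key=lambda a: table[s][a]) for s in range(N_STATES - 1)]
    print(policy)  # expected: [1, 1, 1, 1] (always move right)
```

Starting from an all-zero table, the estimates can only grow towards the true values in this deterministic task, so the greedy policy settles on moving right once the reward has propagated back along the chain.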

Item 35.

Title:: A strategy learning model for autonomous agents based on classification
Authors:: Śnieżyński, B.
Links:: https://bibliotekanauki.pl/articles/330672.pdf
Publication date:: 2015
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: autonomous agents
strategy learning
supervised learning
classification
reinforcement learning
Description:: In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied to strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer–pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows the learning process to be tracked.
Source:: International Journal of Applied Mathematics and Computer Science; 2015, 25, 3; 471-482
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article
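The idea in this abstract, letting a classifier rather than a value function select the agent's action, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: experience from episodes that end with a positive (possibly delayed) reward is turned into labelled (state, action) examples, and a k-nearest-neighbour vote stands in for whatever classifier and knowledge representation the paper actually uses. The trajectory format and the action names "act" and "wait" are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical experience format: each episode is (trajectory, final_reward),
# where a trajectory is a list of (state_features, action) pairs.


def collect_examples(episodes):
    """Turn experience into labelled training data for a classifier.

    Only trajectories whose final, possibly delayed, reward is positive are kept,
    so every (state, action) pair in them is treated as a positive example.
    """
    examples = []
    for trajectory, final_reward in episodes:
        if final_reward > 0:
            examples.extend(trajectory)
    return examples


def knn_action(examples, state, k=3):
    """Classify `state` by majority vote over the actions of its k nearest examples."""
    nearest = sorted(examples, key=lambda ex: dist(ex[0], state))[:k]
    votes = Counter(action for _, action in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    episodes = [
        ([((0.0, 0.0), "wait"), ((1.0, 0.0), "act")], 1.0),
        ([((0.0, 1.0), "act")], -1.0),  # negative outcome: not used for training
        ([((0.1, 0.1), "wait"), ((0.9, 0.2), "act")], 1.0),
    ]
    data = collect_examples(episodes)
    print(knn_action(data, (0.95, 0.1)))  # prints "act"
```

Because the learned behaviour is held in an explicit set of labelled examples, it can be inspected directly, which loosely mirrors the abstract's point that a suitable knowledge representation lets humans follow the learning process.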

Item 36.

Title:: Markov Decision Process based Model for Performance Analysis an Intrusion Detection System in IoT Networks
Authors:: Kalnoor, Gauri
Gowrishankar, -
Links:: https://bibliotekanauki.pl/articles/1839336.pdf
Publication date:: 2021
Publisher:: Instytut Łączności - Państwowy Instytut Badawczy
Subjects:: DDoS
intrusion detection
IoT
machine learning
Markov decision process
MDP
Q-learning
NSL-KDD
reinforcement learning
Description:: In this paper, a new reinforcement learning intrusion detection system is developed for IoT networks integrated with wireless sensor networks (WSNs). A study is carried out, and a plot of the proposed RL-IDS model is shown in which the detection rate is improved. The outcome shows a decrease in false alarm rates. A computational analysis is performed, and the results are compared with current methodologies in the context of distributed denial of service (DDoS) attacks. The performance of the network is estimated on the basis of security and other metrics.
Source:: Journal of Telecommunications and Information Technology; 2021, 3; 42-49
1509-4553
1899-8852
Appears in:: Journal of Telecommunications and Information Technology
Content provider:: Biblioteka Nauki

Article

Item 37.

Title:: O doborze reguł sterowania dla regulatora rozmytego
About collecting of control for a fuzzy logic controller
Authors:: Wiktorowicz, K.
Zajdel, R.
Links:: https://bibliotekanauki.pl/articles/156306.pdf
Publication date:: 2005
Publisher:: Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:: fuzzy control
neural networks
reinforcement learning
stability
quality
Description:: The paper characterizes the problem of selecting control rules for a fuzzy logic controller. Two methods of obtaining the rules with a neural network are described: supervised learning (training with a teacher) and reinforcement learning. The problem of analysing the stability and quality of the designed system is presented. The issues discussed are illustrated with example research results.
Source:: Pomiary Automatyka Kontrola; 2005, vol. 51, no. 1, 1; 44-46
0032-4140
Appears in:: Pomiary Automatyka Kontrola
Content provider:: Biblioteka Nauki

Article

Item 38.

Title:: Motor Control: Neural Models and Systems Theory
Authors:: Doya, K.
Kimura, H.
Miyamura, A.
Links:: https://bibliotekanauki.pl/articles/908323.pdf
Publication date:: 2001
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: inverse model
adaptive control
cerebellum
reinforcement learning
basal ganglia
multiple models
Description:: In this paper, we introduce several system-theoretic problems brought forward by recent studies on neural models of motor control. We focus our attention on three topics: (i) the cerebellum and adaptive control, (ii) reinforcement learning and the basal ganglia, and (iii) modular control with multiple models. We discuss these subjects from both the neuroscience and the systems theory viewpoints with the aim of promoting interplay between the two research communities.
Source:: International Journal of Applied Mathematics and Computer Science; 2001, 11, 1; 77-104
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Item 39.

Title:: Efficient learning variable impedance control for industrial robots
Authors:: Li, C.
Zhang, Z.
Xia, G.
Xie, X.
Zhu, Q.
Links:: https://bibliotekanauki.pl/articles/200716.pdf
Publication date:: 2019
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: variable impedance control
reinforcement learning
efficient
Gaussian process
industrial robots
impedance
efficiency improvement
performance
Gaussian model
Description:: Unlike robots, humans can learn to perform various contact tasks in unstructured environments by modulating the impedance characteristics of their arms. In this article, we consider endowing industrial robots with this compliant ability so that they can effectively learn to perform repetitive force-sensitive tasks. Current learning impedance control methods usually suffer from inefficiency. This paper establishes an efficient variable impedance control method. To improve learning efficiency, we employ a probabilistic Gaussian process model as the transition dynamics of the system for internal simulation, permitting long-term inference and planning in a Bayesian manner. The optimal impedance regulation strategy is then searched for using a model-based reinforcement learning algorithm. The effectiveness and efficiency of the proposed method are verified through force control tasks on a 6-DoF Reinovo industrial manipulator.
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2019, 67, 2; 201-212
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article

Item 40.

Title:: A hybrid mathematical model for an optimal border closure policy during a pandemic
Authors:: Lazebnik, Teddy
Shami, Labib
Bunimovich-Mendrazitsky, Svetlana
Links:: https://bibliotekanauki.pl/articles/24336021.pdf
Publication date:: 2023
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: health care
spatio-temporal SIR model
international biotourism policy
multiagent reinforcement learning
reinforcement learning
Description:: During a global health crisis, a country’s borders are a weak point through which carriers from countries with high morbidity rates can enter, endangering the health of the local community and undermining the authorities’ efforts to prevent the spread of the pathogen. Therefore, most countries have adopted some level of border closure policy as one of the first steps in handling pandemics. However, this step involves a significant economic loss, especially for countries that rely on tourism as a source of income. We developed a pioneering model to help decision-makers determine the optimal border closure policies during a health crisis that minimize the magnitude of the outbreak and maximize the revenue of the tourism industry. This approach is based on a hybrid mathematical model that consists of an epidemiological sub-model with tourism and a pandemic-focused economic sub-model, which relies on elements from the field of artificial intelligence to provide policymakers with a data-driven model for a border closure strategy for tourism during a global pandemic.
Source:: International Journal of Applied Mathematics and Computer Science; 2023, 33, 4; 583-601
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Item 41.

Title:: An automated driving strategy generating method based on WGAIL–DDPG
Authors:: Zhang, Mingheng
Wan, Xing
Gang, Longhui
Lv, Xinfei
Wu, Zengwen
Liu, Zhaoyang
Links:: https://bibliotekanauki.pl/articles/2055167.pdf
Publication date:: 2021
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: automated driving system
deep learning
deep reinforcement learning
imitation learning
deep deterministic policy gradient
Description:: Reliability, efficiency and generalization are basic evaluation criteria for a vehicle automated driving system. This paper proposes an automated driving decision-making method based on Wasserstein generative adversarial imitation learning–deep deterministic policy gradient (WGAIL–DDPG(λ)). An exact reward function is designed based on the requirements of a vehicle’s driving performance, i.e., safety, dynamics and ride comfort. The model’s training efficiency is improved through the proposed imitation learning strategy, and a gain regulator is designed to smooth the transition from the imitation phase to the reinforcement phase. Test results show that the proposed decision-making model can generate actions quickly and accurately according to the surrounding environment. Meanwhile, the imitation learning strategy based on expert experience and the gain regulator can effectively improve the training efficiency of the reinforcement learning model. Additionally, an extended test demonstrates its good adaptability to different driving conditions.
Source:: International Journal of Applied Mathematics and Computer Science; 2021, 31, 3; 461-470
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Item 42.

Title:: A hybrid control strategy for a dynamic scheduling problem in transit networks
Authors:: Liu, Zhongshan
Yu, Bin
Zhang, Li
Wang, Wensi
Links:: https://bibliotekanauki.pl/articles/2172126.pdf
Publication date:: 2022
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: service reliability
transit network
proactive control method
deep reinforcement learning
hybrid control strategy
deep learning
Description:: Public transportation is often disrupted by disturbances, such as uncertain travel times caused by road congestion. Therefore, operators need to take real-time measures to guarantee the service reliability of transit networks. In this paper, we investigate a dynamic scheduling problem in a transit network that takes into account the impact of disturbances on bus services. The objective is to minimize the total travel time of passengers in the transit network. A two-layer control method based on a hybrid control strategy is developed to solve the problem. Specifically, in addition to conventional strategies (e.g., holding, stop-skipping), the hybrid control strategy makes full use of the idle standby buses at the depot. Standby buses can be dispatched to bus fleets to provide temporary or regular services. In addition, deep reinforcement learning (DRL) is adopted to solve the problem of continuous decision-making. A long short-term memory (LSTM) method is added to the DRL framework to predict future passenger demand, which enables the current decision to adapt to disturbances. The numerical results indicate that the hybrid control strategy can reduce the average headway of the bus fleet and improve the reliability of bus service.
Source:: International Journal of Applied Mathematics and Computer Science; 2022, 32, 4; 553-567
1641-876X
2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

Item 43.

Title:: Enhancements of Fuzzy Q-Learning algorithm
Authors:: Głowaty, G.
Links:: https://bibliotekanauki.pl/articles/305545.pdf
Publication date:: 2005
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: fuzzy models
reinforcement learning
Q-learning
automatic generation of fuzzy models
Description:: The Fuzzy Q-Learning algorithm combines reinforcement learning techniques with fuzzy modelling. It provides a flexible way of automatically discovering the rules of a fuzzy system in the course of reinforcement learning. In this paper we propose several enhancements to the original algorithm that make it more efficient and better suited to problems with continuous input and output spaces. The proposed improvements generalize the set of possible rule conclusions: the aim is not only to discover an appropriate assignment of conclusions to rules automatically, but also to define the actual set of conclusions automatically from a given set of all possible rule conclusions. To improve the algorithm's performance in environments with inertia, a special rule selection policy is proposed.
Source:: Computer Science; 2005, 7; 77-87
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article
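For orientation, the basic Fuzzy Q-Learning scheme that this paper sets out to enhance can be sketched as follows: every fuzzy rule keeps a q-value for each candidate conclusion and picks one conclusion epsilon-greedily, the controller output is the firing-strength-weighted sum of the chosen conclusions, and the temporal-difference error is shared among the rules in proportion to their firing strengths. The membership functions, the candidate conclusion set, the hyperparameters and the toy task below are hypothetical, and none of the paper's enhancements (generalized conclusion sets, the rule selection policy for systems with inertia) are implemented.

```python
import random

# Hypothetical setup: a 1-D input in [0, 1] covered by three triangular fuzzy sets
# (one rule per set) and three candidate conclusions every rule can choose from.
CENTERS = (0.0, 0.5, 1.0)
WIDTH = 0.5
CONCLUSIONS = (-1.0, 0.0, 1.0)


def firing_strengths(x):
    """Normalised membership degrees of x (in [0, 1]) in each rule's triangular set."""
    raw = [max(0.0, 1.0 - abs(x - c) / WIDTH) for c in CENTERS]
    total = sum(raw)
    return [r / total for r in raw]


class FuzzyQLearner:
    """Basic Fuzzy Q-Learning: one q-value per (rule, candidate conclusion)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = [[0.0] * len(CONCLUSIONS) for _ in CENTERS]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, x):
        """Each rule picks a conclusion epsilon-greedily; the output is their weighted sum."""
        phi = firing_strengths(x)
        choice = []
        for row in self.q:
            if self.rng.random() < self.epsilon:
                choice.append(self.rng.randrange(len(CONCLUSIONS)))
            else:
                choice.append(max(range(len(CONCLUSIONS)), key=row.__getitem__))
        action = sum(p * CONCLUSIONS[j] for p, j in zip(phi, choice))
        q_value = sum(p * self.q[i][j] for i, (p, j) in enumerate(zip(phi, choice)))
        return action, (phi, choice, q_value)

    def value(self, x):
        """V(x): membership-weighted sum of each rule's best q-value."""
        return sum(p * max(row) for p, row in zip(firing_strengths(x), self.q))

    def update(self, trace, reward, x_next):
        """Share the TD error among the fired rules in proportion to their strength."""
        phi, choice, q_value = trace
        td_error = reward + self.gamma * self.value(x_next) - q_value
        for i, (p, j) in enumerate(zip(phi, choice)):
            self.q[i][j] += self.alpha * td_error * p
        return td_error


if __name__ == "__main__":
    # Hypothetical task: push x towards 0.5; the reward is the negative distance to it.
    learner = FuzzyQLearner()
    rng = random.Random(1)
    for _ in range(2000):
        x = rng.random()
        action, trace = learner.act(x)
        x_next = min(1.0, max(0.0, x + 0.1 * action))
        learner.update(trace, -abs(x_next - 0.5), x_next)
    for center, row in zip(CENTERS, learner.q):
        print(center, CONCLUSIONS[max(range(len(CONCLUSIONS)), key=row.__getitem__)])
```

With three overlapping triangles the normalisation is a formality on [0, 1], since the memberships already sum to one there; it is kept so that other partitions of the input can be tried without changing the update rule.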

Item 44.

Title:: Influence of IQT on research in ICT
Authors:: Bednarski, Bogdan J.
Lepak, Łukasz E.
Łyskawa, Jakub J.
Pieńczuk, Paweł
Rosoł, Maciej
Romaniuk, Ryszard S.
Links:: https://bibliotekanauki.pl/articles/2055259.pdf
Publication date:: 2022
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: ICT
control theory
IQT
Information Quantum Technologies
Quantum 2.0
applications of IQT
quantum systems
qubit neural networks
quantum time series forecasting
Quantum Reinforcement Learning
Description:: This paper is written by a group of Ph.D. students pursuing their work in different areas of ICT, outside the direct area of Information Quantum Technologies (IQT). Each co-author undertook the ambitious task of researching the potential practical influence of current IQT development on their own work. The co-authors' research spans the following areas of ICT: CMOS for IQT, QEC, quantum time series forecasting, and IQT in biomedicine. The authors intend to show how quickly quantum techniques may penetrate other areas of ICT, namely their own, in the near future.
Source:: International Journal of Electronics and Telecommunications; 2022, 68, 2; 259-266
2300-1933
Appears in:: International Journal of Electronics and Telecommunications
Content provider:: Biblioteka Nauki

Article

Item 45.

Title:: Concept of a self-learning workplace cell for worker assistance while collaboration with a robot within the self-adapting-production-planning-system
Authors:: Ender, Johanna
Wagner, Jan Cetric
Kunert, Georg
Guo, Fang Bin
Larek, Roland
Pawletta, Thorsten
Links:: https://bibliotekanauki.pl/articles/407864.pdf
Publication date:: 2019
Publisher:: Politechnika Lubelska. Wydawnictwo Politechniki Lubelskiej
Subjects:: human-robot collaboration
human factor
post-optimised reinforcement learning algorithm
self-adapting-production-planning-system
Description:: For some time, research on industrial workplace design has focused on optimizing processes from a technological point of view. Since human workers have to work within this environment, the design process must take Human Factor needs into account. Operators are under additional stress owing to the range of highly dynamic processes and the integration of robots and autonomously operating machines. Few studies have examined how Human Factors influence the design of workplaces for Human-Robot Collaboration (HRC). Furthermore, a comprehensive, systematic and human-centred design solution for industrial workplaces that specifically considers Human Factor needs within HRC remains largely uncertain, and a specific application to production workplaces is missing. The research findings described in this paper aim at optimizing workplaces for manual production and maintenance processes with respect to the workers within HRC. To increase the acceptance of integrating human-robot teams, the concept of the Assisting-Industrial-Workplace-System (AIWS) was developed. As a flexible hybrid cell for HRC integrated into a Self-Adapting-Production-Planning-System (SAPPS), the AIWS assists the worker during interaction.
Source:: Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska; 2019, 9, 4; 4-9
2083-0157
2391-6761
Appears in:: Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska
Content provider:: Biblioteka Nauki

Article

Item 46.

Title:: Building computer vision systems using machine learning algorithms
Authors:: Boyko, N.
Dokhniak, B.
Korkishko, V.
Links:: https://bibliotekanauki.pl/articles/411183.pdf
Publication date:: 2018
Publisher:: Polska Akademia Nauk. Oddział w Lublinie PAN
Subjects:: training with reinforcement
Q-Learning
neural networks
Markov environment
Description:: This article is devoted to reinforcement learning. It covers various modifications of the Q-Learning algorithm, along with techniques that can accelerate learning using neural networks. We also discuss different ways of approximating the tables of this algorithm, consider its implementation in code, and analyze its behavior in different environments. We determine the optimal parameters for its implementation and evaluate its performance by two criteria: the number of necessary neural network weight corrections and the quality of training.
Source:: ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes; 2018, 7, 2; 9-14
2084-5715
Appears in:: ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes
Content provider:: Biblioteka Nauki

Article
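Replacing the Q-table with a trainable approximator, as this abstract describes, changes only how Q(s, a) is stored and corrected. The sketch below is a minimal, assumed example rather than the authors' network or environments: a single linear layer over two hand-picked features (a bias and the normalised position) on a hypothetical five-state chain, trained with the semi-gradient Q-learning step, with a counter standing in for the "number of weight corrections" metric.

```python
import random

# Hypothetical chain task: states 0..4, state 4 terminal, reward 1 on reaching it.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def features(state):
    """Assumed feature vector: a bias term plus the normalised position."""
    return (1.0, state / (N_STATES - 1))


class LinearQ:
    """Q(s, a) = w_a . features(s): a single linear layer standing in for the table."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.w = {a: [0.0, 0.0] for a in ACTIONS}
        self.alpha, self.gamma = alpha, gamma
        self.corrections = 0  # number of weight updates applied so far

    def predict(self, state, action):
        return sum(wi * xi for wi, xi in zip(self.w[action], features(state)))

    def update(self, state, action, reward, next_state, done):
        """Semi-gradient Q-learning step; the gradient of a linear model is its input."""
        best_next = max(self.predict(next_state, a) for a in ACTIONS)
        target = reward if done else reward + self.gamma * best_next
        error = target - self.predict(state, action)
        x = features(state)
        self.w[action] = [wi + self.alpha * error * xi for wi, xi in zip(self.w[action], x)]
        self.corrections += 1
        return error


def train(model, episodes=300, epsilon=0.1, max_steps=100, seed=0):
    """Epsilon-greedy episodes starting from state 0."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):  # cap episode length in case the policy stalls
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: model.predict(state, a))
            nxt, reward, done = step(state, action)
            model.update(state, action, reward, nxt, done)
            if done:
                break
            state = nxt
    return model


if __name__ == "__main__":
    model = train(LinearQ())
    print("weight corrections:", model.corrections)
```

A linear model cannot represent every Q-table exactly, and semi-gradient Q-learning with function approximation is not guaranteed to converge in general; the per-episode step cap keeps each episode finite even if the learned greedy policy stalls.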

You are searching for the phrase "reinforcement learning" using the criterion: All fields
