Replica Exchange For Multiple-Environment Reinforcement Learning

dc.contributor.author Glusco, Dmitri
dc.date.accessioned 2020-02-24T15:58:38Z
dc.date.available 2020-02-24T15:58:38Z
dc.date.issued 2020
dc.identifier.citation Glusco, Dmitri. Replica Exchange For Multiple-Environment Reinforcement Learning : Master Thesis : manuscript rights / Dmitri Glusco ; Supervisor Dr. Mykola Maksymenko ; Ukrainian Catholic University, Department of Computer Sciences. – Lviv : [s.n.], 2020. – 54 p. : ill. uk
dc.identifier.uri http://er.ucu.edu.ua/handle/1/2044
dc.language.iso en uk
dc.subject Multiple-Environment Reinforcement Learning uk
dc.subject Replica Exchange uk
dc.subject Noise Learning uk
dc.title Replica Exchange For Multiple-Environment Reinforcement Learning uk
dc.type Preprint uk
dc.status Published for the first time uk
dc.description.abstracten In this project (Dmitri Glusco, 2019), we address the Reinforcement Learning problem of exploration vs. exploitation, which can also be framed in terms of generalization and overfitting, or of efficient learning. To tackle the problem, we combine techniques from several lines of research: we introduce noise as a characteristic of the environment (Packer et al., 2018); we set up multiple Reinforcement Learning agents and environments that train in parallel and interact with each other (Jaderberg et al., 2017); and we use a parallel tempering approach, initializing environments with different temperatures (noise levels) and performing exchanges according to the Metropolis-Hastings criterion (Pushkarov et al., 2019). We implemented a multi-agent architecture with parallel tempering based on two different Reinforcement Learning algorithms - Deep Q Network and Advantage Actor-Critic - together with a wrapper for OpenAI Gym (Gym: A toolkit for developing and comparing reinforcement learning algorithms) environments that adds noise. We used the CartPole environment to run multiple experiments with three types of exchanges: no exchange, random exchange, and smart exchange according to the Metropolis-Hastings rule. We also implemented aggregation functionality to gather the results of all experiments and visualize them with charts for analysis. The experiments showed that a parallel tempering approach with multiple environments at different noise levels can improve the performance of the agent under specific circumstances. At the same time, the results raised new questions that should be addressed to fully understand the behavior of the implemented approach. uk
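
A minimal sketch of the exchange mechanism described in the abstract, assuming Gaussian observation noise as the replica "temperature" and episode reward as the exchange statistic; the wrapper class, noise levels, and helper names are illustrative assumptions, not the thesis implementation:

import math
import random

import gym
import numpy as np


class NoisyObservationWrapper(gym.ObservationWrapper):
    # Hypothetical wrapper: injects Gaussian noise into every observation;
    # the noise level plays the role of a replica's "temperature".
    def __init__(self, env, noise_std):
        super().__init__(env)
        self.noise_std = noise_std

    def observation(self, obs):
        return obs + np.random.normal(0.0, self.noise_std, size=obs.shape)


def metropolis_exchange(reward_i, reward_j, temp_i, temp_j):
    # Metropolis-Hastings swap criterion for two replicas, treating episode
    # reward as negative energy (an assumption made for this illustration).
    delta = (1.0 / temp_i - 1.0 / temp_j) * (reward_j - reward_i)
    return delta >= 0 or random.random() < math.exp(delta)


# Example: two CartPole replicas at different noise levels (temperatures).
temps = [0.05, 0.2]
envs = [NoisyObservationWrapper(gym.make("CartPole-v1"), t) for t in temps]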

