dc.contributor.author | Glusco, Dmitri | |
dc.date.accessioned | 2020-02-24T15:58:38Z | |
dc.date.available | 2020-02-24T15:58:38Z | |
dc.date.issued | 2020 | |
dc.identifier.citation | Glusco, Dmitri. Replica Exchange For Multiple-Environment Reinforcement Learning : Master Thesis : manuscript rights / Dmitri Glusco ; Supervisor Dr. Mykola Maksymenko ; Ukrainian Catholic University, Department of Computer Sciences. – Lviv : [s.n.], 2020. – 54 p. : ill. | uk |
dc.identifier.uri | http://er.ucu.edu.ua/handle/1/2044 | |
dc.language.iso | en | uk |
dc.subject | Multiple-Environment Reinforcement Learning | uk |
dc.subject | Replica Exchange | uk |
dc.subject | Noise Learning | uk |
dc.title | Replica Exchange For Multiple-Environment Reinforcement Learning | uk |
dc.type | Preprint | uk |
dc.status | Published for the first time | uk |
dc.description.abstracten | In this project (Dmitri Glusco, 2019), we address the Reinforcement Learning problem of exploration vs. exploitation, which can be rephrased in terms of generalization and overfitting, or of efficient learning. To tackle the problem, we combine techniques from several lines of research: we introduce noise as a characteristic of the environment (Packer et al., 2018); we set up multiple Reinforcement Learning agents and environments that train in parallel and interact with each other (Jaderberg et al., 2017); and we use a parallel tempering approach that initializes environments with different temperatures (noise levels) and performs exchanges according to the Metropolis-Hastings criterion (Pushkarov et al., 2019). We implemented a multi-agent architecture with parallel tempering based on two Reinforcement Learning algorithms, Deep Q Network and Advantage Actor-Critic, together with a wrapper around OpenAI Gym (Gym: A toolkit for developing and comparing reinforcement learning algorithms) environments that adds noise. We used the CartPole environment to run multiple experiments with three types of exchange: no exchange, random exchange, and smart exchange according to the Metropolis-Hastings rule. We also implemented aggregation functionality that gathers the results of all experiments and visualizes them as charts for analysis. The experiments showed that a parallel tempering approach with multiple environments at different noise levels can improve the agent's performance under specific circumstances. At the same time, the results raised new questions that should be addressed to fully understand the behavior of the implemented approach. | uk |
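The abstract describes exchanges between replicas at different temperatures (noise levels) accepted by the Metropolis-Hastings criterion. The sketch below is only an illustration of that criterion, not the thesis implementation: the function `metropolis_swap`, the use of negative episode return as an "energy", and the toy replica list are all assumptions introduced here for clarity.

```python
# A minimal, hypothetical sketch of a parallel-tempering swap step.
# "Energy" is assumed to be the negative of a recent episode return;
# lower energy means a better-performing agent.
import math
import random


def metropolis_swap(energy_i, energy_j, temp_i, temp_j):
    """Accept a swap between two replicas with probability
    min(1, exp[(1/T_i - 1/T_j) * (E_i - E_j)]) (Metropolis-Hastings criterion)."""
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return delta >= 0 or random.random() < math.exp(delta)


# Toy usage: replicas are (temperature, energy) pairs ordered by temperature.
replicas = [(0.1, -60.0), (0.5, -120.0), (1.0, -180.0)]
for k in range(len(replicas) - 1):
    (t_i, e_i), (t_j, e_j) = replicas[k], replicas[k + 1]
    if metropolis_swap(e_i, e_j, t_i, t_j):
        # Exchange the agents between the two noise levels,
        # keeping the temperature ladder fixed.
        replicas[k], replicas[k + 1] = (t_i, e_j), (t_j, e_i)
print(replicas)
```

Under this criterion a swap is always accepted when it moves the lower-energy (better-scoring) agent toward the colder (less noisy) environment, and is otherwise accepted with a probability that decays with the temperature and energy gap.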