PlaNet Algorithm in a Multi-Agent Environment (BSc Thesis)
Summary: The author presents an evaluation of Deep Planning Network (PlaNet), a state-of-the-art model-based reinforcement learning algorithm. Although PlaNet can reach high accuracy and learn optimal behaviour from raw pixels while remaining sample-efficient, its authors evaluated it only on 2D control problems. Because the presence of multiple agents is a critical aspect of the real world, the author of this thesis conducted a series of evaluations of PlaNet in multi-agent game environments. Experimental results show that PlaNet reaches good benchmark scores and sample efficiency in multi-agent environments with limited visual variance. However, the author found that the algorithm underperforms in large-scale multi-agent environments, where it mispredicts the behaviour of other agents. Moreover, in environments where small objects, often only a few pixels wide, are important, the algorithm loses such details in its future-state predictions. In addition, the author shares experimental findings on tuning PlaNet's latent overshooting regularization mechanism and the divergence scale to improve the model's performance.
Supervisor: Linas Petkevičius