
2D ML Fighter

A novel comparative study of DQN and PPO algorithms in a 2D fighting game using Unity ML-Agents

INTRODUCTION

Artificial Intelligence (AI) applications continue to expand as demand for them grows in the modern world (Jiang, 2020). AI is used in many areas, such as object recognition, medical research, translation, speech recognition, and personalisation. Deep reinforcement learning (DRL) in video games is one of AI's fastest-growing areas. Video games are inherently complex: they involve interactions between agents and their environment, as well as between players and agents, which are often unpredictable. AI has proven more skilled than most human players in turn-based games such as chess and card games (Liang & Li, 2022). These games take place in a fixed environment; the same level of performance has yet to be achieved in fighting games, which present significant challenges due to their limited decision-making time, large decision space, and diverse strategies. Overcoming these challenges requires design and development tailored to each fighting game genre.
Despite the advancement of DRL in video games, comparative analyses of different DRL algorithms in the 2D fighting game genre remain scarce. Although Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO) have demonstrated potential in other applications, as documented by Yoon and Kim (2017) and Liang et al. (2021), an extensive comparison of their performance in this genre is still missing. This presents a valuable research opportunity to deepen the understanding of how these algorithms perform in highly dynamic environments in terms of learning efficiency and win rate.
This proposal covers the design, development, and comparative analysis of AI agents for a 2D fighting game using the DQN and PPO algorithms. The methodology proposed by Ramlan et al. (2021) will serve as a baseline for designing the game and modelling the AI agents. All gameplay mechanics will be implemented in the Unity game engine, drawing on its physics system, rendering pipeline, and other high-level functionality. Unity's open-source ML-Agents toolkit integrates directly with the engine for developing and testing RL algorithms such as DQN and PPO. The reward-shaping-based self-play method proposed by Oh et al. (2022) will be used to train the agents.
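As an illustration of how such an agent could be structured, the sketch below shows a minimal ML-Agents Agent subclass for one fighter, assuming a discrete action set and a health-based reward signal. The class name, fields, and combat hooks are hypothetical placeholders for whatever the final game design specifies; only the Agent overrides (OnEpisodeBegin, CollectObservations, OnActionReceived) and the reward calls are the toolkit's actual API.

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

// Hypothetical fighter agent: class name, fields, and combat hooks are
// placeholders; the Agent overrides and reward calls are ML-Agents API.
public class FighterAgent : Agent
{
    [SerializeField] private Transform opponent;     // assumed reference to the rival fighter
    [SerializeField] private float maxHealth = 100f;
    private float health;

    public override void OnEpisodeBegin()
    {
        // Reset state at the start of each round.
        health = maxHealth;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Relative position keeps the observation independent of stage layout.
        sensor.AddObservation((Vector2)(opponent.localPosition - transform.localPosition));
        sensor.AddObservation(health / maxHealth);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Single discrete branch: 0 idle, 1 left, 2 right, 3 jump, 4 punch, 5 kick.
        int action = actions.DiscreteActions[0];
        // ... forward the action to the fighter's movement/combat controller ...

        // Small per-step penalty nudges the agent to end rounds quickly.
        AddReward(-0.001f);
    }

    // Hypothetical hooks called by the combat system for reward shaping.
    public void OnHitLanded(float damage) => AddReward(damage / maxHealth);
    public void OnHitTaken(float damage)  { health -= damage; AddReward(-damage / maxHealth); }

    public void OnRoundEnd(bool won)
    {
        AddReward(won ? 1f : -1f);   // terminal win/loss signal
        EndEpisode();
    }
}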
Main research question:
How do different DRL algorithms, particularly DQN and PPO, compare in performance within the 2D fighting game genre?
Sub-research questions:
• How can a one-on-one 2D fighting AI agent be developed using the Unity game engine?
• How can the AI agents be trained separately with the DQN and PPO algorithms using the Unity ML-Agents toolkit? (A setup sketch follows this list.)
• How can the performance of these algorithms be compared in terms of learning efficiency and win rate?
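On the second question, both training runs can likely share the same Unity build, with only the external trainer differing: PPO is provided by the toolkit's built-in trainers, whereas DQN is not among them and would probably be implemented as a custom training loop over the toolkit's low-level Python API. The helper below is a hypothetical sketch of the shared Unity-side setup; the behaviour name, tags, and decision period are placeholder choices.

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Policies;

// Hypothetical setup helper: configures both fighters so either external
// trainer (PPO or a custom DQN loop) can connect to the same build.
public class TrainingSetup : MonoBehaviour
{
    void Awake()
    {
        foreach (var fighter in FindObjectsOfType<FighterAgent>())
        {
            var bp = fighter.GetComponent<BehaviorParameters>();
            bp.BehaviorName = "Fighter";   // must match the trainer configuration
            // Opposing team ids mark the two fighters as adversaries for self-play.
            bp.TeamId = fighter.CompareTag("PlayerOne") ? 0 : 1;

            // Request a decision every few physics steps to bound reaction time.
            var requester = fighter.gameObject.AddComponent<DecisionRequester>();
            requester.DecisionPeriod = 5;
        }
    }
}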
Aim:
This research centres on a comparative performance analysis of the DQN and PPO algorithms in a challenging 2D fighting game environment, addressing the gap identified in the literature. Such a comparison is essential in the 2D fighting game genre to establish how different DRL algorithms perform, in terms of learning efficiency and win rate, in a highly complex, real-time decision-making environment. The project's primary artefact will be a 2D fighting game with AI agents trained using DQN and PPO, which will serve as a testbed for the comparative analysis. Beyond its contribution to AI in gaming, the research offers insight into designing and developing a complete 2D fighting game in the Unity game engine.
Objectives:
To achieve the research aim, the project starts with designing and implementing a dynamic 2D fighting game environment in the Unity game engine, which serves as the agents' training ground. The next step is to implement and train the DQN and PPO algorithms separately using the Unity ML-Agents toolkit. Training will involve hyperparameter tuning and the reward-shaping-based self-play method. After training, performance metrics such as learning efficiency and win rate will be used to compare the algorithms, as sketched below.
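As one way the win-rate metric could be recorded, the hypothetical tracker below logs a rolling win rate through ML-Agents' StatsRecorder; the stat then appears in TensorBoard alongside the cumulative-reward curves used to judge learning efficiency. The class and stat key are illustrative, not part of the toolkit.

using Unity.MLAgents;

// Hypothetical match bookkeeping for the comparative evaluation.
public class MatchTracker
{
    private int wins;
    private int matches;

    // Call once per finished round; the rolling win rate is written to
    // TensorBoard so the DQN and PPO runs can be compared side by side.
    public void RecordRound(bool agentWon)
    {
        matches++;
        if (agentWon) wins++;
        Academy.Instance.StatsRecorder.Add(
            "Fighter/WinRate",
            (float)wins / matches,
            StatAggregationMethod.MostRecent);
    }
}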
