
Q-Learning AI

One of the most interesting concepts in AI to me is machine learning. The possible applications for this kind of algorithm are endless, and I was excited to get the chance to learn more about it. Machine learning is a vast and complex topic; in many ways it's one of the closest things we have to real artificial intelligence.


There are three main approaches that can be taken when working with machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning.


Supervised Learning

      The computer is given example inputs and outputs and attempts to create a rule that maps inputs to outputs.

Unsupervised Learning

      The computer is given no guide, and the goal is to find the structure in the environment on its own.

Reinforcement Learning

      The computer is tasked with achieving a certain goal in a dynamic environment. The computer receives rewards based on the actions it takes, depending on whether or not those actions brought it closer to the goal.

      My goal was to create a project that would allow me to further understand the basic concepts of machine learning and apply them in a simple, easy-to-understand setting. I decided to do this by using a simple method of reinforcement learning called Q-Learning to teach an AI to play the game Snake.


Q-Learning

     Quality learning, or Q-Learning, is a simple method of reinforcement learning that scores every action the computer takes and saves those scores in a table of values called a Q-Table. Every time the computer takes an action, it calculates whether that action brought it closer to its goal. It then applies a positive or negative reward to that specific action in that specific context, so that the next time the computer is in the same situation, it will choose the action with the highest value. Jason Lee explained it well in his article on Q-Learning, which I based much of my project on. He described it as akin to training a dog: when it does something good, you give it a positive reward; something bad, and you give it a negative one. This allows it to make the right choice in the future.
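To make that idea concrete, here is a minimal sketch of the core data structure in C#. The names (QLearningSketch, GetQValues, GetBestAction) are my own illustrative choices rather than the project's actual code: the Q-Table is simply a dictionary from a game-state string to one value per action, and the agent picks the action with the highest value.

using System;
using System.Collections.Generic;
using System.Linq;

public class QLearningSketch
{
    // The Q-Table: each game-state string maps to one Q-Value per action
    // (left, right, up, down).
    private readonly Dictionary<string, float[]> qTable =
        new Dictionary<string, float[]>();

    // Look up the Q-Values for a state, creating a zeroed entry the first
    // time a state is seen (the AI knows nothing about it yet).
    public float[] GetQValues(string state)
    {
        if (!qTable.ContainsKey(state))
            qTable[state] = new float[4]; // all four actions start at 0
        return qTable[state];
    }

    // Pick whichever action currently has the highest value in this state.
    public int GetBestAction(string state)
    {
        float[] values = GetQValues(state);
        return Array.IndexOf(values, values.Max());
    }
}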

[Image: a sample of the first few entries in the Q-Table, with all Q-Values at 0]

My Game

     To make my game, I first searched for examples of premade Snake games I could base mine on. Once I had a functioning player-controlled Snake game in Unity, I needed to add machine learning. I searched for something I could learn from, but all the code I could find was in Python, so it didn't really mesh well with my C# Unity environment. I decided to treat the functioning Python algorithms as pseudocode and recreate them in Unity. The result is a fairly simple Snake game in which, every turn, the AI finds where the food is in relation to the snake and chooses the best action to take accordingly.

Game States and Q-Tables

     Before I get into the code, I need to quickly explain how the Q-Table works. Here is a sample of the first few values in the table. The data within the parentheses represents what is called the game state. The first character represents the x position and can be 0, 1, or N if the food is to the left of, to the right of, or in the same column as the snake, respectively. The second character represents the y position and can be 2, 3, or N if the food is above, below, or in the same row as the snake, respectively. Each of the four digits following that is 0 or 1, depending on whether the corresponding side of the snake is blocked.

The data after the parentheses represents the Q-Values of each of the four actions possible for the snake in the current game state. They represent the values of left, right, up, and down respectively, and the snake will choose whichever action has the highest value. They are all 0 here because the game hasn't been run yet, so the AI hasn't learned anything.
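Since the original state-construction code appears only as screenshots, here is a hedged C# reconstruction of how a state string in that format could be built. The method name, parameters, and the exact delimiters inside the parentheses are my guesses based on the description above; the blocked flags stand in for whatever collision checks the game actually performs.

// A sketch of building a state string in the format described above.
// All names here are illustrative, not the project's actual code.
public string GetStateString(int headX, int headY, int foodX, int foodY,
                             bool blockedLeft, bool blockedRight,
                             bool blockedUp, bool blockedDown)
{
    // First character: 0 = food left, 1 = food right, N = same column.
    char x = foodX < headX ? '0' : foodX > headX ? '1' : 'N';

    // Second character: 2 = food above, 3 = food below, N = same row.
    char y = foodY > headY ? '2' : foodY < headY ? '3' : 'N';

    // Four digits, one per side, 1 if that side of the snake is blocked.
    string danger = $"{(blockedLeft ? 1 : 0)}{(blockedRight ? 1 : 0)}" +
                    $"{(blockedUp ? 1 : 0)}{(blockedDown ? 1 : 0)}";

    return $"({x},{y},{danger})";
}

For example, a state where the food is to the left of and above the snake with nothing blocked would come out as "(0,2,0000)", and a freshly created Q-Table entry for it would map to four zeroed Q-Values, matching the all-zero sample above.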

[Images: the Move and Act functions, the state-construction code, and the UpdateQValues function]

The Code

The basic loop for the AI is quite simple. Every time the function Move is called in Update, the game runs the Act function to get the action that the snake will take.

In Act, the algorithm gets the current state of the game and builds the state string, which is the key used to look up the correct Q-Values in the Q-Table dictionary.

For the first 100 turns, the AI has a 10% chance of choosing a random move, a probability represented by epsilon. This is so that the AI can try random actions and receive the corresponding rewards to learn from. The game also stores all prior states and the actions taken in them, so that they can be compared to future ones.
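Here is a minimal sketch of what an Act function along those lines could look like, extending the QLearningSketch class from earlier. The epsilon schedule (10% random for the first 100 turns) matches the description above; everything else, including the names, is an assumption.

private readonly System.Random rng = new System.Random();
private readonly List<(string state, int action)> history =
    new List<(string, int)>();
private int turn = 0;

public int Act(string state)
{
    turn++;

    // Epsilon-greedy exploration: for the first 100 turns there is a
    // 10% chance of picking a random move instead of the best-known one.
    double epsilon = turn <= 100 ? 0.1 : 0.0;
    int action = rng.NextDouble() < epsilon
        ? rng.Next(4)               // explore: random left/right/up/down
        : GetBestAction(state);     // exploit: highest Q-Value wins

    // Remember every state and the action taken in it, so past decisions
    // can be rewarded once their outcome is known.
    history.Add((state, action));
    return action;
}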

Finally, after taking the action, the Q-Values are updated. The algorithm gets the current state and the previous state, compares them, and then, depending on whether the snake is closer to or farther from the food, gives the corresponding reward.

The reward is applied using the Bellman equation. Essentially, the equation combines the reward for the current action with an estimate of the reward for future actions. Every Q-learning algorithm uses this equation, and it was explained best to me by Chathurangi Shyalika in their article discussing simple Q-learning.
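The equation itself only appears as a screenshot here, but the standard one-step Q-learning form of the Bellman update, which both articles use, is:

Q(s, a) ← Q(s, a) + α [ r + γ · max over a' of Q(s', a') − Q(s, a) ]

where s and a are the previous state and action, s' is the new state, r is the reward, α is the learning rate, and γ is the discount factor that weights the estimated future rewards. A hedged C# sketch of an update function in that shape, again extending the earlier sketch, with illustrative constants that are not taken from the project, might look like this:

// Illustrative learning rate and discount factor; the project's actual
// constants are not visible in the screenshots.
private const float Alpha = 0.1f;
private const float Gamma = 0.9f;

// Apply the Bellman update for the move just made: prevState and action
// are where the snake was and what it did, newState is where it ended up,
// and reward is positive if it moved closer to the food, negative if not.
public void UpdateQValues(string prevState, int action,
                          string newState, float reward)
{
    float[] prevQ = GetQValues(prevState);

    // The best value reachable from the new state estimates future reward.
    float maxFuture = GetQValues(newState).Max();

    // Nudge the old estimate toward reward + discounted future value.
    prevQ[action] += Alpha * (reward + Gamma * maxFuture - prevQ[action]);
}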

[Images: the Bellman equation and the UpdateQValues implementation]

Once all this is done, the algorithm does it all over again, retaining the data after every move, so that the snake improves with every game.

Admittedly, my algorithm is far from perfect. I was unable to get the snake to recognize its own tail, so despite being able to locate and seek out the food, it very often dies quickly once it grows longer.

Applicability

There are quite a few applications for machine learning and Q-Learning in practice. This kind of learning algorithm can be used for things like learning what news stories or YouTube videos you should be recommended. Depending on how complex and detailed the data is, it can be applied to learn almost any task. However, that form of learning can take a lot of time and leaves a lot up to chance, as the AI is unlikely to make consistent choices. That makes it less suitable for video games. In a game environment, you want your enemy AI to seem intelligent and unpredictable to the player, but in reality you want to be able to completely control its actions. In theory, the AI could adapt to a player's actions to better counter them, but that would take quite a lot of trial and error. In a game environment, it's probably best to make sure your AI will react exactly how you want it to.

Sources

Comi, Mauro. “How to Teach an AI to Play Games: Deep Reinforcement Learning.” Medium, Towards Data Science, 22 Mar. 2020, towardsdatascience.com/how-to-teach-an-ai-to-play-games-deep-reinforcement-learning-28f9b920440a. 

Lee, Jason. “Teaching a Computer How to Play Snake with Q-Learning.” Medium, Towards Data Science, 23 July 2020, towardsdatascience.com/teaching-a-computer-how-to-play-snake-with-q-learning-93d0a316ddc0. 

Shyalika, Chathurangi. “A Beginners Guide to Q-Learning.” Medium, Towards Data Science, 16 Nov. 2019, towardsdatascience.com/a-beginners-guide-to-q-learning-c3e2a30a653c. 
