Building a Machine Learning-Based generals.io Bot - Part 1

Exploring neural network approaches to an online strategy game

A snapshot of generals.io gameplay

The online game generals.io is a fairly straightforward strategy game in which players control armies on a grid-like board and compete to capture other players' generals. If you haven't played before, you can learn the game in a few minutes on their website. I first played generals.io a few years ago, but only recently learned that the game has an official bot API. Naturally, I set out to build the best bot possible using machine learning. What I originally thought would be a simple weekend project has instead taken several months and thousands of lines of code so far. In this article, I'll cover the setup and goals of the project, followed by an overview of my implementation. At the end, I'll survey some of the challenges I currently face and look ahead to future work on the project.

Rules

The goal of generals.io is simple: capture all of your opponents' generals while preventing them from capturing your own. The game may be played with 2-12 players, and players may join teams freely before the start of the game. At the start of the game, each player owns a single tile containing their general and one army, placed randomly on the board. Tiles containing a general generate one army every two game ticks. Players may order armies of size two or greater to move to an adjacent tile. If that tile is empty, the player gains control of it; if it is occupied by an opponent, the armies fight and the tile changes hands when the attacker brings more armies than the defender holds. All tiles owned by a player gain one army every 25 turns. There are a number of additional tile types, including impassable mountains and capturable cities, which function similarly to generals and generate one army every two ticks once captured. Turns last 0.5 seconds by default, and games continue until one player has captured all of the generals on the board. To gain more familiarity with the rules, I recommend visiting generals.io and playing a few games.

First Steps

I set out to make a proof of concept at the start of the project. The most significant challenge in making a proof of concept was gaining familiarity with the API, which is socket-based. Fortunately, the developers created a tutorial, so all I had to do was convert that code to Python, create a command line interface, and write a few helper classes that I thought would make the eventual transition to a machine learning model easier. At this stage, my intent was to make the classes as modular as possible so I could just "plug and play" when testing future models. One of those base classes, Brain, is what I would later extend to add more functionality.
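
To give a sense of what that modularity looks like, here is a minimal sketch of a base class along those lines. The names Brain and RandomBrain come from the project, but the method names and signatures here are my own assumptions, not the actual code.

```python
import random


class Brain:
    """Base class: turn a board state into a move for one player."""

    def choose_move(self, board, player_id):
        raise NotImplementedError


class RandomBrain(Brain):
    """Picks uniformly among the player's valid moves."""

    def choose_move(self, board, player_id):
        moves = board.valid_moves(player_id)  # assumed Board helper
        return random.choice(moves) if moves else None
```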

I decided to start with a simple command-line interface to my bot. Given that the goal of this project is to train robots to ruthlessly defeat humans (but only in generals.io), I named my bot Robocop.

The GameManager class houses most of the housekeeping required to run a game, like using the server's updates to adjust the local representation of the game board. This class is only used during online play, and some of its functionality, like adding multiple players to a single game, was added later.
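
The core of that housekeeping is applying the server's diff updates to the local copy of the map. As an illustration, here is the diff-patching scheme described in the official bot API docs translated to Python; the function name and surrounding structure are my own.

```python
def patch(old, diff):
    """Apply a generals.io map/cities diff to the previous flat array.

    Per the official docs, the diff alternates between a count of values
    to keep from the old array and a count of new values (followed by
    those values).
    """
    out, i = [], 0
    while i < len(diff):
        if diff[i]:                      # number of matching values to copy
            out.extend(old[len(out):len(out) + diff[i]])
        i += 1
        if i < len(diff) and diff[i]:    # number of mismatching (new) values
            out.extend(diff[i + 1:i + 1 + diff[i]])
            i += diff[i]
        i += 1
    return out
```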

After a day or so of tinkering, I had a working setup. I could start a bot match from my CLI, join the match in my browser, and play against a bot in a game of generals.io. At this stage, though, the bot could only move randomly, so there was no chance that this version could be competitive against a human.

Naturally, the next step was to give the bot some ability to look ahead and choose an optimal move instead of a random one. At this point, though, the magnitude of the project started to dawn on me. If I wanted to evaluate future moves to determine the best one, I needed the ability to simulate the result of a move. In the online version, however, all of that was handled at the server, and all I got back was the result of one move. If I wanted to simulate multiple moves, I would essentially need to re-implement the game from scratch locally so I could have a realistic training environment. Only then could I even start training neural networks. Even simple bots that made any evaluation of a future board state would require a local way to process potential moves.

Training Environment

The Board class contains my implementation of the generals.io game logic that allows models to simulate the outcomes of potential moves. This class extends the API's native representation of a board with additional utilities that are useful for a machine learning application, but it retains the ability to express a game state in the native format so bots can interchangeably play online or local games.

The generals.io API represents a game state as a one-dimensional array of size (2 * height * width + 2) with a few special tile values to represent mountains, empty tiles, fog tiles, and fog obstacle tiles. Player armies are represented by positive integers and differentiated by an integer player ID. For a more in-depth look at how the API works, check out the official documentation. This representation poses a few challenges, the most significant being that it is not well suited for a convolutional neural network to build useful features from, in part because of the arbitrary integer values assigned to mountains and empty tiles.
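
For reference, here is a quick sketch of how that flat array breaks down, based on the format in the official docs; the helper name and the use of NumPy are my own.

```python
import numpy as np

# Terrain constants from the official API docs; values >= 0 are player IDs.
TILE_EMPTY = -1
TILE_MOUNTAIN = -2
TILE_FOG = -3
TILE_FOG_OBSTACLE = -4


def decode_map(flat_map):
    """Split the API's flat map array into 2-D armies and terrain grids.

    The flat array is [width, height, armies..., terrain...], hence the
    length of 2 * height * width + 2.
    """
    width, height = flat_map[0], flat_map[1]
    size = width * height
    armies = np.array(flat_map[2:2 + size]).reshape(height, width)
    terrain = np.array(flat_map[2 + size:2 + 2 * size]).reshape(height, width)
    return armies, terrain
```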

The Board class, on the other hand, represents a game state as an array of size (12 * height * width), with each of the 12 layers encoding one piece of per-tile information. My goal here was to design a representation that was both easy to use and that allowed a neural network to build useful features.

There are a few key methods in the Board class (a rough interface sketch follows this list):

  • initialize_board creates a new game from scratch. It is designed to imitate the way generals.io initializes a game, so it generates approximately the same number of mountains, cities, etc.
  • The Board class has a God's-eye view of the game state, so in order to prevent bots from cheating and utilizing information that they shouldn't have access to, view_board creates a view of the board that is limited by the player's fog boundary.
  • move contains all of the turn processing logic. Here, the game checks the validity of moves, resolves attacks, and updates the Board's internal representation.
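
Putting those pieces together, the class's public surface looks roughly like this. The method names come from the description above; the signatures, fields, and docstrings are my assumptions.

```python
class Board:
    """Local re-implementation of the generals.io game logic (sketch)."""

    def __init__(self, height, width, num_players):
        self.height, self.width = height, width
        self.num_players = num_players
        self.state = None  # the (12, height, width) representation described above

    def initialize_board(self):
        """Place generals, mountains, and cities roughly the way
        generals.io does when it creates a new game."""

    def view_board(self, player_id):
        """Return a copy of the board limited by the player's fog
        boundary, so bots can't use information they shouldn't have."""

    def move(self, player_id, start, end):
        """Validate the move, resolve any attack on the target tile,
        and update the internal representation."""
```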

Ultimately, the Board class is the backbone of all of the bots presented from here forward, as it allows bots to predict what the result of a move will be based on the rules of the game. It also provides a convenient representation for neural network-based bots to learn and test on.

Approach

Jumping straight to a neural network-based bot would be a big leap, so I first wanted to see if I could make a better bot using some simple logic. During the game, the total number of tiles and armies owned by each player is shared knowledge between all of the players. This bot would use that information to choose moves that optimized the game state according to some equation. I called this MetricsBrain.

What information, exactly, should the bot use in its decision-making? In the Board class, I created a method that calculates a few pieces of information, guided by my own experience playing the game. The basic information mentioned above is there, including the number of tiles owned by each player and the total number of armies each player controls. The method also returns the total number of neutral units, the number of cities owned by the player, and the number of units the player has on their general square. Finally, it returns a score computed from that information, allowing me to apply hand-designed weights to each of those pieces of information.

For every valid move from the current game state (looking only one turn into the future), MetricsBrain receives those statistics and makes a "greedy" decision, choosing the move that produces the highest score (the hand-designed metric mentioned previously).
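
A minimal sketch of that greedy loop, assuming the Brain base class and Board interface sketched earlier (valid_moves and metrics are assumed helper names):

```python
import copy


class MetricsBrain(Brain):
    """Greedy one-ply search over the hand-designed board score."""

    def choose_move(self, board, player_id):
        best_move, best_score = None, float("-inf")
        for move in board.valid_moves(player_id):          # assumed helper
            candidate = copy.deepcopy(board)               # simulate the move locally
            candidate.move(player_id, *move)
            score = candidate.metrics(player_id)["score"]  # assumed helper
            if score > best_score:
                best_move, best_score = move, score
        return best_move
```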

A fair amount of trial and error went into designing that score. I started with something simple, like the number of tiles. This produces a bot that always claims new territory if it can, but doesn't generally attack cities or other players. Rewarding the bot for simply having more armies than the average of its opponents results in a bot that doesn't expand aggressively. Ultimately, I found that combining some of these scores results in a reasonably well-balanced bot that will expand to new tiles while also engaging in some combat. A sketch of that kind of scoring is shown below.
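
The weights here are made up for illustration, and the dictionary keys assume the statistics described above; the values actually used in the project were tuned by hand and aren't reproduced here.

```python
def score(my_stats, opponent_stats):
    """Weighted combination of the per-player statistics described above.

    my_stats is a dict of counts for the current player; opponent_stats is
    a list of the same dicts for the opponents (key names are assumed).
    """
    avg_opponent_armies = sum(o["armies"] for o in opponent_stats) / len(opponent_stats)
    return (
        1.0 * my_stats["tiles"]                             # reward claiming territory
        + 0.5 * (my_stats["armies"] - avg_opponent_armies)  # reward an army advantage
        + 2.0 * my_stats["cities"]                          # cities compound over time
        + 0.1 * my_stats["general_armies"]                  # keep the general defended
    )
```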

At this point, I had a bot that could play generals.io and perform at a low level. Defeating it as a human was still trivial, though, as the bot didn't really play with any strategy or long-term thinking. That was to be expected. Next up was the challenge of incorporating machine learning into the bot logic.

Simple Models

The first machine learning-based model I created used a simple convolutional neural network (CNN). The model comprised two convolutional layers and two fully connected layers, with a max pooling layer after the first convolution. Every intermediate layer used the ReLU activation function, and the output of the last layer was fed through a sigmoid activation to get a value between 0 and 1. That output was chosen to represent a prediction of the bot's probability of winning, which would be the training objective.
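
As a concrete illustration, here is what such a network might look like in PyTorch. The overall structure (two convolutions, max pooling after the first, two fully connected layers, ReLU activations, sigmoid output, 12 input channels from the Board representation) follows the description above, but the class name, channel counts, kernel sizes, and fixed board size are my assumptions.

```python
import torch.nn as nn


class WinProbabilityNet(nn.Module):
    def __init__(self, height=20, width=20, in_channels=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                       # pooling after the first conv
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        flat = 64 * (height // 2) * (width // 2)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),                          # estimated win probability in [0, 1]
        )

    def forward(self, x):
        return self.head(self.features(x))
```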

Eventually the goal would be to train the bots using self-play, but I wanted to jumpstart the learning process using a more traditional training environment so the bots wouldn't have to learn entirely from scratch. Where would the training data come from, though? I wanted a set of generals.io game boards with an estimated probability of winning attached to each player's state. There is an official dataset of game replays available, but I elected to generate my own dataset using the models I had already created. To do this, I wrote the generate_dataset function. It repeatedly plays one MetricsBrain against another locally and saves every game state along with the score returned by the Board class (discussed above). I figured that this score would be a reasonable approximation of how well the bot is doing.
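
Here is a sketch of what generate_dataset might look like against the interfaces sketched earlier; the function name comes from the project, but the loop structure, board size, and helpers like game_over are assumptions.

```python
def generate_dataset(num_games, max_turns=500):
    """Play MetricsBrain vs. MetricsBrain and record (state, score) pairs."""
    examples = []
    for _ in range(num_games):
        board = Board(height=20, width=20, num_players=2)
        board.initialize_board()
        brains = {0: MetricsBrain(), 1: MetricsBrain()}
        for _turn in range(max_turns):
            for player_id, brain in brains.items():
                view = board.view_board(player_id)        # fogged copy of the board
                move = brain.choose_move(view, player_id)
                if move is not None:
                    board.move(player_id, *move)
                # label the state the player saw with the hand-designed score
                examples.append((view.state, board.metrics(player_id)["score"]))
            if board.game_over():                         # assumed helper
                break
    return examples
```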

The initial epochs of training, then, look a lot like a traditional neural network training routine: I was really just training the CNN to predict a hand-designed score from the game state and make greedy moves based on that score. While that produced a bot that could do better than RandomBrain, it wasn't much better, performing about as well as MetricsBrain. That was enough, though, as all I wanted out of this stage of training was a bot that could feasibly win a game. That would be a good starting point for training via self-play, which was the next step.
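
The warm-up step itself is then standard supervised regression. A minimal sketch, assuming the saved states are stacked into a tensor and the scores are rescaled to [0, 1] to match the sigmoid output; the loss choice and hyperparameters are my own.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def pretrain(net, states, scores, epochs=10, lr=1e-3):
    """Regress the network's win-probability output onto the hand-designed score.

    states: float tensor of shape (N, 12, H, W); scores: float tensor of
    shape (N,), already scaled to [0, 1].
    """
    loader = DataLoader(TensorDataset(states, scores), batch_size=64, shuffle=True)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(net(x).squeeze(1), y)
            loss.backward()
            optimizer.step()
```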

The motivation to train using self-play was to mirror human learning as much as possible and allow the bots to develop their own strategies outside of any human-enforced strategy. The method begins by loading the starting checkpoint twice and creating two identical bot players.

These bots play against each other in several local matches, and the results of each game, every board state, and both bots' predictions of their chances of winning at each turn are saved. Because the same bot is playing itself, I can gather training data from both players.

At the end of the match, I use a reinforcement learning-style approach based on the Bellman equation (discussed in depth in part 2) to create a training label from the previous estimate of each state's value and the result of the game. Basically, if the bot ends up winning the game, I train it to value the states it saw higher than it did before, and vice versa if it loses. In doing so, the hope is that the bot learns better convolutional filters that enable it to better judge game states. I train on both the winning and losing sides of the same game, which effectively doubles the available data.
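
The exact update is covered in part 2, so take this as a rough sketch of the idea only: walk backward from the game's outcome and nudge each saved estimate toward the (discounted) value of what came after it. The blending and discount factors here are assumed hyperparameters, not the project's actual values.

```python
def make_labels(values, won, gamma=0.95, alpha=0.5):
    """Turn one bot's saved win-probability estimates into training targets.

    values: the bot's prediction at each turn; won: whether that bot won.
    """
    targets = [0.0] * len(values)
    future = 1.0 if won else 0.0      # terminal value of the finished game
    for t in reversed(range(len(values))):
        # move the old estimate part of the way toward the discounted future value
        targets[t] = (1 - alpha) * values[t] + alpha * gamma * future
        future = targets[t]
    return targets
```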

Initial Results

So far, we've seen my basic implementation of a generals.io bot that can play online, some basic logic-based models, a CNN-based model, and finally a self-play training regime that allows the model to learn from playing itself. So how did these models perform compared to MetricsBrain or the CNN-based models before self-play?

At the time I initially finished training my CNN-based models, the official generals.io bot client was down, and it didn't come back online for at least two months. Bummer. However, I could still evaluate the bots' performance locally. I found that the CNN-based models performed slightly worse than MetricsBrain before self-play training, but after 20 episodes of 10 games apiece, the CNN-based models were outperforming MetricsBrain on average. Performance against MetricsBrain was actually a good validation metric, as MetricsBrain's play was relatively consistent.

MetricsBrain-based bots play against each other on the left, and CNNBrain-based models play on the right. You can see how the models at this stage don't have a great grasp of the game.

There was still room to improve, however. The bots weren't playing with any sort of long-term strategy, which hurt their performance in generals.io. Part 2 of this article explores my approach to that problem as well as how the resulting bots performed. You can read that article here.

About

In this article, we take a look at the process of creating neural network-based players for the game generals.io. This is part 1 of 2 (next).

Elsewhere

  1. GitHub