Reinforcement learning for Las Vegas



During a department board games night, we were playing Las Vegas when a fellow player remarked that he wondered how an AI for the game would perform. Since I had been meaning to spend some time learning to implement neural network techniques, this seemed like a great opportunity. One of the first things that came to mind was a paper from the DeepMind team on using deep neural networks to implement a variant Q-learning. The gist behind classical Q-learning is maintaining a table with the expected utility of a particular action in a given state. This table is updated while the game is played based on the observed rewards.

The idea behind deep Q-learning is to use a neural network to replace the table which is traditionally used. One of the big advantages is that it’s possible to handle much larger state-action spaces using a neural network. The first step was to decide how to represent the game state. For anyone not familiar with Las Vegas, Yucata has a good overview of the rules and a mechanism for playing online. The short version is that players take turns rolling dice and placing them on differently numbered casinos in an attempt to get the highest cash reward.

I first built a simple class structure for the game in Python to represent all the game state and stubbed out a couple functions to implement the game logic. The next step was to decide how the state was going to fed into the network. In the original deep Q-learning paper, the authors used a convolutional neural network to feed in frames from gameplay video. I instead chose to explicit represent the state using the following values:

  • Number of players in the game
  • Current game round number
  • Cash currently held by each player
  • Number of dice currently on each casino
  • Money available at each casino
  • Number of dice of each value in the current roll

Explicitly representing the state also resulted in a different structure for the network itself. The input layer simply received a normalized vector of the state values above. The second fully-connected layer was simply half the size of the first layer. Both of the first two layers used a ReLU) activation function. The final output layer was also fully connected but with a linear activation function and a size of six to represent the choice of each possible die. After training the AI against 4 random opponents, the AI was able to win around 50% of games which I was pretty happy with given the inherent randomness of the game. However, the evaluation was by no means robust and something that definitely needs to be improved upon.

I later implemented hyperparameter optimization using the Hyperopt library. After much more training, I tried to optimize the reward values, discount factor), and other parameters specific to deep Q-learning. This led me to change the kernel initializer to LeCun uniform, the activation function of the first two layers to a sigmoid function, and the optimization algorithm from RMSprop to Adam. This was mostly for a bit more fun although it did seem to provide about a 20% performance improvement on some simple evaluations I tried.

Since this is just a fun side project, one of the next things on my agenda is to implement a UI using boardgame.io so I can get a sense of how the AI “feels.” All in all, this was a pretty fun project. The source code is available on GitHub for anyone who wants to play with it.