Creating a Chess AI based on my Lichess game database. Project for ECE 5973 Artificial Neural Networks [and further work in progress]
1 | Introduction
Chess was one of the earliest tasks taken up in machine learning research. Alan Turing and Claude Shannon both published papers on computer chess algorithms in the early days of computing.
In modern times, chess is still unsolved, and is very likely unsolvable in practice. Computers are nonetheless far better at the game than humans. Interestingly, modern chess engines use machine learning models based on human intuition to perform astronomically better than humans.
2 | Problem
Computer chess engines, powered by neural networks and reinforcement learning, excel at finding strong moves. However, their suggestions often differ from human choices because of their computational accuracy, and this disparity can hinder human learning from such engines. Maia Chess is a notable example of a human-like chess AI. My project's objective is to use deep learning to mimic a personal chess style, with an emphasis on capturing human decision-making in rapid games.
3 | Data
I focused on my personal game dataset to ensure unique results. The dataset (~2,000 games) is small enough to train on an AMD Ryzen CPU without GPU acceleration. It comprises 1,633 Bullet, 165 Blitz, and 244 Rapid games; emphasizing Bullet games captures impulsive decisions, giving a more complete view of play style. All games were split into train, validation, and test sets (70-15-15).
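A sketch of the shuffle-and-split step, with placeholder fractions (the exact ratios and data handling here are illustrative, not the project's actual code):

```python
import random

def split_games(games, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle games and split into train/validation/test lists.
    The fractions are illustrative placeholders."""
    games = list(games)
    random.Random(seed).shuffle(games)  # deterministic shuffle for reproducibility
    n = len(games)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (games[:n_train],
            games[n_train:n_train + n_val],
            games[n_train + n_val:])

train, val, test = split_games(range(2000))
```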
4 | Methods
4.0 Board Features:
An 8x8x12 map representation, where each 8x8 channel corresponds to one of the 12 piece types. A '1' indicates the presence of that piece on a square and '0' its absence.
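This encoding can be sketched in plain Python from a FEN string. The channel ordering (white P, N, B, R, Q, K, then black) and the FEN-based parsing are my assumptions, not necessarily the project's implementation:

```python
# One binary 8x8 plane per piece type; channel order is an assumption.
PIECE_TO_CHANNEL = {p: i for i, p in enumerate("PNBRQKpnbrqk")}

def encode_board(fen: str):
    """Return a 12x8x8 nested list of 0/1 planes from a FEN string."""
    planes = [[[0] * 8 for _ in range(8)] for _ in range(12)]
    placement = fen.split()[0]          # piece-placement field of the FEN
    for rank, row in enumerate(placement.split("/")):
        file = 0
        for ch in row:
            if ch.isdigit():            # digits encode runs of empty squares
                file += int(ch)
            else:
                planes[PIECE_TO_CHANNEL[ch]][rank][file] = 1
                file += 1
    return planes

# Starting position: 32 pieces total, eight of them white pawns.
start = encode_board("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
```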
4.1 Move Features:
Moves were translated from the Universal Chess Interface to a numerical system, leading to 4096 potential classes.
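A natural way to get 4096 classes is `from_square * 64 + to_square`; the exact square numbering below (a1 = 0, files within ranks) is my assumption, and note this scheme collapses promotion moves onto their underlying squares:

```python
def square_index(sq: str) -> int:
    """Map an algebraic square name ('a1'..'h8') to 0..63."""
    file = ord(sq[0]) - ord("a")
    rank = int(sq[1]) - 1
    return rank * 8 + file

def encode_move(uci: str) -> int:
    """Map a UCI move string to a class index in [0, 4095].
    Promotion suffixes (e.g. 'e7e8q') are ignored by the [2:4] slice."""
    return square_index(uci[:2]) * 64 + square_index(uci[2:4])

idx = encode_move("e2e4")   # e2 = 12, e4 = 28 -> 12*64 + 28 = 796
```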
4.2 Random Valid Move Model:
A baseline model that selects a random legal move, achieving 3.1-3.6% accuracy.
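That accuracy range is roughly what a uniform random choice predicts: a position with n legal moves is matched with probability 1/n. A quick sanity check with illustrative legal-move counts (not taken from the actual dataset):

```python
import random

def expected_random_accuracy(legal_move_counts):
    """Mean probability of guessing the played move uniformly at random."""
    return sum(1 / n for n in legal_move_counts) / len(legal_move_counts)

# Typical middlegame positions have roughly 20-40 legal moves (assumed here).
counts = [random.randint(20, 40) for _ in range(10_000)]
acc = expected_random_accuracy(counts)   # lands around 3-4%
```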
5 | Base Convolutional Model
- Utilizes a convolutional neural network with a variable number of layers.
- Each layer maintains consistent input-output channels with a 3x3 kernel and padding.
- Output is flattened and connected to two dense layers, culminating in a 4096-class output.
- Model trained by comparing cross-entropy loss between output and move classifications.
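The bullet points above can be sketched as a PyTorch module. The layer count, hidden size, and channel width below are placeholders, not the tuned hyperparameters:

```python
import torch
import torch.nn as nn

class BaseConvModel(nn.Module):
    """Sketch of the base convolutional model; sizes are assumptions."""
    def __init__(self, n_layers: int = 4, channels: int = 12, hidden: int = 256):
        super().__init__()
        # Each layer keeps input/output channels equal; 3x3 kernel with
        # padding=1 preserves the 8x8 spatial size.
        self.convs = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(n_layers)
        ])
        # Flatten, then two dense layers down to the 4096 move classes.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * 8 * 8, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 4096),
        )

    def forward(self, x):
        return self.head(self.convs(x))

model = BaseConvModel()
boards = torch.zeros(2, 12, 8, 8)    # batch of 2 encoded boards
logits = model(boards)               # shape (2, 4096)
# Training compares logits against the move-class labels with cross-entropy.
loss = nn.CrossEntropyLoss()(logits, torch.tensor([796, 0]))
```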
The best model had losses of 0.9274 (training), 4.616 (validation), and 4.696 (test), yielding a move prediction accuracy of 14.24%.
6 | Residual Model
After hyperparameter adjustments, this model reached a top move prediction accuracy of 10.21%. Though inferior to the convolutional model, it outperforms the random baseline. Its more intricate structure, combined with the small dataset, makes it harder to train.
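A minimal sketch of one residual block, assuming the standard two-convolution layout with a skip connection (the actual block structure used in the project may differ):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs plus a skip connection; layout is an assumption."""
    def __init__(self, channels: int = 12):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        return torch.relu(out + x)   # skip connection adds the input back

block = ResidualBlock()
y = block(torch.zeros(2, 12, 8, 8))  # spatial and channel sizes preserved
```

The skip connection lets gradients flow past each block, which normally helps deeper networks train; with only ~2,000 games, however, the added depth can still overfit, consistent with the result above.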
7 | Summary
In summary, the convolutional models do capture some features of my play even when trained on this small, limited dataset, and move prediction accuracy improves well beyond the random baseline, which shows successful progress. However, there is still much room for improvement.