RL-TicTacToe is a free project written mainly in Python.
Reinforcement learning (RL) agents that play TicTacToe.
Software Architecture:
RL-Glue mechanism:
"Environment" - Returns a state, the set of valid actions, and a reward.
"Agent" - Returns an action.
Both run on a common platform that passes each state to the Agent and each action back to the Environment.
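The mechanism above can be sketched in Python roughly as follows. This is a minimal, hypothetical illustration, not the project's code: the names TicTacToeEnv, run_episode, start and step, the board encoding, and the +1/0/-1 rewards are all assumptions. One design point it shows is that the opponent lives inside the Environment, so the Agent only ever sees a state, the valid actions and a reward.

```python
import random
from typing import List, Optional, Tuple

# Assumption: a state is the 9-cell board as a tuple of "X", "O" or " ",
# and an action is the index (0-8) of an empty cell.
State = Tuple[str, ...]
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: State) -> Optional[str]:
    """Return "X" or "O" if that mark has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def valid_actions(board: State) -> List[int]:
    return [i for i, cell in enumerate(board) if cell == " "]


class RandomAgent:
    """Hypothetical agent: picks uniformly among the valid actions."""
    def act(self, state: State, actions: List[int]) -> int:
        return random.choice(actions)


class TicTacToeEnv:
    """Environment side: returns a state, valid actions and a reward.

    The opponent is embedded here, so its reply is part of the
    environment's response to the Agent's action.
    """
    def __init__(self, opponent, agent_first: bool = True):
        self.opponent = opponent
        self.agent_first = agent_first

    def start(self) -> Tuple[State, List[int]]:
        self.board = [" "] * 9
        if not self.agent_first:
            self._opponent_move()
        state = tuple(self.board)
        return state, valid_actions(state)

    def _opponent_move(self) -> None:
        state = tuple(self.board)
        self.board[self.opponent.act(state, valid_actions(state))] = "O"

    def _outcome(self) -> Optional[float]:
        """Reward if the game is over (+1 win, -1 loss, 0 draw), else None."""
        won = winner(tuple(self.board))
        if won is not None:
            return 1.0 if won == "X" else -1.0
        if " " not in self.board:
            return 0.0
        return None

    def step(self, action: int) -> Tuple[State, List[int], float, bool]:
        self.board[action] = "X"  # assumption: the Agent always plays "X"
        reward = self._outcome()
        if reward is None:
            self._opponent_move()
            reward = self._outcome()
        state = tuple(self.board)
        if reward is None:
            return state, valid_actions(state), 0.0, False
        return state, [], reward, True


def run_episode(env: TicTacToeEnv, agent) -> float:
    """The common platform: relay states to the Agent and actions back."""
    state, actions = env.start()
    while True:
        action = agent.act(state, actions)
        state, actions, reward, done = env.step(action)
        if done:
            return reward


if __name__ == "__main__":
    random.seed(0)
    env = TicTacToeEnv(opponent=RandomAgent(), agent_first=True)
    rewards = [run_episode(env, RandomAgent()) for _ in range(100)]
    print(sum(r > 0 for r in rewards), "wins out of", len(rewards))
```

Because the Environment and the Agent only exchange plain values, any Agent with an `act(state, actions)` method could be swapped in without touching the game logic.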
A sample invocation would be
./main.py 100 "OptimalAgent" "TicTacToe:random:RandomAgent"
This starts a TicTacToe game with OptimalAgent as the Agent and RandomAgent as the opponent; which of the two moves first is chosen at random.
Another sample invocation would be
./main.py 100 "PolicyGradient" "TicTacToe:false:OptimalAgent"
This runs the same setup with PolicyGradient as the Agent and OptimalAgent as the opponent, except that the Agent now always moves first.
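Both invocations pass a third argument of the form environment:first-move:opponent. The sketch below shows one way such a string could be interpreted. The field layout is read off the two examples; EnvSpec, parse_env_spec, agent_moves_first and the `true` value are hypothetical and are not taken from main.py. Only the third argument is handled.

```python
import random
from typing import NamedTuple


class EnvSpec(NamedTuple):
    env_name: str  # e.g. "TicTacToe"
    first: str     # first-move setting, e.g. "random" or "false"
    opponent: str  # e.g. "RandomAgent" or "OptimalAgent"


def parse_env_spec(spec: str) -> EnvSpec:
    """Split an "ENV:FIRST:OPPONENT" string into its three fields."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected ENV:FIRST:OPPONENT, got {spec!r}")
    env_name, first, opponent = parts
    first = first.lower()
    if first not in ("random", "false", "true"):
        raise ValueError(f"unknown first-move setting: {first!r}")
    return EnvSpec(env_name, first, opponent)


def agent_moves_first(spec: EnvSpec, rng: random.Random) -> bool:
    """Decide, for one game, whether the Agent moves first.

    Assumed reading of FIRST (hypothetical):
      "random" -> a fresh coin flip for every game
      "false"  -> the Agent always moves first
      "true"   -> the opponent always moves first (not shown in the README)
    """
    if spec.first == "random":
        return rng.random() < 0.5
    return spec.first == "false"


if __name__ == "__main__":
    rng = random.Random(0)
    spec = parse_env_spec("TicTacToe:false:OptimalAgent")
    print(spec)  # EnvSpec(env_name='TicTacToe', first='false', opponent='OptimalAgent')
    print(agent_moves_first(spec, rng))  # True
```

Keeping the first-move setting unresolved in EnvSpec and flipping the coin inside agent_moves_first means a "random" spec can give a different starting player in each game, rather than fixing it once at startup.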