Tic-Tac-Toe Problem in Artificial Intelligence

April 15, 2025 | 17 min read

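Tic-tac-toe is a simple and engaging game, which makes it a convenient platform for studying how artificial intelligence handles game-theoretic problems. Two players take turns placing their marks on a 3x3 board, and the goal is to line up three of your own marks in a row, a column, or a diagonal. Simple as it looks, the game touches on several basic ideas in AI, such as decision making, strategy formation, and anticipating an opponent, so it is a good starting point for developers and researchers who want to get into AI, game theory, and algorithm design.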

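At its core, the tic-tac-toe problem is about finding the best strategy a player can follow to win, or at least not to lose. The state space is finite and small: each of the 9 cells is empty, X, or O, so there are at most 3^9 = 19,683 board configurations, and a game lasts at most 9 moves, so there are at most 9! = 362,880 move sequences. An AI system can therefore examine every possible outcome and pick the best move at each step. In other words, the AI has to make sound decisions based on the current board, the moves it expects from its opponent, and how its strategy should adapt.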
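To illustrate what "checking every possible outcome" can look like, here is a minimal minimax sketch. It is an exhaustive-search illustration only, not the learning approach implemented in the rest of this article, and the string board encoding and the names `LINES`, `winner`, and `minimax` are assumptions made for this example.

```python
from functools import lru_cache

# All eight winning lines on a board flattened to indices 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that mark owns a full line, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value from X's point of view (+1 win, 0 draw, -1 loss) with `player` to move."""
    won = winner(board)
    if won is not None:
        return 1 if won == 'X' else -1
    if ' ' not in board:
        return 0
    other = 'O' if player == 'X' else 'X'
    # Try every empty cell and evaluate the position that results.
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == ' ']
    return max(values) if player == 'X' else min(values)

print(minimax(' ' * 9, 'X'))  # 0: perfect play from both sides ends in a draw
```

Memoising on the board string keeps the search small, because many move orders lead to the same position.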

Code

To make this concrete, we will now implement a model that plays tic-tac-toe.

Importing Libraries

# Importing required modules
import pprint
import random

import numpy as np
import pandas as pd

# scipy.ndimage.interpolation is a deprecated namespace; shift lives in scipy.ndimage
from scipy.ndimage import shift
import matplotlib.pyplot as plt

from keras.models import Sequential
from keras.layers import Dense,Dropout
from keras.optimizers import SGD

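Creating a Tic-Tac-Toe Game

First, we have to implement the game itself before anything can learn to play it. NumPy, one of the most widely used Python libraries for numerical computing, makes this easy. The board is a 3x3 NumPy array with the following encoding: 2 marks an empty cell, 0 stands for "O", and 1 stands for "X". The array is initially filled with 2, meaning every position starts out empty. The players take turns choosing a coordinate and placing their mark, 0 or 1, at that position. A player wins by getting three of their own marks in a row, a column, or a diagonal. To keep the implementation simple, the program always plays mark 1 ("X") and its opponent always plays mark 0 ("O"). This keeps the focus on the game mechanics and gives us a base on which to teach the program to play well.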

[Figure: the tic-tac-toe board as a NumPy array]

class game_tic_tac_toe(object):
    def __init__(self):
        self.board=np.full((3,3),2)

    def toss(self):
        """ Function that simulates the tossing of a coin to decide who plays first.

        Args:

        Returns:
        1 if the player assigned mark 1 won the toss, 0 if the opponent won it

        """
        turn=np.random.randint(0,2,size=1)
        if turn.mean()==0:
            self.monitor_turn=0
        elif turn.mean()==1:
            self.monitor_turn=1
        return self.monitor_turn

    def move(self,player,coord):
        """ Function to perform the action of making a mark on the tic tac toe board.
        After executing the action, this function toggles the value of monitor_turn to
        the next player.


        Args:
        player: 1 if the player who is assigned the mark 1 is performing the action, 
        0 if his opponent is performing the action
        coord: The coordinate where the 1 or 0 is to be placed on the 
        tic-tac-toe board (numpy array)

        Returns:
        status_of_game(): Calls the game status function and returns its value
        board: Returns the new board state after making the move

        """
        if self.board[coord]!=2 or self.status_of_game()!="In Progress" or self.monitor_turn!=player:
            raise ValueError("Invalid move")
        self.board[coord]=player
        self.monitor_turn=1-player
        return self.status_of_game(),self.board


    def status_of_game(self):
        """A function to check the current status of the game:
        whether the game is won, drawn, or in progress.

        Args:

        Returns:
        "Won" if the game has been won, "Drawn" if the
        game has been drawn, or "In Progress" if the game is still in progress

        """
        #checking for a win along rows
        for i in range(self.board.shape[0]):
            if 2 not in self.board[i,:] and len(set(self.board[i,:]))==1:
                return "Won"
        #checking for a win along columns
        for j in range(self.board.shape[1]):
            if 2 not in self.board[:,j] and len(set(self.board[:,j]))==1:
                return "Won"
        # check for a win along diagonals
        if 2 not in np.diag(self.board) and len(set(np.diag(self.board)))==1:
            return "Won"
        if 2 not in np.diag(np.fliplr(self.board)) and len(set(np.diag(np.fliplr(self.board))))==1:
            return "Won"
        # checking for a Draw
        if 2 not in self.board:
            return "Drawn"
        else:
            return "In Progress"
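The status check reports "Won" without saying which mark won. As a rough sketch of how the same eight lines could be scanned to name the winner, under the article's encoding (2 = empty, 0 = "O", 1 = "X"), consider the following. The helpers `all_lines` and `find_winner` are hypothetical names introduced here and are not part of the class above.

```python
import numpy as np

def all_lines(board):
    """Return the 3 rows, 3 columns and 2 diagonals of a 3x3 board as 1-D arrays."""
    lines = [board[i, :] for i in range(3)]
    lines += [board[:, j] for j in range(3)]
    lines += [np.diag(board), np.diag(np.fliplr(board))]
    return lines

def find_winner(board):
    """Return 1 or 0 if that mark fills a whole line, else None (2 marks an empty cell)."""
    for line in all_lines(board):
        if line[0] != 2 and np.all(line == line[0]):
            return int(line[0])
    return None

board = np.full((3, 3), 2)
board[0, 0] = board[1, 1] = board[2, 2] = 1
print(find_winner(board))  # 1
```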

Testing the Tic-Tac-Toe Game

We will now play a few dummy moves to test the game created above, making one move at a time for each player.

# create an object of the class game_tic_tac_toe
game=game_tic_tac_toe()
# toss to decide which player goes first
game.toss()
print("Player ",game.monitor_turn," has won the toss")
# make the first move
print("Initial board state \n",game.board)
print("Let first player place their mark on 0,0")
status_of_game,board=game.move(game.monitor_turn,(0,0))
print("New Board State: \n",board)
print("Let second player place their mark on 0,1")
status_of_game,board=game.move(game.monitor_turn,(0,1))
print("New Board State: \n",board)
print("Let first player place their mark on the 1,1")
status_of_game,board=game.move(game.monitor_turn,(1,1))
print("New Board State: \n",board)
print("Let second player place their mark on 0,2")
status_of_game,board=game.move(game.monitor_turn,(0,2))
print("New Board State: \n",board)
print("Let first player place their mark on 2,2")
status_of_game,board=game.move(game.monitor_turn,(2,2))
print("New Board State: \n",board)
print("Player ",1-game.monitor_turn," Has ",status_of_game)

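Generating Legal Moves

The legal move generator is a function that takes a board state and a player as input and returns the set of all possible legal moves.

More intuitively, given a board state and a player, it tells us where that player may place a mark, that is, every cell that is still empty.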

def legal_moves_generator(state_of_current_board,monitor_turn):
    """ Function that returns the set of all possible legal moves and resulting board states
    for a particular input of a board state and player.

    Args:
    state_of_current_board: The current board state
    monitor_turn: 1 if it's the player who places the mark 1's turn to play, 0 if it's his opponent's turn

    Returns:
    dictionary_of_column_winning_move_check: A dictionary mapping each legal next coordinate to the
    resulting board state, flattened to a 1-D array

    """
    dictionary_of_column_winning_move_check={}
    for i in range(state_of_current_board.shape[0]):
        for j in range(state_of_current_board.shape[1]):
            if state_of_current_board[i,j]==2:
                board_state_copy=state_of_current_board.copy()
                board_state_copy[i,j]=monitor_turn
                dictionary_of_column_winning_move_check[(i,j)]=board_state_copy.flatten()
    return dictionary_of_column_winning_move_check

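Testing the Legal Move Generator

Let us now test the legal move generator created above.

We start a dummy game and toss to decide who moves first. We then pass the current board state and the toss winner to the legal move generator, which returns a dictionary of legal moves. Each entry has the form "coordinate of a possible next move": "flattened board state that results from it".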

game=game_tic_tac_toe()
game.toss()
print("Player ",game.monitor_turn," has won the toss")
print("Current board state \n",game.board)
dictionary_of_column_winning_move_check=legal_moves_generator(game.board,game.monitor_turn)
print("Dictionary of Possible Next Legal Moves:")
pprint.pprint(dictionary_of_column_winning_move_check)

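Note that the legal move generator returns a dictionary that maps every legal coordinate to the corresponding flattened NumPy board state.

Neural Network Model

This is the model that does the work: an evaluator function that maps a board state to a score.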

model = Sequential()
model.add(Dense(18, input_dim=9,kernel_initializer='normal', activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(9, kernel_initializer='normal',activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1,kernel_initializer='normal'))

learning_rate = 0.001
momentum = 0.8

# recent Keras versions expect learning_rate instead of the older lr argument
sgd = SGD(learning_rate=learning_rate, momentum=momentum,nesterov=False)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.summary()
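To make the shape of this evaluator concrete, the following NumPy-only sketch mimics a 9-18-9-1 network at inference time, when dropout is inactive. It is an illustration rather than the Keras model itself: the weights are drawn independently at random, the standard deviation of 0.05 is an assumption, and the names `layer_sizes`, `params`, and `forward` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [9, 18, 9, 1]

# One (weights, bias) pair per dense layer, randomly initialised (assumed stddev 0.05).
params = [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(board_flat, params):
    """Score a flattened board: ReLU on the hidden layers, linear output."""
    x = np.asarray(board_flat, dtype=float).reshape(1, -1)
    for k, (w, b) in enumerate(params):
        x = x @ w + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

n_params = sum(w.size + b.size for w, b in params)
print(n_params)                              # 361 = (9*18+18) + (18*9+9) + (9*1+1)
print(forward(np.full(9, 2), params).shape)  # (1, 1)
```

The single output is the score that the move selector compares across candidate board states.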

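Program Move Selector

Given the current board state, the program move selector chooses the player's next move methodically. It first calls the legal move generator to produce every board state reachable from the current one with a single legal move. It then asks the evaluator to score each of these candidate states; the score is simply the model's output for the flattened board. Finally, it picks the state with the highest score as the next move. The selected move is therefore always legal and is the best one according to the model's current evaluation.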

def move_selector(model,state_of_current_board,monitor_turn):
    """ Function to choose the next move to make from a set of possible legal moves

    Args:
    model: The Evaluator function used to evaluate each possible next board state
    state_of_current_board: The current board state
    monitor_turn: 1 if it's the player who places the mark 1's turn to play, 0 if it's his opponent's turn

    Returns:
    selected_move: The numpy array coordinates where the player should place their mark
    new_board_state: The flattened new board state resulting from performing above selected move
    score: The score that was assigned to the above selected_move by the Evaluator (model)

    """
    tracker={}
    dictionary_of_column_winning_move_check=legal_moves_generator(state_of_current_board,monitor_turn)
    for legal_move_coord in dictionary_of_column_winning_move_check:
        score=model.predict(dictionary_of_column_winning_move_check[legal_move_coord].reshape(1,9))
        tracker[legal_move_coord]=score
    selected_move=max(tracker, key=tracker.get)
    new_board_state=dictionary_of_column_winning_move_check[selected_move]
    score=tracker[selected_move]
    return selected_move,new_board_state,score
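The selection step is an argmax over scored candidates. The sketch below isolates that idea with a stand-in scoring function, so it runs without Keras. `pick_best_move` and `centre_bonus` are hypothetical names for this example, and the stand-in evaluator simply rewards taking the centre cell.

```python
import numpy as np

def pick_best_move(candidates, score_fn):
    """Return (move, score) for the candidate board state with the highest score."""
    scored = {move: float(score_fn(state)) for move, state in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

def centre_bonus(state):
    # Stand-in evaluator (assumption): 1.0 if mark 1 sits in the centre, else 0.0.
    return 1.0 if state[4] == 1 else 0.0

# Build the nine candidate states for mark 1 on an empty board (2 = empty).
empty = np.full(9, 2)
candidates = {}
for idx in range(9):
    state = empty.copy()
    state[idx] = 1
    candidates[divmod(idx, 3)] = state

move, score = pick_best_move(candidates, centre_bonus)
print(move, score)  # (1, 1) 1.0
```

Converting each score to a plain float before comparing avoids relying on comparisons between one-element arrays.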

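Testing the Move Selector

Now let us use the move selector to choose a move in a tic-tac-toe game. We start a game and ask it for the first move, which shows how the function picks the candidate board state that the evaluator scores highest.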

# new game
game=game_tic_tac_toe()
# toss
game.toss()
# choose the first move
print("Player ",game.monitor_turn," won the toss")
print("Initial board state:")
print(game.board)
selected_move,new_board_state,score=move_selector(model,game.board,game.monitor_turn)
print("Selected move: ",selected_move)
print("Resulting new board state: ",new_board_state)
print("Score assigned to above board state by Evaluator(model): ", score)

At this point the evaluator has not been trained, so its weights are random and the score above is essentially a random number.

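The Opponent's Moves

The trouble with tic-tac-toe is that it is too simple, which tends to make games repetitive. After about 10 to 15 games, we noticed that the program and its opponent fall into a pattern of repeated moves when both follow a similar model. This lack of variety is a notable limitation of tic-tac-toe; a game such as checkers has enough variability and strategic depth to keep the learning process challenging. For this reason the program learns by playing against a hand-coded opponent that simulates different difficulty levels and changes its behaviour according to the selected mode. In Easy mode the opponent is very simple: on its turn it uses the legal move generator to list every legal move, picks one at random, and keeps doing so until the game ends. This lets the program play many games without often running into hard strategic situations and explore the tic-tac-toe state space widely.

The Hard-mode opponent is more strategic. It first runs the legal move generator to get every legal move. If any move wins immediately by completing a row, column, or diagonal, the opponent plays it. If there is no winning move, it looks for a move that stops the program from winning on its next turn, a defensive step that cuts off the program's progress. Failing that, if some legal move produces a row, column, or diagonal with two 0s and no 1, the opponent plays that. If none of these conditions holds, it moves at random. This mix of attacking and defensive play gives the program a more realistic opponent to adapt to and improve against.

To train the program effectively, it plays many games against the coded opponent, with Easy or Hard mode chosen at random at the start of each game. Training in both modes balances exploration against learning strategy. Easy mode leaves room to explore board states and try different lines of play with little resistance. Hard mode forces the program to cope with a more skilled opponent, which lays the groundwork for more sophisticated tactics. The program updates its model weights at the end of every game and gradually improves based on its wins and losses, becoming stronger against different opponents and game situations.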
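The Hard-mode priorities (win, then block, then build two in a line, otherwise random) can be condensed into one function. The sketch below is a compact illustration of those rules, not the implementation that follows in this article. The names `board_lines`, `count_lines`, and `hard_mode_move` are hypothetical, and the encoding is the article's (2 = empty, 0 = opponent, 1 = program).

```python
import random
import numpy as np

def board_lines(board):
    """Rows, columns and both diagonals of a 3x3 array."""
    return ([board[i, :] for i in range(3)]
            + [board[:, j] for j in range(3)]
            + [np.diag(board), np.diag(np.fliplr(board))])

def count_lines(board, mark, n_own, n_other):
    """Count lines holding exactly n_own cells of `mark` and n_other cells of the other mark."""
    other = 1 - mark
    return sum(1 for line in board_lines(board)
               if (line == mark).sum() == n_own and (line == other).sum() == n_other)

def hard_mode_move(board, mark=0):
    """Pick a move by priority: win, block, two-in-a-line, otherwise random."""
    empties = [(int(i), int(j)) for i, j in np.argwhere(board == 2)]
    # (own marks, opponent marks) a line must hold after the move, in priority order.
    rules = [(3, 0),   # win: complete a line
             (1, 2),   # block: fill the last cell of a line holding two opponent marks
             (2, 0)]   # build: two own marks and no opponent mark
    for n_own, n_other in rules:
        hits = []
        for move in empties:
            after = board.copy()
            after[move] = mark
            # A move qualifies if it creates a new line matching the rule.
            if count_lines(after, mark, n_own, n_other) > count_lines(board, mark, n_own, n_other):
                hits.append(move)
        if hits:
            return random.choice(hits)
    return random.choice(empties)

board = np.array([[1, 1, 2],
                  [0, 0, 2],
                  [2, 2, 2]])
print(hard_mode_move(board))  # (1, 2): winning takes priority over blocking at (0, 2)
```

Choosing at random among the moves that satisfy a rule keeps the opponent from always answering the same position in the same way.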

def row_winning_move_check(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan row-wise and find a coordinate amongst the legal coordinates that will
    result in a winning state of the board.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will lead to a win for the opponent
    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        #check for a win along rows
        for i in range(state_of_current_board_copy.shape[0]):
            if 2 not in state_of_current_board_copy[i,:] and len(set(state_of_current_board_copy[i,:]))==1:
                selected_move=legal_move_coord
                return selected_move
            
def checking_column_winning_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan column-wise and find a coordinate amongst the legal coordinates that will result in a winning state of the board.


    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move

    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will lead to a win for the opponent
    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        for j in range(state_of_current_board_copy.shape[1]):
            if 2 not in state_of_current_board_copy[:,j] and len(set(state_of_current_board_copy[:,j]))==1:
                selected_move=legal_move_coord
                return selected_move

def check_diag1_winning_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan the first diagonal and find a coordinate amongst the legal coordinates that will result in a winning state of the board.


    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move

    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will lead to a win for the opponent

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        if 2 not in np.diag(state_of_current_board_copy) and len(set(np.diag(state_of_current_board_copy)))==1:
            selected_move=legal_move_coord
            return selected_move
            
def diag2_winning_move_check(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """A function scanning the second diagonal for a coordinate amongst the legal coordinates leading to a win.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move

    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will lead to a win for the opponent

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        if 2 not in np.diag(np.fliplr(state_of_current_board_copy)) and len(set(np.diag(np.fliplr(state_of_current_board_copy))))==1:
            selected_move=legal_move_coord
            return selected_move
            
#------------#

def check_row_block_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan row-wise and find a coordinate amongst the legal coordinates that will stop the program (mark 1) from winning.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will block 1 from winning

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        for i in range(state_of_current_board_copy.shape[0]):
            if 2 not in state_of_current_board_copy[i,:] and (state_of_current_board_copy[i,:]==1).sum()==2:
                if not (2 not in state_of_current_board[i,:] and (state_of_current_board[i,:]==1).sum()==2):
                    selected_move=legal_move_coord
                    return selected_move
            
def check_column_block_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan each column and find the permissible coordinate among them that will keep 1 from winning
 
    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will block 1 from winning

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        
        for j in range(state_of_current_board_copy.shape[1]):
            if 2 not in state_of_current_board_copy[:,j] and (state_of_current_board_copy[:,j]==1).sum()==2:
                if not (2 not in state_of_current_board[:,j] and (state_of_current_board[:,j]==1).sum()==2):
                    selected_move=legal_move_coord
                    return selected_move

def check_check_diag1_block_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan the first diagonal and find a coordinate amongst the legal coordinates that will keep 1 from winning.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will block 1 from winning

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn    
        if 2 not in np.diag(state_of_current_board_copy) and (np.diag(state_of_current_board_copy)==1).sum()==2:
            if not (2 not in np.diag(state_of_current_board) and (np.diag(state_of_current_board)==1).sum()==2):
                selected_move=legal_move_coord
                return selected_move
            
def check_diag2_block_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan the second diagonal and find a coordinate amongst the legal coordinates that will keep 1 from winning.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will block 1 from winning

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        if 2 not in np.diag(np.fliplr(state_of_current_board_copy)) and (np.diag(np.fliplr(state_of_current_board_copy))==1).sum()==2:
            if not (2 not in np.diag(np.fliplr(state_of_current_board)) and (np.diag(np.fliplr(state_of_current_board))==1).sum()==2):
                selected_move=legal_move_coord
                return selected_move

#---------------#
def check_row_second_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan row-wise and find a coordinate amongst the legal coordinates that will result in a row with two 0s and no 1s.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will lead to two 0s being there (and no 1s)

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        
        for i in range(state_of_current_board_copy.shape[0]):
            if 1 not in state_of_current_board_copy[i,:] and (state_of_current_board_copy[i,:]==0).sum()==2:
                if not (1 not in state_of_current_board[i,:] and (state_of_current_board[i,:]==0).sum()==2):
                    selected_move=legal_move_coord
                    return selected_move
            
def check_column_second_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan column-wise and find a coordinate amongst the legal coordinates that will result in a column with two 0s and no 1s.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will lead to two 0s being there (and no 1s)

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        
        for j in range(state_of_current_board_copy.shape[1]):
            if 1 not in state_of_current_board_copy[:,j] and (state_of_current_board_copy[:,j]==0).sum()==2:
                if not (1 not in state_of_current_board[:,j] and (state_of_current_board[:,j]==0).sum()==2):
                    selected_move=legal_move_coord
                    return selected_move

def check_diag1_second_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan the first diagonal and find a coordinate amongst the legal coordinates that will result in a diagonal with two 0s and no 1s.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will lead to two 0s being there (and no 1s)

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        if 1 not in np.diag(state_of_current_board_copy) and (np.diag(state_of_current_board_copy)==0).sum()==2:
            if not (1 not in np.diag(state_of_current_board) and (np.diag(state_of_current_board)==0).sum()==2):
                selected_move=legal_move_coord
                return selected_move
            
def check_diag2_second_move(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn):
    """ Function to scan the second diagonal and find a coordinate amongst the legal coordinates that will result in a diagonal with two 0s and no 1s.

    Args:
    dictionary_of_column_winning_move_check: Dictionary of legal next moves
    monitor_turn: whose turn it is to move
    
    Returns:
    selected_move: The coordinates of the numpy array where placing the 0 will lead to two 0s being there (and no 1s)

    """ 
    legal_move_coords =  list(dictionary_of_column_winning_move_check.keys())
    random.shuffle(legal_move_coords)
    for legal_move_coord in legal_move_coords:
        state_of_current_board_copy=state_of_current_board.copy()
        state_of_current_board_copy[legal_move_coord]=monitor_turn
        if 1 not in np.diag(np.fliplr(state_of_current_board_copy)) and (np.diag(np.fliplr(state_of_current_board_copy))==0).sum()==2:
            if not (1 not in np.diag(np.fliplr(state_of_current_board)) and (np.diag(np.fliplr(state_of_current_board))==0).sum()==2):
                selected_move=legal_move_coord
                return selected_move
    
def opponent_move_selector(state_of_current_board,monitor_turn,mode):
    """ Function that selects the opponent's next move according to the chosen mode.

    Args:
    state_of_current_board: Current board state
    monitor_turn: whose turn it is to move
    mode: whether hard or easy mode

    Returns:
    selected_move: The coordinates of the numpy array where the opponent places its mark

    """ 
    dictionary_of_column_winning_move_check=legal_moves_generator(state_of_current_board,monitor_turn)
    
    checking_winning_move=[row_winning_move_check,checking_column_winning_move,check_diag1_winning_move,diag2_winning_move_check]
    checking_block_move=[check_row_block_move,check_column_block_move,check_check_diag1_block_move,check_diag2_block_move]
    checking_second_move=[check_row_second_move,check_column_second_move,check_diag1_second_move,check_diag2_second_move]

    if mode=="Hard":
        random.shuffle(checking_winning_move)
        random.shuffle(checking_block_move)
        random.shuffle(checking_second_move)        
        
        # call each checker once and reuse its result; a checker returns None when it finds nothing
        for fn in checking_winning_move:
            selected_move=fn(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn)
            if selected_move is not None:
                return selected_move
            
        for fn in checking_block_move:
            selected_move=fn(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn)
            if selected_move is not None:
                return selected_move
            
        for fn in checking_second_move:
            selected_move=fn(state_of_current_board,dictionary_of_column_winning_move_check,monitor_turn)
            if selected_move is not None:
                return selected_move
            
        selected_move=random.choice(list(dictionary_of_column_winning_move_check.keys()))
        return selected_move
    
    elif mode=="Easy":
        dictionary_of_column_winning_move_check=legal_moves_generator(state_of_current_board,monitor_turn)
        selected_move=random.choice(list(dictionary_of_column_winning_move_check.keys()))
        return selected_move

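Training the Program

In essence, training the program means teaching the model, the evaluator, to give high scores to board states that favour the program and low scores to those that do not. The program learns by playing against the hard-coded opponent. At the start, the evaluator's weights are assigned at random, which gives a baseline from which the model learns from scratch. In each game the program uses the evaluator, through the move selector function, to decide its moves. The program and the opponent alternate until the game ends, and the repeated games produce a wide variety of situations that the evaluator has to learn to assess.

When a game is over, the evaluator is trained. The score of the final board state is set from the result: 1 if the program won, -1 if it lost, and 0 for a draw. The scores are then propagated backwards: each earlier board state takes the score the evaluator gave to the board state that followed it. This step matters because it ties every position to what came after it, so the evaluator can learn to recognise patterns in board states that lead to a win or a loss.

The corrected scores and the board states are then used to update the evaluator's weights, so that it estimates the value of a board state and predicts the result more accurately. This process is repeated over hundreds of thousands of games, exposing the model to a large number of situations and strategies. Over time the model should get better at returning higher scores for board states that lead to a win and lower scores for less promising positions, which improves the program's decisions and its overall play.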
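The score correction described above amounts to shifting the list of predicted scores one step earlier and filling the last slot with the game result. Here is a small NumPy sketch of that idea; `corrected_scores` is a hypothetical helper and the predicted values are made up for illustration.

```python
import numpy as np

def corrected_scores(scores, outcome):
    """Give each board state the score of its successor; the last one gets the game outcome."""
    scores = np.asarray(scores, dtype=float)
    return np.append(scores[1:], outcome)

predicted = [0.10, 0.25, 0.40]   # made-up evaluator outputs for three program moves
print(corrected_scores(predicted, 1.0).tolist())  # [0.25, 0.4, 1.0]
```

With a win (outcome 1.0) the final state is pulled towards 1, and over many games that signal reaches earlier states through their successors.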

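def train(model,mode,progress_of_print=False):
    """Function that plays one game against the opponent and then trains the evaluator (model) on it.
    The program picks its moves with the move selector; the weights of the model are updated after the game.
    
    Note that the model weights are updated using SGD with a batch size of 1

    Args:
    model: The Evaluator function being trained
    mode: "Easy" or "Hard", the mode of the opponent
    progress_of_print: If True, print the board after every move

    Returns:
    model: The model after being updated using SGD
    y: The corrected scores
    result: "Won", "Lost" or "Drawn" from the program's point of view

    """ 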
    # start the game
    if progress_of_print==True:
        print("___________________________________________________________________")
        print("Starting a new game")
    game=game_tic_tac_toe()
    game.toss()
    list_of_scores=[]
    list_of_corrected_list_of_scores=[]
    list_of_new_board_states=[]
    
    while(1):
        if game.status_of_game()=="In Progress" and game.monitor_turn==1:
            # If it is the program's turn, use the Move Selector function to select the next move
            selected_move,new_board_state,score=move_selector(model,game.board,game.monitor_turn)
            list_of_scores.append(score[0][0])
            list_of_new_board_states.append(new_board_state)
            # Make the next move
            status_of_game,board=game.move(game.monitor_turn,selected_move)
            if progress_of_print==True:
                print("Program's Move")
                print(board)
                print("\n")
        elif game.status_of_game()=="In Progress" and game.monitor_turn==0:
            selected_move=opponent_move_selector(game.board,game.monitor_turn,mode=mode)
        
            # Make the next move
            status_of_game,board=game.move(game.monitor_turn,selected_move)
            if progress_of_print==True:
                print("Opponent's Move")
                print(board)
                print("\n")
        else:
            break

    
    # Correct the scores, assigning 1/0/-1 to the winning/drawn/losing final board state, 
    # and assigning the other previous board states the score of their next board state
    list_of_new_board_states=tuple(list_of_new_board_states)
    list_of_new_board_states=np.vstack(list_of_new_board_states)
    if status_of_game=="Won" and (1-game.monitor_turn)==1: 
        list_of_corrected_list_of_scores=shift(list_of_scores,-1,cval=1.0)
        result="Won"
    if status_of_game=="Won" and (1-game.monitor_turn)!=1:
        list_of_corrected_list_of_scores=shift(list_of_scores,-1,cval=-1.0)
        result="Lost"
    if status_of_game=="Drawn":
        list_of_corrected_list_of_scores=shift(list_of_scores,-1,cval=0.0)
        result="Drawn"
    if progress_of_print==True:
        print("Program has ",result)
        print("\n Correcting the Scores and Updating the model weights:")
        print("___________________________________________________________________\n")
        
    x=list_of_new_board_states
    y=list_of_corrected_list_of_scores
    
    def unison_shuffled_copies(a, b):
        assert len(a) == len(b)
        p = np.random.permutation(len(a))
        return a[p], b[p]
    
    # shuffle x and y in unison
    x,y=unison_shuffled_copies(x,y)
    x=x.reshape(-1,9) 
    
    # update the weights of the model, one record at a time
    model.fit(x,y,epochs=1,batch_size=1,verbose=0)
    return model,y,result

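We will now run 200,000 games of tic-tac-toe between the program and the opponent, record the result of each game, and later plot the distribution of the program's wins, losses, and draws over fixed intervals of games.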

counter_for_game=1
records=[]

mode_list=["Easy", "Hard"]

while(counter_for_game<=200000):
    mode_selected=np.random.choice(mode_list, 1, p=[0.5,0.5])
    model,y,result=train(model,mode=mode_selected[0],progress_of_print=False)
    # DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
    records.append({"counter_for_game":counter_for_game,"result":result})
    if counter_for_game % 10000 == 0:
        print("Game#: ",counter_for_game)
        print("Mode: ",mode_selected[0])
    counter_for_game+=1

graph_data=pd.DataFrame(records)

Evaluation

We will now plot the number of wins, draws, and losses against the number of games played.

bins = np.arange(1, counter_for_game/10000) * 10000
graph_data['counter_for_game_bins'] = np.digitize(graph_data["counter_for_game"], bins, right=True)
counts = graph_data.groupby(['counter_for_game_bins', 'result']).counter_for_game.count().unstack()
ax=counts.plot(kind='bar', stacked=True,figsize=(17,5))
ax.set_xlabel("Count of Games in Bins of 10,000s")
ax.set_ylabel("Counts of Draws/Losses/Wins")
ax.set_title('Distribution of Results Vs Count of Games Played')
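To see what the binning step does without running 200,000 games, here is a toy version with six made-up results and bins of three games. The data and the column names `game` and `bin` are invented for illustration only.

```python
import numpy as np
import pandas as pd

results = ["Lost", "Drawn", "Lost", "Won", "Drawn", "Won"]   # made-up outcomes
games = pd.DataFrame({"game": np.arange(1, 7), "result": results})

# With right=True, game g falls in bin i when bins[i-1] < g <= bins[i].
bins = np.array([3, 6])
games["bin"] = np.digitize(games["game"], bins, right=True)

# One row per bin, one column per result, holding the number of games.
counts = games.groupby(["bin", "result"])["game"].count().unstack(fill_value=0)
print(games["bin"].tolist())  # [0, 0, 0, 1, 1, 1]
print(counts)
```

Each row of `counts` becomes one stacked bar in the plot, so a shift from losses to wins across rows shows up as a change in bar composition.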

[Figure: stacked bar chart of draws, losses, and wins per bin of 10,000 games]

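The chart above shows that as the program plays more games against its opponent, its share of wins goes up and its share of losses goes down.

I cannot draw any more specific conclusions beyond what the chart shows, because the training process is noisy and the chart comes out differently every time this kernel is run.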
