
Example: build an Agent


All agents extend a common class: the Agent interface. This class contains all the methods that the MatchManager class state machine will call during a game.

The methods that need to be implemented are the following four.
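As an overview, a concrete agent therefore looks roughly like the sketch below. The chooseAction() and chooseResponse() signatures follow the ones referenced later on this page; the names of the two placement-related methods are assumptions used here only for illustration.

class MyAgent(Agent):

    def chooseAction(self, board, state):
        # return the Action to perform in the current state
        ...

    def chooseResponse(self, board, state):
        # return the Response to the opponent's last action
        ...

    def placeFigures(self, board, state):
        # name assumed for illustration: initial placement of the figures
        ...

    def chooseFigureGroups(self, board, state):
        # name assumed for illustration: choice of the initial group ("color") of figures
        ...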

Choose the next Action

This method is used to choose the next action to perform. The method receives the current Game Board and Game Status, and the agent has to generate an Action.

Choosing the next action is basically what an Agent needs to do. Here it is possible to use simple algorithms, heuristics, or complex Deep Learning methods. The important thing is to return a valid Action.
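As a minimal illustration of the "simple algorithm" end of that spectrum, a sketch of an agent that just picks a random valid action could look like the following (the class name is hypothetical; the GameManager and state calls are the ones used throughout this page):

import random

class RandomAgent(Agent):

    def chooseAction(self, board, state):
        # collect every valid action of every figure that can be activated
        all_actions = []
        for figure in state.getFiguresCanBeActivated(self.team):
            all_actions += [self.gm.actionPassFigure(figure)]
            all_actions += self.gm.buildAttacks(board, state, figure)
            all_actions += self.gm.buildMovements(board, state, figure)

        if not all_actions:
            # nothing can be done: let the framework pass the turn to the other team
            raise ValueError('No action given')

        # any valid action is acceptable
        return random.choice(all_actions)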

As an example, the basic MLAgent, an abstract agent that allows the implementation of different kinds of scoring functions based on Machine Learning, implements a very simple approach to find the next Action to perform.

First, it generates all the possible actions for all the figures:

all_actions = []

for figure in state.getFiguresCanBeActivated(self.team):
    actions = [self.gm.actionPassFigure(figure)] + \
               self.gm.buildAttacks(board, state, figure) + \
               self.gm.buildMovements(board, state, figure)
    all_actions += actions

Then it checks whether there are available actions. If not, it is useful to raise a ValueError exception: when raised, the GameManager will catch it, consider that the player cannot do anything, and pass the turn to the other team. This mechanism is particularly useful with responses, or when a player has fewer figures, and hence fewer choices, than the other player.

if not all_actions:
    logger.warning('No actions available: no action given')
    raise ValueError('No action given')

Finally, the agent uses its internal methods to find the next best action to perform. Once again, if no action is found, the ValueError exception is raised.

# assign score to each available action
scores = self.scores(board, state, all_actions)
# find optimal action
bestScore, bestAction = self.bestAction(scores)

if not bestAction:
    logger.warning('No best action found: no action given')
    raise ValueError('No action given')

Check the implementation of this method for the MLAgent as an example.

Choose the next Response

This method is practically the same as chooseAction(board, state), but it will be called for a Response.

As an example, in the MLAgent the method is implemented exactly like chooseAction(), but on a different set of "actions":

all_actions = []

for figure in state.getFiguresCanBeActivated(self.team):
    actions = [self.gm.actionPassResponse(self.team)] + \
               self.gm.buildResponses(board, state, figure)

    all_actions += actions

Check the implementation of this method for the MLAgent as an example and compare it with chooseAction() to see the differences and similarities.

Initial placing of figures

Some scenarios (like Junction) allow a player to choose where to place its initial figures. In this method the agent is allowed to freely move its units within a limited area. The changes applied to the state are kept and used when the game starts.

The first thing that the implementation of this method does is to find the placement area and the figures from the current state:

x, y = np.where(state.placement_zone[self.team] > 0)
figures = state.getFigures(self.team)

It is best practice to perform a deepcopy of the state of the game and do all the computation needed to find the best initial position on the copy.

s = deepcopy(state)

When the positions are definitive, just use the state.moveFigure(figure, dst) method directly on the original state:

for j in range(len(figures)):
    figure = figures[j]
    dst = Hex(x[optimal_position[j]], y[optimal_position[j]]).cube()
    state.moveFigure(figure, dst=dst)

Check the implementation of this method for the GreedyAgent as an example.
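As a hedged, self-contained sketch of how the fragments above fit together, the following method places each figure on a distinct, randomly chosen cell of the placement zone. The method name placeFigures and the Hex import path are assumptions used only for illustration; the state calls are the ones shown above.

import numpy as np

from utils.coordinates import Hex  # import path assumed for illustration

class RandomPlacementAgent(Agent):

    def placeFigures(self, board, state):  # method name assumed
        # cells where this team is allowed to place its figures
        x, y = np.where(state.placement_zone[self.team] > 0)
        figures = state.getFigures(self.team)

        # pick a distinct random cell of the zone for each figure
        chosen = np.random.choice(len(x), size=len(figures), replace=False)

        for j, figure in enumerate(figures):
            dst = Hex(x[chosen[j]], y[chosen[j]]).cube()
            state.moveFigure(figure, dst=dst)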

Choice of initial figures to use

Some scenarios allow a choice between groups of figures. These are fixed positions of different "colors" on a scenario that an agent can choose to use. The list of available colors is given directly by the state object:

colors = list(state.choices[self.team].keys())

The choice is then made by calling the state.choose(team, color) method directly on the state object:

state.choose(self.team, color)

Check the implementation of this method for the GreedyAgent as an example.
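As a minimal sketch, assuming the method is called chooseFigureGroups (the name is an assumption for illustration), an agent that simply picks a random color could be implemented as:

import random

def chooseFigureGroups(self, board, state):  # method name assumed
    # colors (groups of figures) available to this team in the scenario
    colors = list(state.choices[self.team].keys())
    color = random.choice(colors)

    state.choose(self.team, color)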

Best practices

Avoid state pollution

Avoid polluting the GameState object by doing a deepcopy of it:

from utils.copy import deepcopy

new_state = deepcopy(state)

Use the internal GameManager

Use the internal GameManager utility class to test the result of an action:

s1, outcome = self.gm.activate(board, state, action)
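For instance, a sketch of how this can be combined with the deepcopy best practice to score candidate actions without touching the real state (the evaluate() scoring function here is hypothetical):

from utils.copy import deepcopy

best_score, best_action = None, None

for action in all_actions:
    # simulate the action on a copy so the original state is not polluted
    s1, outcome = self.gm.activate(board, deepcopy(state), action)

    score = self.evaluate(s1, outcome)  # hypothetical scoring function
    if best_score is None or score > best_score:
        best_score, best_action = score, action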

To generate the actions for a figure:

figurePass = self.gm.actionPassFigure(figure)
figureMove = self.gm.actionMove(board, state, figure, destination=Cube(1, 2, 3))
figureAttack = self.gm.actionAttack(board, state, figure, target, weapon)

To build the available actions:

movements = self.gm.buildMovements(board, state, figure)
attacks = self.gm.buildAttacks(board, state, figure)
responses = self.gm.buildResponses(board, state, figure)

Store an agent's history

When one wants to analyze the performance or the behavior of an agent, it is useful to keep track of and inspect the history of its actions. The Agent interface offers three methods to store and retrieve the history of actions done by an agent: register(), dataFrameInfo(), and createDataFrame().

Usage is pretty straightforward. When an agent finds the best action or response, use the register() method:

...
self.register(state, [bestAction])
...

Then, to generate the pandas DataFrame of the history, use the createDataFrame() method:

df = red_agent.createDataFrame()
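Since the result is a regular pandas DataFrame, it can be inspected or persisted with the usual pandas tools, for example:

df.to_csv('red_agent_history.csv', index=False)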

The basic implementation is very... limited. For this reason many agent implementations build a store() method that wraps the register() method.

As an example, we can check the implementation of the AlphaBeta agent:

def store(self, state: GameState, bestScore: float, bestAction: Action) -> None:
    data = [bestScore, type(bestAction).__name__, self.maxDepth]
    self.register(state, data)

Instead of using the raw data list, the wrapper takes as arguments some useful information (the score of the best action and the action itself) and builds some extra values. To avoid issues with the generation of the pandas DataFrame, the dataFrameInfo() method is also extended with the names of the additional columns:

def dataFrameInfo(self) -> List[str]:
    return super().dataFrameInfo() + ['score', 'action', 'depth']
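With these two overrides in place, the agent calls store() wherever it finds its best move, and the extra values line up with the extra column names, for example (sketch):

# inside chooseAction(), once the search is complete
self.store(state, bestScore, bestAction)
return bestAction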