Skip to content

Latest commit

 

History

History
104 lines (72 loc) · 2.43 KB

README.md

File metadata and controls

104 lines (72 loc) · 2.43 KB

Markov Decision Process

Problem Description


1

The maze environment is a 6x6 grid world which contains walls, rewards and penalties. All green squares have a reward of +1. All orange/red squares have a penalty of -1. All white squares have a reward of -0.04. All grey squares are walls and these are unreachable states. Hence the reward for walls need not be defined and has been set to 0 for this experiment.


1

The transition model is as follows: the intended outcome occurs with probability 0.8, and with probability 0.1 the agent moves at either right angle to the intended direction. If the move would make the agent walk into a wall, the agent stays in the same place as before. The rewards for the white squares are -0.04, for the green squares are +1, and for the brown squares are -1. Note that there are no terminal states; the agent’s state sequence is infinite.


Note

There is no terminal state. For some reinforcement learning algorithms, the epsiode ends after a certain number of steps.


Getting Started

Install Requirements

pip install -r requirements.txt

Run Code

python __main__.py --algorithm=value_iteration --display_policy=True --display_utilities=True

Choice of Algorithms

  • value_iteration
  • policy_iteration
  • sarsa
  • expected_sarsa
  • q_learning
  • monte_carlo
  • dyna_q

Results

Value Iteration

Policy

1

Utilities

1

Plot

1

Policy Iteration

Policy

1

Utilities

1

Plot

1