Udacity Deep Reinforcement Learning Nanodegree
This project is part of the Udacity Deep Reinforcement Learning Nanodegree program.
The agent can apply four discrete actions to move through a 2D square plane: move forward, move backward, turn left, and turn right.
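As a small illustration, the four actions can be represented as an integer enumeration. This is only a sketch: the `Action` name and the index assigned to each action are assumptions made for this example (they follow the order in which the actions are listed above) and should be checked against the simulator before use.

```python
from enum import IntEnum


class Action(IntEnum):
    """The four discrete actions described above.

    The integer values are an assumption for illustration only;
    verify the mapping against the environment before relying on it.
    """
    FORWARD = 0
    BACKWARD = 1
    TURN_LEFT = 2
    TURN_RIGHT = 3


# Size of the discrete action space
ACTION_SIZE = len(Action)
```

An agent that outputs an integer index can then be decoded with `Action(index).name` when logging or debugging.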
The agent can accumulate rewards (+1) by collecting yellow bananas but will be penalized (-1) for collecting blue bananas.
Each frame is observed in the form of a 37-dimensional state vector encoding the agent's velocity and ray-based perception information.
The agent learns from experience through repeated interaction with the Unity simulation environment.
The environment is considered solved when the agent accumulates an average reward of +13 per episode.
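The solved criterion can be checked with a rolling average over the most recent episode scores. The sketch below is an assumption-laden illustration rather than this repository's code: the function name `first_solved_episode` is hypothetical, and the window of 100 episodes is an assumed default that is not stated above.

```python
from collections import deque


def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the rolling mean first reaches target.

    `window` is an assumed averaging length; it is not specified in this README.
    Returns None if the criterion is never met.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

Calling it on the list of per-episode scores collected during training gives the episode count at which the environment counts as solved under these assumptions.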
Using a Deep Q-Network with hidden layer sizes [74, 37, 16, 8] to approximate the action-value function, the environment can be solved in about 300 episodes.
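To make the network shape concrete, here is a dependency-free sketch of a fully connected Q-network with the layer sizes given above: 37 inputs, hidden layers of 74, 37, 16, and 8 units, and 4 outputs (one Q-value per action). The ReLU activation, the uniform weight initialization, and all function names are assumptions for illustration; the actual implementation in this repository may differ and would normally use a deep learning framework.

```python
import random

STATE_SIZE = 37                # dimension of the state vector
ACTION_SIZE = 4                # number of discrete actions
HIDDEN_SIZES = [74, 37, 16, 8]


def init_layers(sizes, seed=0):
    """Create (weights, biases) pairs for consecutive layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / n_in ** 0.5  # assumed initialization range
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def q_values(layers, state):
    """Forward pass: one estimated Q-value per action."""
    x = list(state)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only (assumed)
    return x


def greedy_action(layers, state):
    """Index of the action with the highest estimated Q-value."""
    q = q_values(layers, state)
    return max(range(len(q)), key=q.__getitem__)


layers = init_layers([STATE_SIZE, *HIDDEN_SIZES, ACTION_SIZE])
# Total number of trainable parameters (weights plus biases)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

With these sizes the network is small, which keeps both training and inference cheap for a 37-dimensional input.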
TODO: virtualenv installation
Download the Unity simulator for Linux here or directly through the command line:
wget https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip # with visualization
wget https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux_NoVis.zip # no visualization
and unzip them in the root directory of this repository (simulator files are also available for macOS and Windows).
Run the complete test suite with the command
python -m unittest qlearning tests