TensorFlow code for our WACV 2019 paper
This code is based on TensorFlow. Please install it by following the instructions on the TensorFlow website. moviepy is a prerequisite for saving GIFs.
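A typical setup (assuming a pip-based environment; this repo does not pin package versions):

pip install tensorflow moviepy

The snippet below is a minimal sketch of how GIFs can be written from episode frames with moviepy; it shows illustrative library usage, not the repo's exact saving code:

import numpy as np
from moviepy.editor import ImageSequenceClip

# frames: a list of HxWx3 uint8 arrays collected over an episode
# (random placeholders here, purely for illustration)
frames = [np.random.randint(0, 255, (84, 84, 3), dtype=np.uint8) for _ in range(10)]
clip = ImageSequenceClip(frames, fps=5)  # 5 frames per second
clip.write_gif("episode.gif")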
main.py contains the implementation of the architecture described in the paper.
game_Env.py implements the new customizable 2D environment introduced in the paper.
objects.json specifies the number and types of objects/obstacles.
objects_new.json specifies an expanded set with a larger number of objects.
generateSentence.py generates the feasible sentences for a given episode based on the environment configuration (see the illustrative sketch after this file list).
vocab.txt lists the unique words that can appear in instructions.
vocab_new.txt is the expanded vocabulary covering the new objects and the new words that appear in instructions.
gifs directory contains some GIFs that were saved when we trained our attention architecture with n=5.
images directory contains the images used to represent the agent, the different objects, and the obstacles.
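For reference, the sketch below illustrates the idea behind generateSentence.py using template-based generation; the object names, templates, and function names here are hypothetical and are not the repo's actual code:

import random

# Hypothetical objects present in one episode of the 2D environment.
episode_objects = ["green apple", "red apple", "yellow banana"]

# Hypothetical instruction templates; the real templates live in generateSentence.py.
templates = [
    "Go to the {obj}.",
    "There are multiple {obj}. Go to larger one.",
]

def generate_feasible_sentences(objects, templates):
    # A sentence is feasible only if its target object actually exists
    # in the current episode's configuration.
    return [t.format(obj=obj) for t in templates for obj in objects]

print(random.choice(generate_feasible_sentences(episode_objects, templates)))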
Our implementation can be trained on a GPU. Please specify the GPU using the CUDA_VISIBLE_DEVICES environment variable:
CUDA_VISIBLE_DEVICES="1" python main.py
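Equivalently, the variable can be set inside Python before TensorFlow initializes the GPU; this is standard TensorFlow/CUDA behavior rather than code from this repo:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must be set before TensorFlow touches the GPU
import tensorflow as tf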
Run generateAttentionGifs.py to generate multiple GIFs showing the evolution of the game state as well as of the different attention maps over an episode. Extract any single frame from those GIFs (we used the Preview app on macOS). Once the images have been extracted, edit the paths of the images (the original and the attention map) in Mask_Map.py and run it to get the final mask. A sample extracted image of the original state (orig.png), an attention map (1.png), and the corresponding mask (1_masked.png) have also been uploaded.
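The masking step can be sketched as follows; this is an assumption about what Mask_Map.py does (modulating the original frame by the attention map), with file names matching the uploaded samples:

import numpy as np
from PIL import Image

orig = Image.open("orig.png").convert("RGB")   # extracted game frame
attn = Image.open("1.png").convert("L")        # extracted attention map
attn = attn.resize(orig.size)                  # match spatial dimensions

# Scale attention to [0, 1] and modulate the frame with it, so regions the
# model attends to stay bright while the rest darkens.
weights = np.asarray(attn, dtype=np.float32) / 255.0
masked = (np.asarray(orig, dtype=np.float32) * weights[..., None]).astype(np.uint8)
Image.fromarray(masked).save("1_masked.png")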
The natural language instruction is: "There are multiple green apple. Go to larger one."
The first GIF shows the agent's trajectory as it navigates to the large green apple. The second GIF shows the egocentric view observed by the agent at every step of its trajectory.
To replicate the 3D results, go to https://github.com/rl-lang-grounding/DeepRL-Grounding
Our A3C implementation is based on the open-source implementation by Arthur Juliani.