Version 0.9.6: Notebooks / Model Performance Evaluation (#10)
* Added:
- Interpreter (Cleaner) with cleaner code / closer to fastai
- To and From Pickle
- to_csv

Notes:
- I tried doing a from_csv implementation, however I am seeing that
something like this might not be possible without leaning on the file system.
Not sure when I will ever get to this. I have some ideas about saving images
/ states as files referenced by file paths... Maybe to_csv should generate a file hierarchy as well?

* Added:
- Group Interpreter for combining model runs
- Initial fixed DQN notebook (sort of)

Fixed:
- recorder callback ordering
- renaming. It seems that fastai has some cool in-notebook test widgets
that we might want to use in the future

* Added:
- Group Interpreter merging
- DQN base notebook
- Interpreters now close envs by default

Fixed:
- env closing; this might be a recurring issue due to the different physics engines

* Fixed:
- setup.py: fastai needs to be at least 1.0.59

* Fixed:
- CPU / device issues.

* Added:
- DQN Group Results
- Reward Metric

Notes:
- I am realizing that we need smoothing over the summed rewards. The graphs are way
too messy.
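
A rough sketch of the kind of smoothing this calls for (function and names are hypothetical, not the actual fast_rl API): an exponential moving average over per-episode reward sums before plotting.

```python
import numpy as np

def smooth_rewards(reward_sums, alpha=0.9):
    """Exponential moving average over per-episode reward sums, for plotting."""
    smoothed, running = [], None
    for r in reward_sums:
        # Blend each new reward sum into the running average
        running = r if running is None else alpha * running + (1 - alpha) * r
        smoothed.append(running)
    return np.asarray(smoothed)
```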

* Added:
- Analysis property to the group interpretation

* Fixed:
- PER crashing when it contained 0 items
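
A sketch of the kind of guard this fix implies (hypothetical; plain uniform sampling stands in for the real priority-weighted draw):

```python
import random

def safe_sample(memory, batch_size):
    """Sample from a replay buffer without crashing when it holds 0 items."""
    if not memory:
        return []
    # Never request more items than the buffer currently holds
    return random.sample(memory, min(batch_size, len(memory)))
```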

* Added:
- Group Interpretation value smoothing

* Fixed:
- Value smoothing making the reward values way too big
- Tests taking too long. If image input, just do a shorter fit cycle
- PER batch size not updating
- cuda issues
- Bounds: n_possible_values is now only calculated when used.
This should make iteration faster.
Added:
- Smoothing for the scalar plotting (sketch below)
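
A sketch of scalar smoothing that avoids the blow-up fixed above (names hypothetical): a moving average has to be divided by the sum of the window weights that actually overlap each point, otherwise the smoothed values get scaled up by the window size, which is exactly the "way too big" symptom.

```python
import numpy as np

def smooth_scalars(values, window=10):
    """Moving average that stays on the same scale as the raw values."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window)
    # Count how many kernel taps overlap each position (also handles the edges)
    weights = np.convolve(np.ones(len(values)), kernel, mode='same')
    return np.convolve(values, kernel, mode='same') / weights
```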
* More test fixing
* Fixed:
- cuda issues
* Added:
- Lunar Lander performance test
* Added:
- minigrid compat
- normalization module for dqns using Bounds object
* Fixed:
- Normalizing cuda error
* Fixed:
- DDPG cuda error
* Fixed:
- pybullet human rendering. Pybullet renders differently from regular
OpenAI envs: if you want to see what is happening, the render
method needs to be executed prior to reset (see the sketch after this list).
Added:
- DDPG testing
- ddpg env runs
- more results
- more ddpg tests
- walker2d data
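
A minimal sketch of the render-before-reset pattern described above (the env id is an assumption and depends on which pybullet package is installed):

```python
import gym
import pybullet_envs  # registers the Bullet envs; ids differ for pybullet-gym

env = gym.make("Walker2DBulletEnv-v0")
env.render(mode="human")  # pybullet: open the GUI *before* reset...
state = env.reset()       # ...otherwise nothing is displayed
for _ in range(100):
    state, reward, done, info = env.step(env.action_space.sample())
    if done:
        state = env.reset()
env.close()
```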
* Fixed:
- Possibly fixed pybullet envs crashing. There was an issue where the pybullet wrapper was not being added :(

* Version 0.9.5 mass refactor (#12)

* Added:
- Refactored DQN code
- DQN learner basic
Fixed:
- DQN model crashing
* Added:
- All DQNs pass tests
* Fixed:
- Some dqn / gym_maze / embedding related crashes
- DQN test code and actual DQN tests
* Added:
- Maze heat map interpreter
- Started q value interpreter
* Fixed:
- DDPG GPU issue. Sampling, action, and state objects now support `to(device)` calls (see the sketch below).
- DQN GPU issue.
- azure pipeline test
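
A sketch of what supporting `to(device)` calls on these container objects might look like (illustrative only; the real state / action classes differ):

```python
import torch
from dataclasses import dataclass

@dataclass
class Transition:
    """Illustrative stand-in for fast_rl's state / action containers."""
    state: torch.Tensor
    action: torch.Tensor
    reward: torch.Tensor

    def to(self, device):
        # Move every tensor field to the target device, mirroring torch.Tensor.to
        return Transition(self.state.to(device), self.action.to(device),
                          self.reward.to(device))
```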
* Updated:
- jupyter notebooks
* Removed:
- old code files
* Fixed:
- metrics, ddpg tests
* Added:
- basic q value plotting
- basic q value plotting for ddpg
* Updated Version
* Changed:
- setup.py excludes some third-party packages due to a PyPI restriction. Need to find a way around this.
* Removed:
- old code from README. Revisions coming.
* Added:
- batch norm toggling. For now (possibly forever) defaulted to False
* Version 0.9.5 mass refactor (#13)

* Added:
- revised test script
- Slowly adding tests.

* Fixed:
- the trained_learner test method was somehow completely broken

* Added:
- Interpreter edge control; can also show an average line

* Fixed:
- models performing terribly. Apparently, batch norm really screws them up: if you use batch norm, the batch size needs to be massive (128 wasn't large enough). By default you can mostly turn off batch_norm in the Tabular models, but when given a continuous input they still apply an entry batch norm. I overrode it and now the models work significantly better :)
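
A sketch of the kind of override described above (not the actual fast_rl code): recursively swap any `BatchNorm1d`, including the entry batch norm fastai's tabular models apply to continuous inputs, for an identity op.

```python
import torch.nn as nn

def strip_batchnorm(module: nn.Module) -> nn.Module:
    """Replace all BatchNorm1d layers (including fastai's entry bn on
    continuous inputs) with nn.Identity, leaving everything else untouched."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm1d):
            setattr(module, name, nn.Identity())
        else:
            strip_batchnorm(child)
    return module
```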

* Updated:
- gitignore
josiahls authored Dec 22, 2019
1 parent 6364d54 commit 0abb10a
Showing 83 changed files with 4,726 additions and 2,385 deletions.
9 changes: 7 additions & 2 deletions .gitignore
@@ -6,7 +6,12 @@ gen
.gitignore

# Jupyter Notebook
/fast_rl/notebooks/.ipynb_checkpoints/
*/.ipynb_checkpoints/*

# Data Files
/docs_src/data/*
#/docs_src/data/*

# Build Files / Directories
build/*
dist/*
fast_rl.egg-info/*
150 changes: 17 additions & 133 deletions README.md
@@ -20,9 +20,6 @@ However, there are also frameworks in PyTorch, most notably Facebook's Horizon:
- [Horizon](https://github.com/facebookresearch/Horizon)
- [DeepRL](https://github.com/ShangtongZhang/DeepRL)

Our motivation is that existing frameworks commonly use TensorFlow; we have nothing against TensorFlow, but we have
accomplished more in shorter periods of time using PyTorch.

Fastai for computer vision and tabular learning has been amazing. One would wish that the same were true for RL.
The purpose of this repo is to have a framework that is as easy as possible to start with, but is also designed for
testing new agents.
@@ -72,141 +69,28 @@ working at their best. Post 1.0.0 will be more formal feature development with C
**Critical**
Testable code:
```python
from fast_rl.agents.DQN import DQN
from fast_rl.core.basic_train import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')
model = DQN(data)
learn = AgentLearner(data, model)
learn.fit(450)
```
Result:

| ![](res/pre_interpretation_maze_dqn.gif) |
|:---:|
| *Fig 1: We are now able to train an agent using some Fastai API* |


We believe that the agent explodes after the first episode. Not to worry! We will make an RL interpreter to see what's
going on!

- [X] 0.2.0 AgentInterpretation: First method will be heatmapping the image / state space of the
environment with the expected rewards for super important debugging. In the code above, we are testing with a maze for a
good reason. Heatmapping rewards over a maze is pretty easy as opposed to other environments.

Usage example:
```python
from fast_rl.agents.DQN import DQN
from fast_rl.core.Interpreter import AgentInterpretationAlpha
from fast_rl.core.basic_train import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')
model = DQN(data)
learn = AgentLearner(data, model)
learn.fit(10)

# Note that the Interpretation is broken, will be fixed with documentation in 0.9
interp = AgentInterpretationAlpha(learn)
interp.plot_heatmapped_episode(5)
```

| ![](res/heat_map_1.png) |
|:---:|
| *Fig 2: Cumulative rewards calculated over states during episode 0* |
| ![](res/heat_map_2.png) |
| *Fig 3: Episode 7* |
| ![](res/heat_map_3.png) |
| *Fig 4: Unimportant parts are excluded via reward penalization* |
| ![](res/heat_map_4.png) |
| *Fig 5: Finally, state space is fully explored, and the highest rewards are near the goal state* |

If we change:
```python
interp = AgentInterpretationAlpha(learn)
interp.plot_heatmapped_episode(epoch)
```
to:
```python
interp = AgentInterpretationAlpha(learn)
interp.plot_episode(epoch)
```
We can get the following plots for specific episodes:

| ![](res/reward_plot_1.png) |
|:----:|
| *Fig 6: Rewards estimated by the agent during episode 0* |

As determined by our AgentInterpretation object, we need to either debug or improve our agent.
We will do this in parallel with creating our Learner fit function.

- [X] 0.3.0 Add DQNs: DQN, Dueling DQN, Double DQN, Fixed Target DQN, DDDQN.
- [X] 0.4.0 Learner Basic: We need to convert this into a suitable object. Will be similar to the basic fastai learner,
hopefully. Possibly also add prioritized replay?
- Added PER.
- [X] 0.5.0 DDPG Agent: We need to have at least one agent able to perform continuous environment execution. As a note, we
could give discrete agents the ability to operate in a continuous domain via binning.
- [X] 0.5.0 DDPG added. Let us move on.
- [X] 0.5.0 The DDPG paper contains a visualization for Q learning might prove useful. Add to interpreter.

| ![](res/ddpg_balancing.gif) |
|:----:|
| *Fig 7: DDPG trains stably now..* |


Added q value interpretation per explanation by Lillicrap et al., 2016. Currently both models (DQN and DDPG) have
unstable q value approximations. Below is an example from DQN.
```python
from fastai.basic_data import DatasetType

# `learn` is a trained AgentLearner; `epoch` selects the epoch to plot
interp = AgentInterpretationAlpha(learn, ds_type=DatasetType.Train)
interp.plot_q_density(epoch)
```
Usage can be referenced in `fast_rl/tests/test_interpretation`. A good agent will show a mostly diagonal line,
while a failing one will look globular or horizontal.

| ![](res/dqn_q_estimate_1.jpg) |
|:----:|
| *Fig 8: Initial Q Value Estimate. Seems globular which is expected for an initial model.* |

| ![](res/dqn_q_estimate_2.jpg) |
|:----:|
| *Fig 9: Seems like the DQN is not learning...* |

| ![](res/dqn_q_estimate_3.jpg) |
|:----:|
| *Fig 10: Alarming later epoch results. It seems that the DQN converges to predicting a single Q value.* |

- [X] 0.6.0 Single Global fit function like Fastai's. Think about the missing batch step. Noted some of the changes to
the existing Fastai.

| ![](res/fit_func_out.jpg) |
|:----:|
| *Fig 11: Resulting output of a typical fit function using ref code below.* |

```python
# Old example (removed in this commit):
from fast_rl.agents.DQN import DuelingDQN
from fast_rl.core.Learner import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch


data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human', max_steps=1000)
model = DuelingDQN(data)
# model = DQN(data)
learn = AgentLearner(data, model)

learn.fit(5)
```

```python
# New example (added in this commit):
import torch

from fast_rl.agents.dqn import *
from fast_rl.agents.dqn_models import *
from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
from fast_rl.core.data_block import MDPDataBunch
from fast_rl.core.metrics import *

data = MDPDataBunch.from_env('CartPole-v1', render='rgb_array', bs=32, add_valid=False)
model = create_dqn_model(data, FixedTargetDQNModule, opt=torch.optim.RMSprop, lr=0.00025)
memory = ExperienceReplay(memory_size=1000, reduce_ram=True)
exploration_method = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
learner = dqn_learner(data=data, model=model, memory=memory, exploration_method=exploration_method)
learner.fit(10)
```


- [X] 0.7.0 Full test suite using multi-processing. Connect to CI.
- [X] 0.8.0 Comprehensive model eval **debug/verify**. Each model should succeed in at least a few known environments. Also, massive refactoring will be needed.
- [X] 0.9.0 Notebook demonstrations of basic model usage.
- [ ] **Working on** **1.0.0** Base version is completed with working model visualizations proving performance / expected failure. At
this point, all models should have guaranteed environments they should succeed in.
- [ ] 1.8.0 Add PyBullet Fetch Environments
- [ ] 1.8.0 Not part of this repo, however the envs need to subclass the OpenAI `gym.GoalEnv`
- [ ] 1.8.0 Add HER


## Code
73 changes: 36 additions & 37 deletions azure-pipelines.yml
@@ -3,43 +3,42 @@
# Add steps that build, run tests, deploy, and more:
# https://aka.ms/yaml

# - bash: "sudo apt-get install -y xvfb freeglut3-dev python-opengl --fix-missing"
#   displayName: 'Install ffmpeg, freeglut3-dev, and xvfb'

trigger:
- master

pool:
  vmImage: 'ubuntu-18.04'

steps:

#- bash: "sudo apt-get install -y ffmpeg xvfb freeglut3-dev python-opengl"
#  displayName: 'Install ffmpeg, freeglut3-dev, and xvfb'

- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.7'

# - script: sh ./build/azure_pipeline_helper.sh
#   displayName: 'Complex Installs'

- script: |
    # pip install Bottleneck
    # python setup.py install
    pip install pytest
    pip install pytest-cov
  displayName: 'Install Python Packages'

- script: |
    xvfb-run -s "-screen 0 1400x900x24" pytest tests --doctest-modules --junitxml=junit/test-results.xml --cov=./ --cov-report=xml --cov-report=html
  displayName: 'Test with pytest'

- task: PublishTestResults@2
  condition: succeededOrFailed()
  inputs:
    testResultsFiles: '**/test-*.xml'
    testRunTitle: 'Publish test results for Python $(python.version)'

- task: PublishCodeCoverageResults@1
  inputs:
    codeCoverageTool: Cobertura
    summaryFileLocation: '$(System.DefaultWorkingDirectory)/**/coverage.xml'
    reportDirectory: '$(System.DefaultWorkingDirectory)/**/htmlcov'
jobs:
- job: 'Test'
  pool:
    vmImage: 'ubuntu-16.04' # other options: 'macOS-10.13', 'vs2017-win2016'
  strategy:
    matrix:
      Python36:
        python.version: '3.6'
  steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '$(python.version)'

  - bash: "sudo apt-get install -y freeglut3-dev python-opengl"
    displayName: 'Install freeglut3-dev'

  - script: |
      python -m pip install --upgrade pip setuptools wheel pytest pytest-cov -e .
      python setup.py install
    displayName: 'Install dependencies'

  - script: sh ./build/azure_pipeline_helper.sh
    displayName: 'Complex Installs'

  - script: |
      xvfb-run -s "-screen 0 1400x900x24" py.test tests --cov fast_rl --cov-report html --doctest-modules --junitxml=junit/test-results.xml --cov=./ --cov-report=xml --cov-report=html
    displayName: 'Test with pytest'

  - task: PublishTestResults@2
    condition: succeededOrFailed()
    inputs:
      testResultsFiles: '**/test-*.xml'
      testRunTitle: 'Publish test results for Python $(python.version)'
20 changes: 10 additions & 10 deletions build/azure_pipeline_helper.sh
@@ -1,14 +1,14 @@
#!/usr/bin/env bash

# Install pybullet
git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .
cd ../
## Install pybullet
#git clone https://github.com/benelot/pybullet-gym.git
#cd pybullet-gym
#pip install -e .
#cd ../

# Install gym_maze
git clone https://github.com/MattChanTK/gym-maze.git
cd gym-maze
python setup.py install
cd ../
## Install gym_maze
#git clone https://github.com/MattChanTK/gym-maze.git
#cd gym-maze
#python setup.py install
#cd ../

Binary file added docs_src/data/cartpole_dddqn/dddqn_er_rms.pickle
Binary file not shown.
Binary file added docs_src/data/cartpole_dddqn/dddqn_per_rms.pickle
Binary file not shown.
Binary file added docs_src/data/cartpole_ddqn/ddqn_er_rms.pickle
Binary file not shown.
Binary file added docs_src/data/cartpole_ddqn/ddqn_per_rms.pickle
Binary file not shown.
636 changes: 636 additions & 0 deletions docs_src/rl.agents.dddqn.ipynb

Large diffs are not rendered by default.
