Version 0_8_0: Stable (#9)
* Init new branch

* Added:

* Action object: validates action sizes / dims and bundles important info

* State: validates state sizes / dims and bundles important info

* Bounds: determines the dtypes of its parent object and whether that object is discrete (a sketch of the idea follows this list)

* Initial mass refactor of MDPDataset

* Max Step expectation (episodes are capped at a maximum step count)
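A minimal sketch of the Bounds idea, assuming the parent object wraps a gym space (the attribute names are illustrative, not the library's final API):

```python
import numpy as np
from gym.spaces import Discrete, Box

class Bounds:
    """Determine the dtype of a parent gym space and whether it is discrete."""
    def __init__(self, space):
        # Discrete spaces index their values with integers; Box spaces carry their own dtype.
        self.discrete = isinstance(space, Discrete)
        self.dtype = np.int64 if self.discrete else space.dtype

assert Bounds(Discrete(4)).discrete
assert Bounds(Box(low=-1, high=1, shape=(3,), dtype=np.float32)).dtype == np.float32
```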

* Fixed:

* MDPDataset episode iteration (a simplified sketch of the idea is below)
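A simplified sketch of the episode-iteration idea, assuming each step exposes a `done` flag (the real logic lives in MDPDataset):

```python
def iter_episodes(steps):
    """Group a flat list of MDP steps into episodes, splitting on the done flag."""
    episode = []
    for step in steps:
        episode.append(step)
        if step.done:
            yield episode  # a finished episode, up to and including its done step
            episode = []
    if episode:
        yield episode  # trailing, unfinished episode
```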

Notes:

* Plan to add a generic MDPStep list validation function (sketched below). A few things
to expect:
- There should never be two "done" steps in a row.
- Right after a done step, the step counter should show 0.
- There should never be Nones in the state values.
- Need to check for bad copies.
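A minimal sketch of such a validator, assuming each MDPStep exposes `done`, `step`, and `state` fields (the names are assumptions, not the final API):

```python
def validate_mdp_steps(steps):
    """Sanity-check a list of MDPSteps against the expectations above."""
    for i, step in enumerate(steps):
        # There should never be Nones in the state values.
        assert step.state is not None, f'Step {i} has a None state'
        if i == 0:
            continue
        prev = steps[i - 1]
        # There should never be two "done" steps in a row.
        assert not (prev.done and step.done), f'Steps {i - 1} and {i} are both done'
        # Right after a done step, the step counter should show 0.
        if prev.done:
            assert step.step == 0, f'Step {i} follows a done step but its counter is {step.step}'
        # Check for bad copies: consecutive steps must be distinct objects.
        assert step is not prev, f'Steps {i - 1} and {i} reference the same object'
```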

* Added:

* WrapperLossFunc for compatibility with the existing fastai fit function (a sketch of the idea follows this list)

* Native fastai fit function compatibility :)

* Single Learner that subclasses the fastai Learner
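A minimal sketch of the WrapperLossFunc idea: fastai's fit loop calls `loss_func(output, target)`, but an RL agent computes its loss internally, so the wrapper only needs to surface that value (the `learn.model.loss` attribute is an assumption here):

```python
class WrapperLossFunc:
    """Adapt an agent's internally computed loss to fastai's loss_func interface."""
    def __init__(self, learn):
        self.learn = learn

    def __call__(self, *args, **kwargs):
        # Ignore fastai's (output, target) arguments; the agent already
        # computed its loss during its forward / optimization pass.
        return self.learn.model.loss
```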

Notes:

* The next commit will have the old code removed

* Fixed:

* Memory handler: for now it keeps the top k items, a less confusing implementation (see the sketch below)

* Removed old code
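A minimal sketch of a fixed-capacity memory handler, assuming "keeping k top" means retaining only the k most recent experiences (the actual implementation lives in fast_rl.core.agent_core):

```python
import random
from collections import deque

class Memory:
    """Fixed-capacity experience memory that keeps only the k most recent items."""
    def __init__(self, k):
        self.items = deque(maxlen=k)  # appending past capacity drops the oldest item

    def update(self, item):
        self.items.append(item)

    def sample(self, batch_size):
        # Uniformly sample a batch, capped at however many items are stored.
        return random.sample(list(self.items), min(batch_size, len(self.items)))
```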

Notes:

* Need to revise the DQNs and DDPGs to be compatible.

* Fixed:

* DDPG compat

* Other DQN compat

Removed:

* The generic nn and cnn functions, for now. They were just way too
confusing :( . The DDPG also works now.

* Interpreter test code.

Notes:

* Interpreter is broken for now

* Redo Tests
josiahls authored Oct 26, 2019
1 parent bebda8e commit bc412b7
Showing 24 changed files with 1,040 additions and 1,233 deletions.
41 changes: 6 additions & 35 deletions README.md
@@ -70,41 +70,10 @@ known environments. Prior to 1.0.0, new changes might break previous code versio
working at their best. Post 1.0.0 will be more formal feature development with CI, unit testing, etc.

**Critical**
- [X] 0.0.0 MDPDataBunch: Finished to the point of being useful. Please reference: `tests/test_Envs`
Example:
```python
from fast_rl.core.Envs import Envs
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

# At present will try to load OpenAI, box2d, pybullet, atari, maze.
# Note "get_all_latest_envs" has a key inclusion and exclusion, so if you don't have some of these envs installed,
# you can avoid them here. Certain envs just flat out do not work / are unusual. You are welcome to see how to get them
# working.
for env in Envs.get_all_latest_envs():
    max_steps = 50  # Limit the number of per-episode iterations for now.
    print(f'Testing {env}')
    mdp_databunch = MDPDataBunch.from_env(env, max_steps=max_steps, num_workers=0)
    if mdp_databunch is None:
        print(f'Env {env} is probably Mujoco... Add imports if you want and try on your own. Don\'t like '
              f'proprietary engines like this. If you have any issues, feel free to make a PR!')
    else:
        epochs = 1  # N episodes to run
        for epoch in range(epochs):
            for state in mdp_databunch.train_dl:
                # Instead of a random action, you would have your agent here
                mdp_databunch.train_ds.actions = mdp_databunch.train_ds.get_random_action()

            for state in mdp_databunch.valid_dl:
                # Instead of a random action, you would have your agent here with exploration set to 0
                mdp_databunch.valid_ds.actions = mdp_databunch.valid_ds.get_random_action()
```
- [X] 0.1.0 DQN Agent: Reference `tests/test_Learner/test_basic_dqn_model_maze`. We use Learner callbacks for
handling the different fit behaviors.

Testable code:
```python
from fast_rl.agents.DQN import DQN
-from fast_rl.core.Learner import AgentLearner
+from fast_rl.core.basic_train import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')
@@ -130,13 +99,15 @@ Usage example:
```python
from fast_rl.agents.DQN import DQN
from fast_rl.core.Interpreter import AgentInterpretationAlpha
-from fast_rl.core.Learner import AgentLearner
+from fast_rl.core.basic_train import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')
model = DQN(data)
learn = AgentLearner(data, model)
learn.fit(10)

+# Note that the Interpretation is broken, will be fixed with documentation in 0.9
interp = AgentInterpretationAlpha(learn)
interp.plot_heatmapped_episode(5)
```
@@ -229,8 +200,8 @@ learn.fit(5)
reset commit

- [X] 0.7.0 Full test suite using multi-processing. Connect to CI.
-- [ ] **Working On** 0.8.0 Comprehensive model eval **debug/verify**. Each model should succeed at at least a few known environments. Also, massive refactoring will be needed.
-- [ ] 0.9.0 Notebook demonstrations of basic model usage.
+- [X] 0.8.0 Comprehensive model eval **debug/verify**. Each model should succeed at at least a few known environments. Also, massive refactoring will be needed.
+- [ ] **Working on** 0.9.0 Notebook demonstrations of basic model usage.
- [ ] **1.0.0** Base version is completed with working model visualizations proving performance / expected failure. At
this point, all models should have guaranteed environments they should succeed in.
- [ ] 1.2.0 Add PyBullet Fetch Environments
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -30,7 +30,7 @@ steps:
displayName: 'Install Python Packages'

- script: |
-    xvfb-run -s "-screen 0 1400x900x24" pytest -n 8 fast_rl/tests --doctest-modules --junitxml=junit/test-results.xml --cov=./ --cov-report=xml --cov-report=html
+    xvfb-run -s "-screen 0 1400x900x24" pytest -n 2 fast_rl/tests --doctest-modules --junitxml=junit/test-results.xml --cov=./ --cov-report=xml --cov-report=html
displayName: 'Test with pytest'

- task: PublishTestResults@2
10 changes: 5 additions & 5 deletions docs_src/rl.agents.dqnfixedtarget.ipynb
@@ -15,10 +15,10 @@
"import fast_rl.agents.DQN \n",
"from fast_rl.agents.DQN import DQN, FixedTargetDQN, DoubleDQN, DuelingDQN, DoubleDuelingDQN\n",
"from fast_rl.core.Interpreter import AgentInterpretationAlpha\n",
"from fast_rl.core.Learner import AgentLearner\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunch\n",
"from fast_rl.core.Learner import AgentLearnerAlpha\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunchAlpha\n",
"from fast_rl.core.agent_core import PriorityExperienceReplay, ExperienceReplay\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunch, FEED_TYPE_IMAGE, FEED_TYPE_STATE\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunchAlpha, FEED_TYPE_IMAGE, FEED_TYPE_STATE\n",
"from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon\n",
"import sys\n",
"import importlib"
@@ -423,12 +423,12 @@
}
],
"source": [
"data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human', max_steps=1000)\n",
"data = MDPDataBunchAlpha.from_env('maze-random-5x5-v0', render='human', max_steps=1000)\n",
"model = FixedTargetDQN(data, batch_size=128, max_episodes=50, lr=0.001, copy_over_frequency=3,\n",
" memory=ExperienceReplay(10000), discount=0.99, \n",
" exploration_strategy=GreedyEpsilon(epsilon_start=1, epsilon_end=0.1,\n",
" decay=0.001, do_exploration=True))\n",
"learn = AgentLearner(data, model)\n",
"learn = AgentLearnerAlpha(data, model)\n",
"\n",
"learn.fit(50)"
]
12 changes: 6 additions & 6 deletions docs_src/rl.core.mdp_interpreter.ipynb
@@ -79,14 +79,14 @@
"import numpy as np\n",
"\n",
"from fast_rl.agents.DQN import DQN\n",
"from fast_rl.core.Learner import AgentLearner\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunch, MDPDataset\n",
"from fast_rl.core.Learner import AgentLearnerAlpha\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunchAlpha, MDPDatasetAlpha\n",
"from fast_rl.core.Interpreter import AgentInterpretationAlpha\n",
"%matplotlib inline\n",
" \n",
"data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64)\n",
"data = MDPDataBunchAlpha.from_env('CartPole-v1', render='human', bs=64)\n",
"model = DQN(data)\n",
"learn = AgentLearner(data, model)\n",
"learn = AgentLearnerAlpha(data, model)\n",
"\n",
"learn.fit(5)"
]
@@ -106,9 +106,9 @@
"metadata": {},
"outputs": [],
"source": [
"data = MDPDataBunch.from_pickle('CartPole-v1', render='human', bs=64)\n",
"data = MDPDataBunchAlpha.from_pickle('CartPole-v1', render='human', bs=64)\n",
"model = DQN(data)\n",
"learn = AgentLearner(data, model)"
"learn = AgentLearnerAlpha(data, model)"
]
},
{
78 changes: 9 additions & 69 deletions fast_rl/agents/BaseAgent.py
@@ -1,15 +1,12 @@
from math import floor
+from typing import Collection

import gym
import numpy as np
import torch
-from fastai.basic_train import LearnerCallback, Any
-from fastai.callback import Callback
+from fastai.basic_train import LearnerCallback
from fastai.layers import bn_drop_lin
from gym.spaces import Discrete, Box
from torch import nn
-from traitlets import List
-import numpy as np
-from typing import Collection

from fast_rl.core.MarkovDecisionProcess import MDPDataBunch
from fast_rl.core.agent_core import ExplorationStrategy
@@ -29,6 +26,7 @@ def __init__(self, data: MDPDataBunch):
        self.loss = None
        self.out = None
        self.opt = None
+        self.warming_up = False
        self.learner_callbacks = []  # type: Collection[LearnerCallback]
        # Root model that will be accessed for action decisions
        self.action_model = None  # type: nn.Module
@@ -77,47 +75,14 @@ def forward(self, x):
        return x.view(x.size(0), -1)


-def create_nn_model(layer_list: list, action_size, state_size, use_bn=False, use_embed=False,
-                    activation_function=None, final_activation_function=None, action_val_to_dim=True):
-    """Generates an nn module.
-    Notes:
-        TabularModel could possibly be used along side a cnn learner instead. Will be a good idea to investigate.
-    Returns:
-    """
-    act = nn.LeakyReLU if activation_function is None else activation_function
-    # For now the dimension of the action does not make a difference.
-    action_size = action_size[0] if not action_val_to_dim else action_size[1]
-    # For now keep drop out as 0, test including dropout later
-    ps = [0] * len(layer_list)
-    sizes = [state_size] + layer_list + [action_size]
-    actns = [act() for _ in range(len(sizes) - 2)] + [None]
-    layers = []
-    for i, (n_in, n_out, dp, act) in enumerate(zip(sizes[:-1], sizes[1:], [0.] + ps, actns)):
-        if i == 0 and use_embed:
-            embedded, n_in = get_embedded(n_in[0], n_out, n_in[1], 5)
-            layers += [ToLong(), embedded, Flatten()]
-        elif i == 0: n_in = n_in[0]
-        if i == 0 and use_bn: layers += [nn.BatchNorm1d(n_in)]
-
-        layers += bn_drop_lin(n_in, n_out, bn=use_bn and i != 0, p=dp, actn=act)
-
-    if final_activation_function is not None: layers += [final_activation_function()]
-    return nn.Sequential(*layers)


def get_next_conv_shape(c_w, c_h, stride, kernel_size):
    h = floor((c_h - kernel_size - 2) / stride) + 1  # 3 convolutional layers given (3c, 640w, 640h)
    w = floor((c_w - kernel_size - 2) / stride) + 1
    return h, w


def get_conv(input_tuple, act, kernel_size, stride, n_conv_layers, layers):
-    """
+    r"""
    Useful guideline for convolutional net shape change:
    Shape:
@@ -141,37 +106,12 @@ def get_conv(input_tuple, act, kernel_size, stride, n_conv_layers, layers):
        n_conv_layers:
        layers:
    """
-    h, w = input_tuple[0], input_tuple[1]
+    h, w = input_tuple[1], input_tuple[2]
    conv_layers = [SwapImageChannel()]
    for i in range(n_conv_layers):
        h, w = get_next_conv_shape(h, w, stride, kernel_size)
-        conv_layers.append(torch.nn.Conv2d(input_tuple[2], 3, kernel_size=kernel_size, stride=stride))
+        conv_layers.append(torch.nn.Conv2d(input_tuple[3], 3, kernel_size=kernel_size, stride=stride))
        conv_layers.append(act)
-    return layers + conv_layers, 3 * (h + 1) * (w + 1)


-def create_cnn_model(layer_list: list, action_size, state_size, use_bn=False, kernel_size=5, stride=3, n_conv_layers=3,
-                     activation_function=None, final_activation_function=None, action_val_to_dim=True):
-    """Generates an nn module.
-    Notes:
-        TabularModel could possibly be used along side a cnn learner instead. Will be a good idea to investigate.
-    Returns:
-    """
-    act = nn.LeakyReLU if activation_function is None else activation_function
-    # For now keep drop out as 0, test including dropout later
-    ps = [0] * len(layer_list)
-    action_size = action_size[0] if not action_val_to_dim else action_size[1]
-    sizes = [state_size[0]] + layer_list + [action_size]
-    actns = [act() for _ in range(n_conv_layers + len(sizes) - 2)] + [None]
-    layers = []
-    for i, (n_in, n_out, dp, act) in enumerate(zip(sizes[:-1], sizes[1:], [0.] + ps, actns)):
-        if type(n_in) == tuple:
-            layers, n_in = get_conv(n_in, act, kernel_size, n_conv_layers=n_conv_layers, layers=layers, stride=stride)
-            layers += [Flatten()]
-
-        layers += bn_drop_lin(n_in, n_out, bn=use_bn and i != 0, p=dp, actn=act)
-    if final_activation_function is not None: layers += [final_activation_function()]
-    return nn.Sequential(*layers)
+    output_size = torch.prod(torch.tensor(nn.Sequential(*(layers + conv_layers))(torch.rand(input_tuple)).shape))
+    return layers + conv_layers, output_size
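The new return value above sidesteps manual shape arithmetic: it runs a dummy tensor through the assembled layers and reads the flattened size off the output shape. A standalone illustration of the same trick, with assumed layer and input shapes:

```python
import torch
from torch import nn

conv = nn.Sequential(nn.Conv2d(3, 8, kernel_size=5, stride=3), nn.LeakyReLU())
dummy = torch.rand(1, 3, 84, 84)  # (batch, channels, height, width)
# The number of flattened features is the product of the output shape's dims.
output_size = int(torch.prod(torch.tensor(conv(dummy).shape)))
print(output_size)  # 1 * 8 * 27 * 27 = 5832
```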
