Version 0_8_0: Stable (#9)
* Init new branch

* Added:

* Action object: validates action sizes / dims and bundles important info

* State: validates state sizes / dims and bundles important info

* Bounds: determines the dtypes of its parent object and whether that object is discrete (a sketch of the idea follows this list)

* Initial mass refactor of MDPDataset

* Max Step expectation (episodes are capped at a maximum step count)
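A minimal sketch of the Bounds idea, assuming the parent object wraps a gym space (the attribute names are illustrative, not the library's final API):

```python
import numpy as np
from gym.spaces import Discrete, Box

class Bounds:
    """Determine the dtype of a parent gym space and whether it is discrete."""
    def __init__(self, space):
        # Discrete spaces index their values with integers; Box spaces carry their own dtype.
        self.discrete = isinstance(space, Discrete)
        self.dtype = np.int64 if self.discrete else space.dtype

assert Bounds(Discrete(4)).discrete
assert Bounds(Box(low=-1, high=1, shape=(3,), dtype=np.float32)).dtype == np.float32
```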

* Fixed:

* MDPDataset episode iteration (a simplified sketch of the idea is below)
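A simplified sketch of the episode-iteration idea, assuming each step exposes a `done` flag (the real logic lives in MDPDataset):

```python
def iter_episodes(steps):
    """Group a flat list of MDP steps into episodes, splitting on the done flag."""
    episode = []
    for step in steps:
        episode.append(step)
        if step.done:
            yield episode  # a finished episode, up to and including its done step
            episode = []
    if episode:
        yield episode  # trailing, unfinished episode
```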

Notes:

* Plan to add a generic MDPStep list validation function (sketched below). A few things
to expect:
- There should never be two "done" steps in a row.
- Right after a done step, the step counter should show 0.
- There should never be Nones in the state values.
- Need to check for bad copies.
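A minimal sketch of such a validator, assuming each MDPStep exposes `done`, `step`, and `state` fields (the names are assumptions, not the final API):

```python
def validate_mdp_steps(steps):
    """Sanity-check a list of MDPSteps against the expectations above."""
    for i, step in enumerate(steps):
        # There should never be Nones in the state values.
        assert step.state is not None, f'Step {i} has a None state'
        if i == 0:
            continue
        prev = steps[i - 1]
        # There should never be two "done" steps in a row.
        assert not (prev.done and step.done), f'Steps {i - 1} and {i} are both done'
        # Right after a done step, the step counter should show 0.
        if prev.done:
            assert step.step == 0, f'Step {i} follows a done step but its counter is {step.step}'
        # Check for bad copies: consecutive steps must be distinct objects.
        assert step is not prev, f'Steps {i - 1} and {i} reference the same object'
```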

* Added:

* WrapperLossFunc for compatibility with the existing fastai fit function (a sketch of the idea follows this list)

* Native fastai fit function compatibility :)

* Single Learner that subclasses the fastai Learner
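A minimal sketch of the WrapperLossFunc idea: fastai's fit loop calls `loss_func(output, target)`, but an RL agent computes its loss internally, so the wrapper only needs to surface that value (the `learn.model.loss` attribute is an assumption here):

```python
class WrapperLossFunc:
    """Adapt an agent's internally computed loss to fastai's loss_func interface."""
    def __init__(self, learn):
        self.learn = learn

    def __call__(self, *args, **kwargs):
        # Ignore fastai's (output, target) arguments; the agent already
        # computed its loss during its forward / optimization pass.
        return self.learn.model.loss
```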

Notes:

* The next commit will have the old code removed

* Fixed:

* Memory handler: for now it keeps the top k items, a less confusing implementation (see the sketch below)

* Removed old code
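A minimal sketch of a fixed-capacity memory handler, assuming "keeping k top" means retaining only the k most recent experiences (the actual implementation lives in fast_rl.core.agent_core):

```python
import random
from collections import deque

class Memory:
    """Fixed-capacity experience memory that keeps only the k most recent items."""
    def __init__(self, k):
        self.items = deque(maxlen=k)  # appending past capacity drops the oldest item

    def update(self, item):
        self.items.append(item)

    def sample(self, batch_size):
        # Uniformly sample a batch, capped at however many items are stored.
        return random.sample(list(self.items), min(batch_size, len(self.items)))
```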

Notes:

* Need to revise the DQNs and DDPGs to be compatible.

* Fixed:

* DDPG compat

* Other DQN compat

Removed:

* The generic nn and cnn functions, for now. They were just way too
confusing :( . The DDPG also works now.

* Interpreter test code.

Notes:

* Interpreter is broken for now

* Redo Tests
josiahls authored Oct 26, 2019
1 parent bebda8e commit bc412b7
Showing 24 changed files with 1,040 additions and 1,233 deletions.
41 changes: 6 additions & 35 deletions README.md
@@ -70,41 +70,10 @@ known environments. Prior to 1.0.0, new changes might break previous code versio
working at their best. Post 1.0.0 will be more formal feature development with CI, unit testing, etc.

**Critical**
- [X] 0.0.0 MDPDataBunch: Finished to the point of being useful. Please reference: `tests/test_Envs`
Example:
```python
from fast_rl.core.Envs import Envs
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

# At present will try to load OpenAI, box2d, pybullet, atari, maze.
# Note "get_all_latest_envs" has a key inclusion and exclusion, so if you don't have some of these envs installed,
# you can avoid them here. Certain envs just flat out do not work / are unusual. You are welcome to see how to get them
# working.
for env in Envs.get_all_latest_envs():
    max_steps = 50  # Limit the number of per-episode iterations for now.
    print(f'Testing {env}')
    mdp_databunch = MDPDataBunch.from_env(env, max_steps=max_steps, num_workers=0)
    if mdp_databunch is None:
        print(f'Env {env} is probably Mujoco... Add imports if you want and try on your own. Don\'t like '
              f'proprietary engines like this. If you have any issues, feel free to make a PR!')
    else:
        epochs = 1  # N episodes to run
        for epoch in range(epochs):
            for state in mdp_databunch.train_dl:
                # Instead of a random action, you would have your agent here
                mdp_databunch.train_ds.actions = mdp_databunch.train_ds.get_random_action()

            for state in mdp_databunch.valid_dl:
                # Instead of a random action, you would have your agent here with exploration set to 0
                mdp_databunch.valid_ds.actions = mdp_databunch.valid_ds.get_random_action()
```
- [X] 0.1.0 DQN Agent: Reference `tests/test_Learner/test_basic_dqn_model_maze`. We use Learner callbacks for
handling the different fit behaviors.

Testable code:
```python
from fast_rl.agents.DQN import DQN
-from fast_rl.core.Learner import AgentLearner
+from fast_rl.core.basic_train import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')
@@ -130,13 +99,15 @@ Usage example:
```python
from fast_rl.agents.DQN import DQN
from fast_rl.core.Interpreter import AgentInterpretationAlpha
-from fast_rl.core.Learner import AgentLearner
+from fast_rl.core.basic_train import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')
model = DQN(data)
learn = AgentLearner(data, model)
learn.fit(10)

+# Note that the Interpretation is broken, will be fixed with documentation in 0.9
interp = AgentInterpretationAlpha(learn)
interp.plot_heatmapped_episode(5)
```
@@ -229,8 +200,8 @@ learn.fit(5)
reset commit

- [X] 0.7.0 Full test suite using multi-processing. Connect to CI.
-- [ ] **Working On** 0.8.0 Comprehensive model eval **debug/verify**. Each model should succeed at at least a few known environments. Also, massive refactoring will be needed.
-- [ ] 0.9.0 Notebook demonstrations of basic model usage.
+- [X] 0.8.0 Comprehensive model eval **debug/verify**. Each model should succeed at at least a few known environments. Also, massive refactoring will be needed.
+- [ ] **Working on** 0.9.0 Notebook demonstrations of basic model usage.
- [ ] **1.0.0** Base version is completed with working model visualizations proving performance / expected failure. At
this point, all models should have guaranteed environments they should succeed in.
- [ ] 1.2.0 Add PyBullet Fetch Environments
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -30,7 +30,7 @@ steps:
displayName: 'Install Python Packages'

- script: |
-    xvfb-run -s "-screen 0 1400x900x24" pytest -n 8 fast_rl/tests --doctest-modules --junitxml=junit/test-results.xml --cov=./ --cov-report=xml --cov-report=html
+    xvfb-run -s "-screen 0 1400x900x24" pytest -n 2 fast_rl/tests --doctest-modules --junitxml=junit/test-results.xml --cov=./ --cov-report=xml --cov-report=html
displayName: 'Test with pytest'

- task: PublishTestResults@2
10 changes: 5 additions & 5 deletions docs_src/rl.agents.dqnfixedtarget.ipynb
@@ -15,10 +15,10 @@
"import fast_rl.agents.DQN \n",
"from fast_rl.agents.DQN import DQN, FixedTargetDQN, DoubleDQN, DuelingDQN, DoubleDuelingDQN\n",
"from fast_rl.core.Interpreter import AgentInterpretationAlpha\n",
"from fast_rl.core.Learner import AgentLearner\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunch\n",
"from fast_rl.core.Learner import AgentLearnerAlpha\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunchAlpha\n",
"from fast_rl.core.agent_core import PriorityExperienceReplay, ExperienceReplay\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunch, FEED_TYPE_IMAGE, FEED_TYPE_STATE\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunchAlpha, FEED_TYPE_IMAGE, FEED_TYPE_STATE\n",
"from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon\n",
"import sys\n",
"import importlib"
@@ -423,12 +423,12 @@
}
],
"source": [
"data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human', max_steps=1000)\n",
"data = MDPDataBunchAlpha.from_env('maze-random-5x5-v0', render='human', max_steps=1000)\n",
"model = FixedTargetDQN(data, batch_size=128, max_episodes=50, lr=0.001, copy_over_frequency=3,\n",
" memory=ExperienceReplay(10000), discount=0.99, \n",
" exploration_strategy=GreedyEpsilon(epsilon_start=1, epsilon_end=0.1,\n",
" decay=0.001, do_exploration=True))\n",
"learn = AgentLearner(data, model)\n",
"learn = AgentLearnerAlpha(data, model)\n",
"\n",
"learn.fit(50)"
]
12 changes: 6 additions & 6 deletions docs_src/rl.core.mdp_interpreter.ipynb
@@ -79,14 +79,14 @@
"import numpy as np\n",
"\n",
"from fast_rl.agents.DQN import DQN\n",
"from fast_rl.core.Learner import AgentLearner\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunch, MDPDataset\n",
"from fast_rl.core.Learner import AgentLearnerAlpha\n",
"from fast_rl.core.MarkovDecisionProcess import MDPDataBunchAlpha, MDPDatasetAlpha\n",
"from fast_rl.core.Interpreter import AgentInterpretationAlpha\n",
"%matplotlib inline\n",
" \n",
"data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64)\n",
"data = MDPDataBunchAlpha.from_env('CartPole-v1', render='human', bs=64)\n",
"model = DQN(data)\n",
"learn = AgentLearner(data, model)\n",
"learn = AgentLearnerAlpha(data, model)\n",
"\n",
"learn.fit(5)"
]
@@ -106,9 +106,9 @@
"metadata": {},
"outputs": [],
"source": [
"data = MDPDataBunch.from_pickle('CartPole-v1', render='human', bs=64)\n",
"data = MDPDataBunchAlpha.from_pickle('CartPole-v1', render='human', bs=64)\n",
"model = DQN(data)\n",
"learn = AgentLearner(data, model)"
"learn = AgentLearnerAlpha(data, model)"
]
},
{
78 changes: 9 additions & 69 deletions fast_rl/agents/BaseAgent.py
@@ -1,15 +1,12 @@
from math import floor
+from typing import Collection

import gym
import numpy as np
import torch
-from fastai.basic_train import LearnerCallback, Any
-from fastai.callback import Callback
+from fastai.basic_train import LearnerCallback
from fastai.layers import bn_drop_lin
from gym.spaces import Discrete, Box
from torch import nn
-from traitlets import List
-import numpy as np
-from typing import Collection

from fast_rl.core.MarkovDecisionProcess import MDPDataBunch
from fast_rl.core.agent_core import ExplorationStrategy
@@ -29,6 +26,7 @@ def __init__(self, data: MDPDataBunch):
        self.loss = None
        self.out = None
        self.opt = None
+        self.warming_up = False
        self.learner_callbacks = []  # type: Collection[LearnerCallback]
        # Root model that will be accessed for action decisions
        self.action_model = None  # type: nn.Module
@@ -77,47 +75,14 @@ def forward(self, x):
        return x.view(x.size(0), -1)


-def create_nn_model(layer_list: list, action_size, state_size, use_bn=False, use_embed=False,
-                    activation_function=None, final_activation_function=None, action_val_to_dim=True):
-    """Generates an nn module.
-    Notes:
-        TabularModel could possibly be used along side a cnn learner instead. Will be a good idea to investigate.
-    Returns:
-    """
-    act = nn.LeakyReLU if activation_function is None else activation_function
-    # For now the dimension of the action does not make a difference.
-    action_size = action_size[0] if not action_val_to_dim else action_size[1]
-    # For now keep drop out as 0, test including dropout later
-    ps = [0] * len(layer_list)
-    sizes = [state_size] + layer_list + [action_size]
-    actns = [act() for _ in range(len(sizes) - 2)] + [None]
-    layers = []
-    for i, (n_in, n_out, dp, act) in enumerate(zip(sizes[:-1], sizes[1:], [0.] + ps, actns)):
-        if i == 0 and use_embed:
-            embedded, n_in = get_embedded(n_in[0], n_out, n_in[1], 5)
-            layers += [ToLong(), embedded, Flatten()]
-        elif i == 0: n_in = n_in[0]
-        if i == 0 and use_bn: layers += [nn.BatchNorm1d(n_in)]
-
-        layers += bn_drop_lin(n_in, n_out, bn=use_bn and i != 0, p=dp, actn=act)
-
-    if final_activation_function is not None: layers += [final_activation_function()]
-    return nn.Sequential(*layers)


def get_next_conv_shape(c_w, c_h, stride, kernel_size):
    h = floor((c_h - kernel_size - 2) / stride) + 1  # 3 convolutional layers given (3c, 640w, 640h)
    w = floor((c_w - kernel_size - 2) / stride) + 1
    return h, w


def get_conv(input_tuple, act, kernel_size, stride, n_conv_layers, layers):
-    """
+    r"""
    Useful guideline for convolutional net shape change:
    Shape:
@@ -141,37 +106,12 @@ def get_conv(input_tuple, act, kernel_size, stride, n_conv_layers, layers):
        n_conv_layers:
        layers:
    """
-    h, w = input_tuple[0], input_tuple[1]
+    h, w = input_tuple[1], input_tuple[2]
    conv_layers = [SwapImageChannel()]
    for i in range(n_conv_layers):
        h, w = get_next_conv_shape(h, w, stride, kernel_size)
-        conv_layers.append(torch.nn.Conv2d(input_tuple[2], 3, kernel_size=kernel_size, stride=stride))
+        conv_layers.append(torch.nn.Conv2d(input_tuple[3], 3, kernel_size=kernel_size, stride=stride))
        conv_layers.append(act)
-    return layers + conv_layers, 3 * (h + 1) * (w + 1)


-def create_cnn_model(layer_list: list, action_size, state_size, use_bn=False, kernel_size=5, stride=3, n_conv_layers=3,
-                     activation_function=None, final_activation_function=None, action_val_to_dim=True):
-    """Generates an nn module.
-    Notes:
-        TabularModel could possibly be used along side a cnn learner instead. Will be a good idea to investigate.
-    Returns:
-    """
-    act = nn.LeakyReLU if activation_function is None else activation_function
-    # For now keep drop out as 0, test including dropout later
-    ps = [0] * len(layer_list)
-    action_size = action_size[0] if not action_val_to_dim else action_size[1]
-    sizes = [state_size[0]] + layer_list + [action_size]
-    actns = [act() for _ in range(n_conv_layers + len(sizes) - 2)] + [None]
-    layers = []
-    for i, (n_in, n_out, dp, act) in enumerate(zip(sizes[:-1], sizes[1:], [0.] + ps, actns)):
-        if type(n_in) == tuple:
-            layers, n_in = get_conv(n_in, act, kernel_size, n_conv_layers=n_conv_layers, layers=layers, stride=stride)
-            layers += [Flatten()]
-
-        layers += bn_drop_lin(n_in, n_out, bn=use_bn and i != 0, p=dp, actn=act)
-    if final_activation_function is not None: layers += [final_activation_function()]
-    return nn.Sequential(*layers)
+    output_size = torch.prod(torch.tensor(nn.Sequential(*(layers + conv_layers))(torch.rand(input_tuple)).shape))
+    return layers + conv_layers, output_size
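The new return value above sidesteps manual shape arithmetic: it runs a dummy tensor through the assembled layers and reads the flattened size off the output shape. A standalone illustration of the same trick, with assumed layer and input shapes:

```python
import torch
from torch import nn

conv = nn.Sequential(nn.Conv2d(3, 8, kernel_size=5, stride=3), nn.LeakyReLU())
dummy = torch.rand(1, 3, 84, 84)  # (batch, channels, height, width)
# The number of flattened features is the product of the output shape's dims.
output_size = int(torch.prod(torch.tensor(conv(dummy).shape)))
print(output_size)  # 1 * 8 * 27 * 27 = 5832
```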
