Commit
1 parent 60e673f · commit 8c15608
Showing 18 changed files with 1,349 additions and 696 deletions.
@@ -1,22 +1,33 @@
 """
-`PantheonRL <https://github.com/Stanford-ILIAD/PantheonRL>`_ is a package for training and testing multi-agent reinforcement learning environments. The goal of PantheonRL is to provide a modular and extensible framework for training agent policies, fine-tuning agent policies, ad-hoc pairing of agents, and more.
+`PantheonRL <https://github.com/Stanford-ILIAD/PantheonRL>`_ is a
+package for training and testing multi-agent reinforcement learning
+environments. The goal of PantheonRL is to provide a modular and
+extensible framework for training agent policies, fine-tuning agent
+policies, ad-hoc pairing of agents, and more.
 
-PantheonRL is built to support Stable-Baselines3 (SB3), allowing direct access to many of SB3's standard RL training algorithms such as PPO. PantheonRL currently follows a decentralized training paradigm -- each agent is equipped with its own replay buffer and update algorithm. The agents objects are designed to be easily manipulable. They can be saved, loaded and plugged into different training procedures such as self-play, ad-hoc / cross-play, round-robin training, or finetuning.
+PantheonRL is built to support Stable-Baselines3 (SB3), allowing
+direct access to many of SB3's standard RL training algorithms such as
+PPO. PantheonRL currently follows a decentralized training paradigm --
+each agent is equipped with its own replay buffer and update
+algorithm. The agents objects are designed to be easily
+manipulable. They can be saved, loaded and plugged into different
+training procedures such as self-play, ad-hoc / cross-play,
+round-robin training, or finetuning.
 """
 import pantheonrl.envs
 
 from pantheonrl.common.agents import (
     Agent,
     StaticPolicyAgent,
     OnPolicyAgent,
-    OffPolicyAgent
+    OffPolicyAgent,
 )
 
 from pantheonrl.common.multiagentenv import (
     DummyEnv,
     MultiAgentEnv,
     TurnBasedEnv,
-    SimultaneousEnv
+    SimultaneousEnv,
 )
 
 from pantheonrl.common.observation import Observation
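For orientation, the names re-exported by this __init__ (Agent, OnPolicyAgent, MultiAgentEnv, and so on) are the package's main entry points. Below is a minimal pairing sketch in the style of the upstream README: it assumes the rock-paper-scissors example environment registered by pantheonrl.envs under the id 'RPS-v0', and the add_partner_agent / getDummyEnv helpers on MultiAgentEnv. Those names are taken from the README and may differ between versions, so treat the snippet as illustrative rather than authoritative.

import gym
from stable_baselines3 import PPO

import pantheonrl.envs  # assumed to register the example environments with gym
from pantheonrl.common.agents import OnPolicyAgent

# MultiAgentEnv subclasses the gym Environment, so the example env can be
# constructed through the normal gym registry.
env = gym.make('RPS-v0')

# The partner agent wraps its own SB3 learner, keeping a separate rollout
# buffer and update rule -- the decentralized paradigm noted in the docstring.
partner = OnPolicyAgent(PPO('MlpPolicy', env.getDummyEnv(1), verbose=1))
env.add_partner_agent(partner)

# The ego agent is an ordinary SB3 model trained directly against the env.
ego = PPO('MlpPolicy', env, verbose=1)
ego.learn(total_timesteps=50000)

Because the partner's learner lives inside the Agent object rather than the environment, the same environment can be reused for self-play, cross-play, or round-robin pairings by swapping which agent is attached.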