Add DIAMBRA Bonus Unit #540
base: main
Conversation
Thanks for this new Unit. I would need the toctree change to be able to visualize the course version and to make a complete review.
@@ -1,11 +1,64 @@
-# Introduction
+# DIAMBRA Arena Overview
I would add an introduction, for example:
"Welcome to this new bonus unit, where you'll learn to use DIAMBRA and train agents to play.
At the end of the unit you'll get ..."
followed by an illustration of the agent playing.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Sound fun? Let's get started 🔥
All environments are episodic Reinforcement Learning tasks, with discrete actions (gamepad buttons) and observations composed of screen pixels plus additional numerical data (RAM states like characters' health bars or stage side).
Suggested change:
All environments are **episodic Reinforcement Learning tasks**, with **discrete actions** (gamepad buttons) and observations composed of screen pixels plus additional numerical data (RAM states like characters' health bars or stage side).
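As a quick illustration of these spaces, here is a minimal sketch, assuming DIAMBRA Arena's Gymnasium-style API and `sfiii3n` as the Street Fighter III game id (both assumptions; check the DIAMBRA docs for the version used in the unit):

```python
import diambra.arena

# Create the environment (game id "sfiii3n" assumed for Street Fighter III)
env = diambra.arena.make("sfiii3n")

# Discrete actions (gamepad buttons) and a dictionary observation space
# combining screen pixels with RAM-derived values such as health bars
print(env.action_space)
print(env.observation_space)

env.close()
```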
Interfaced games have been selected among the most popular fighting retro-games. While sharing the same fundamental mechanics, they provide different challenges, with specific features such as different types and numbers of characters, how to perform combos, health bar recharging, etc. Whenever possible, games are released with all hidden/bonus characters unlocked.
In this unit we will focus on Street Fighter III, but other historic games are also available, and the list will continue to grow. Switching between them is very straightforward, so at the end of the unit you will be able to easily target additional titles.
Suggested change:
In this unit **we will focus on Street Fighter III, but other historic games are also available**, and the list will continue to grow. Switching between them is very straightforward, so at the end of the unit you will be able to easily target additional titles.
## Preliminary Steps: Download Game ROM(s) and Check Validity
After completing the installation, you will need to obtain the Game ROM(s) of your interest and check their validity according to the following steps.
I suppose we can't provide a link to find them?
### Interaction Basics
DIAMBRA Arena Environments usage follows the standard RL interaction framework: the agent sends an action to the environment, which processes it and performs a transition accordingly, from the starting state to the new state, returning the observation and the reward to the agent to close the interaction loop. The figure below shows this typical interaction scheme and data flow.
Suggested change:
DIAMBRA Arena Environments usage **follows the standard RL interaction framework**: the agent sends an action to the environment, which processes it and performs a transition accordingly, from the starting state to the new state, returning the observation and the reward to the agent to close the interaction loop. The figure below shows this typical interaction scheme and data flow.
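To make that loop concrete, here is a minimal sketch assuming the Gymnasium-style `reset`/`step` API used by recent DIAMBRA Arena releases (the `sfiii3n` game id is again an assumption):

```python
import diambra.arena

env = diambra.arena.make("sfiii3n")  # game id assumed

observation, info = env.reset(seed=42)

while True:
    # A trained agent would compute the action from the observation;
    # here we just sample a random one
    action = env.action_space.sample()

    # The environment processes the action, performs the transition,
    # and returns the new observation and the reward
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        break

env.close()
```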
The default reward is defined as a function of the characters' health values so that, qualitatively, damage suffered by the agent corresponds to a negative reward, and damage inflicted to the opponent corresponds to a positive reward. The quantitative, general and formal reward function definition is as follows (a worked numerical example follows the list below):
R_t = \sum_i^{0,N_c}\left(\bar{H_i}^{t^-} - \bar{H_i}^{t} - \left(\hat{H_i}^{t^-} - \hat{H_i}^{t}\right)\right)
Suggested change:
\\(R_t = \sum_i^{0,N_c}\left(\bar{H_i}^{t^-} - \bar{H_i}^{t} - \left(\hat{H_i}^{t^-} - \hat{H_i}^{t}\right)\right)\\)
Where:
- \bar{H} and \hat{H} are health values for opponent’s character(s) and agent’s one(s) respectively;
Suggested change:
- \\(\bar{H}\\) and \\(\hat{H}\\) are health values for opponent’s character(s) and agent’s one(s) respectively;
- t^- and t are used to indicate conditions at “state-time” and at “new state-time” (i.e. before and after environment step);
Suggested change:
- \\(t^-\\) and \\(t\\) are used to indicate conditions at “state-time” and at “new state-time” (i.e. before and after environment step);
- N_c is the number of characters taking part in a round. Usually N_c = 1, but there are some games where multiple characters are used, with the additional option of alternating them during gameplay, like Tekken Tag Tournament, where 2 characters have to be selected and two opponents are faced every round (thus N_c = 2);
Suggested change:
- \\(N_c\\) is the number of characters taking part in a round. Usually \\(N_c = 1\\), but there are some games where multiple characters are used, with the additional option of alternating them during gameplay, like Tekken Tag Tournament, where 2 characters have to be selected and two opponents are faced every round (thus \\(N_c = 2\\));
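As a worked example of the formula above, with hypothetical numbers and a single character per side (\\(N_c = 1\\)): if one step drops the opponent's health from 100 to 80 and the agent's health from 50 to 45, the reward is \\((100 - 80) - (50 - 45) = 15\\).

```python
# Hypothetical health values before (t^-) and after (t) one environment step
opponent_health_before, opponent_health_after = 100, 80  # \bar{H}
agent_health_before, agent_health_after = 50, 45         # \hat{H}

# Single character per side (N_c = 1): damage inflicted minus damage suffered
reward = (opponent_health_before - opponent_health_after) - (
    agent_health_before - agent_health_after
)

print(reward)  # 15
```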
DIAMBRA Arena comes with a large number of ready-to-use wrappers and examples showing how to apply them. They cover a wide spectrum of use cases, and also provide reference templates to develop custom ones.
Environmet wrappers are widely used to tweak the observation and action spaces. In order to activate them, one needs to properly set the `WrapperSettings` class attributes and provide them as input to the environment creation method, as shown in the next code block.
Suggested change:
Environment wrappers are **widely used to tweak the observation and action spaces**. In order to activate them, one needs to properly set the `WrapperSettings` class attributes and provide them as input to the environment creation method, as shown in the next code block.
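Since the referenced code block is not part of this diff, here is a hedged sketch of what activating wrappers could look like. Note that recent DIAMBRA Arena releases name the class `WrappersSettings`, and the attribute names below (`frame_shape`, `stack_frames`, `scale`) are assumptions to verify against the docs of the version used in the unit:

```python
import diambra.arena
from diambra.arena import WrappersSettings

# Configure a few common wrappers (attribute names assumed; check your version's docs)
wrappers_settings = WrappersSettings()
wrappers_settings.frame_shape = (84, 84, 1)  # resize the screen and convert it to grayscale
wrappers_settings.stack_frames = 4           # stack consecutive frames in the observation
wrappers_settings.scale = True               # scale/normalize numerical observations

# Wrapper settings are passed to the environment creation method
env = diambra.arena.make("sfiii3n", wrappers_settings=wrappers_settings)
```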
<img src="https://github.com/diambra/agents/blob/main/img/agents.jpg?raw=true" alt="DIAMBRA Agents"/>
For training our model, we will rely on already implemented RL algorithms, leveraging state-of-the-art Reinforcement Learning libraries. There are multiple advantages in doing so: these libraries provide high quality algorithms, efficiently implemented and continuously tested, they allow to focus efforts on higher level aspects such as policy network architecture, features selection, and hyper-parameters tuning, they provide native solutions to parallelize environment execution, and, in some cases, they even support distributed training using multiple GPUs in a single workstation or in cluster contexts.
Suggested change:
For training our model, we will rely on already implemented RL algorithms, leveraging state-of-the-art Reinforcement Learning libraries.
There are multiple advantages in doing so:
- These libraries provide **high quality algorithms**, efficiently implemented and continuously tested, and they allow us to focus efforts on higher level aspects such as policy network architecture, features selection, and hyper-parameters tuning.
- They provide **native solutions to parallelize environment execution**, and, in some cases, they even support distributed training using multiple GPUs in a single workstation or in cluster contexts.
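For instance, a hedged sketch using Stable-Baselines3's PPO together with the `make_sb3_env` helper shipped with DIAMBRA's agents examples (the import path, arguments, and the `sfiii3n` id are assumptions; the exact API may differ between versions):

```python
from diambra.arena import EnvironmentSettings, WrappersSettings
from diambra.arena.stable_baselines3.make_sb3_env import make_sb3_env
from stable_baselines3 import PPO

# Environment and wrapper configuration (kept minimal on purpose)
settings = EnvironmentSettings()
wrappers_settings = WrappersSettings()
wrappers_settings.flatten = True  # flatten nested observation dictionaries (attribute assumed)

# Create a vectorized, SB3-compatible environment (helper assumed to return env and num_envs)
env, num_envs = make_sb3_env("sfiii3n", settings, wrappers_settings)

# Train a PPO agent on the combined pixel + RAM observations
agent = PPO("MultiInputPolicy", env, verbose=1)
agent.learn(total_timesteps=200_000)

agent.save("ppo_sfiii3n")
env.close()
```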
### Getting Ready
We highly recommend using virtual environments to isolate your python installs, especially to avoid conflicts in dependencies. In what follows we use Conda but any other tool should work too.
Suggested change:
We highly **recommend using virtual environments to isolate your python installs, especially to avoid conflicts in dependencies**. In what follows we use Conda but any other tool should work too.
DIAMBRA Competition Platform allows you to submit your agents and compete with other coders around the globe in epic video games tournaments!
It features a public global leaderboard where users are ranked by the best score achieved by their agents in the different environments. It also offers you the possibility to unlock cool achievements depending on the performance of your agent. Submitted agents are evaluated and their episodes are streamed on the DIAMBRA Twitch channel.
Suggested change:
It features a public global leaderboard where users are ranked by the best score achieved by their agents in the different environments. It also offers you the possibility to unlock cool achievements depending on the performance of your agent. **Submitted agents are evaluated and their episodes are streamed on the DIAMBRA Twitch channel**.
Add the Twitch channel link.
This will automatically retrieve the Hugging Face token you saved earlier and will create a new submission on the [DIAMBRA Competition Platform](https://diambra.ai).
You will be able to see it on your dashboard just by logging in with your credentials, and watch it being streamed on Twitch!
Add a link to the Twitch channel.
Hi, thanks for your work! I added some updates, requested changes, and advice 🤗