
Add DIAMBRA Bonus Unit #540

Draft · wants to merge 20 commits into main
Conversation

alexpalms (Contributor)

No description provided.

alexpalms marked this pull request as draft · June 13, 2024 12:41
simoninithomas (Member) left a comment


Thanks for this new Unit! I would need the toctree change to be able to visualize the course version and make a complete review.

units/en/_toctree.yml (outdated)
@@ -1,11 +1,64 @@
# Introduction
# DIAMBRA Arena Overview


I would add an introduction, for example:

"Welcome to this new bonus unit, where you'll learn to use DIAMBRA and train agents to play fighting games.

At the end of the unit you'll get..."

(illustration of the agent playing)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


Sound fun? Let's get started 🔥,
All environments are episodic Reinforcement Learning tasks, with discrete actions (gamepad buttons) and observations composed by screen pixels plus additional numerical data (RAM states like characters health bars or characters stage side).

Suggested change
All environments are episodic Reinforcement Learning tasks, with discrete actions (gamepad buttons) and observations composed by screen pixels plus additional numerical data (RAM states like characters health bars or characters stage side).
All environments are **episodic Reinforcement Learning tasks**, with **discrete actions** (gamepad buttons) and observations composed by screen pixels plus additional numerical data (RAM states like characters health bars or characters stage side).
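
A short code snippet here could make the action/observation structure concrete, for example (a minimal sketch; it assumes the `diambra.arena` package with the DIAMBRA engine running, and uses `sfiii3n` as the Street Fighter III game id, which may differ from the one used in the final unit):

```python
import diambra.arena

# Create the environment (the DIAMBRA engine must be running, e.g. via the CLI)
env = diambra.arena.make("sfiii3n")

print(env.action_space)       # discrete actions mapping gamepad buttons
print(env.observation_space)  # dict: screen frame + RAM states (health bars, stage side, ...)

env.close()
```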


Interfaced games have been selected among the most popular fighting retro-games. While sharing the same fundamental mechanics, they provide different challenges, with specific features such as different type and number of characters, how to perform combos, health bars recharging, etc. Whenever possible, games are released with all hidden/bonus characters unlocked.

In this unit we will focus on Street Fighter III, but other historic games are also available, and the list will continue to grow. Switching between them is very straightforward, so at the end of the unit you will be able to easily target additional titles.

Suggested change
In this unit we will focus on Street Fighter III, but other historic games are also available, and the list will continue to grow. Switching between them is very straightforward, so at the end of the unit you will be able to easily target additional titles.
In this unit **we will focus on Street Fighter III, but other historic games are also available**, and the list will continue to grow. Switching between them is very straightforward, so at the end of the unit you will be able to easily target additional titles.


## Preliminary Steps: Download Game ROM(s) and Check Validity

After completing the installation, you will need to obtain the Game ROM(s) of your interest and check their validity according to the following steps.

I suppose we can't provide a link to find them?


### Interaction Basics

DIAMBRA Arena Environments usage follows the standard RL interaction framework: the agent sends an action to the environment, which process it and performs a transition accordingly, from the starting state to the new state, returning the observation and the reward to the agent to close the interaction loop. The figure below shows this typical interaction scheme and data flow.

Suggested change
DIAMBRA Arena Environments usage follows the standard RL interaction framework: the agent sends an action to the environment, which process it and performs a transition accordingly, from the starting state to the new state, returning the observation and the reward to the agent to close the interaction loop. The figure below shows this typical interaction scheme and data flow.
DIAMBRA Arena Environments usage **follows the standard RL interaction framework**: the agent sends an action to the environment, which process it and performs a transition accordingly, from the starting state to the new state, returning the observation and the reward to the agent to close the interaction loop. The figure below shows this typical interaction scheme and data flow.
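
It could also help to show this interaction loop in code, for example (a minimal sketch assuming the standard Gymnasium-style API exposed by `diambra.arena` and the `sfiii3n` game id; adapt it to the ids and settings used in the unit):

```python
import diambra.arena

env = diambra.arena.make("sfiii3n")
observation, info = env.reset(seed=42)

while True:
    # The agent sends an action (here a random one) and receives the new
    # observation and the reward, closing the interaction loop
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

env.close()
```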


The default reward is defined as a function of characters health values so that, qualitatively, damage suffered by the agent corresponds to a negative reward, and damage inflicted to the opponent corresponds to a positive reward. The quantitative, general and formal reward function definition is as follows:

R_t = \sum_i^{0,N_c}\left(\bar{H_i}^{t^-} - \bar{H_i}^{t} - \left(\hat{H_i}^{t^-} - \hat{H_i}^{t}\right)\right)

Suggested change
R_t = \sum_i^{0,N_c}\left(\bar{H_i}^{t^-} - \bar{H_i}^{t} - \left(\hat{H_i}^{t^-} - \hat{H_i}^{t}\right)\right)
\\(R_t = \sum_i^{0,N_c}\left(\bar{H_i}^{t^-} - \bar{H_i}^{t} - \left(\hat{H_i}^{t^-} - \hat{H_i}^{t}\right)\right)\\)


Where:

- \bar{H} and \hat{H} are health values for opponent’s character(s) and agent’s one(s) respectively;

Suggested change
- \bar{H} and \hat{H} are health values for opponent’s character(s) and agent’s one(s) respectively;
- \\(\bar{H} and \hat{H}\\) are health values for opponent’s character(s) and agent’s one(s) respectively;

- t^- and t are used to indicate conditions at ”state-time” and at ”new state-time” (i.e. before and after environment step);

Suggested change
- t^- and t are used to indicate conditions at ”state-time” and at ”new state-time” (i.e. before and after environment step);
- \\(t^-\\) and \\(t\\) are used to indicate conditions at ”state-time” and at ”new state-time” (i.e. before and after environment step);


- N_c is the number of characters taking part in a round. Usually is N_c = 1 but there are some games where multiple characters are used, with the additional possible option of alternating them during gameplay, like Tekken Tag Tournament where 2 characters have to be selected and two opponents are faced every round (thus N_c = 2);

Suggested change
- N_c is the number of characters taking part in a round. Usually is N_c = 1 but there are some games where multiple characters are used, with the additional possible option of alternating them during gameplay, like Tekken Tag Tournament where 2 characters have to be selected and two opponents are faced every round (thus N_c = 2);
- \\(N_c\\) is the number of characters taking part in a round. Usually is \\(N_c = 1\\) but there are some games where multiple characters are used, with the additional possible option of alternating them during gameplay, like Tekken Tag Tournament where 2 characters have to be selected and two opponents are faced every round (thus \\(N_c = 2\\) );
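
A small illustrative snippet of the reward definition could also help readers parse the formula, for example (a purely illustrative pseudo-implementation with hypothetical variable names, not the actual DIAMBRA internals):

```python
def default_reward(opp_health_prev, opp_health, own_health_prev, own_health):
    """Default reward: damage inflicted to the opponent minus damage suffered,
    summed over the N_c characters taking part in the round."""
    reward = 0.0
    for i in range(len(opp_health)):                          # one term per character
        damage_dealt = opp_health_prev[i] - opp_health[i]     # \bar{H}_i^{t^-} - \bar{H}_i^{t}
        damage_suffered = own_health_prev[i] - own_health[i]  # \hat{H}_i^{t^-} - \hat{H}_i^{t}
        reward += damage_dealt - damage_suffered
    return reward
```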


DIAMBRA Arena comes with a large number of ready-to-use wrappers and examples showing how to apply them. They cover a wide spectrum of use cases, and also provide reference templates to develop custom ones.

Environmet wrappers are widely used to tweak the observation and action spaces. In order to activate them, one needs to properly set the `WrapperSettings` class attributes and provide them as input to the environment creation method, as shown in the next code block.

Suggested change
Environmet wrappers are widely used to tweak the observation and action spaces. In order to activate them, one needs to properly set the `WrapperSettings` class attributes and provide them as input to the environment creation method, as shown in the next code block.
Environment wrappers are **widely used to tweak the observation and action spaces**. In order to activate them, one needs to properly set the `WrapperSettings` class attributes and provide them as input to the environment creation method, as shown in the next code block.
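
A short code block after this paragraph would make it concrete, for example (a sketch based on the DIAMBRA Arena examples; the settings class is imported as `WrappersSettings` there, and the attribute names below are assumptions to double-check against the version pinned in the unit):

```python
import diambra.arena
from diambra.arena import WrappersSettings

# Configure a few common wrappers (attribute names are assumptions)
wrappers_settings = WrappersSettings()
wrappers_settings.normalize_reward = True    # rescale the reward
wrappers_settings.stack_frames = 4           # stack the last 4 screen frames
wrappers_settings.frame_shape = (84, 84, 1)  # resize and grayscale the frame

# Pass the wrapper settings to the environment creation method
env = diambra.arena.make("sfiii3n", wrappers_settings=wrappers_settings)
```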


<img src="https://github.com/diambra/agents/blob/main/img/agents.jpg?raw=true" alt="DIAMBRA Agents"/>

For training our model, we will rely on already implemented RL algorithms, leveraging state-of-the-art Reinforcement Learning libraries. There are multiple advantages in doing so: these libraries provide high quality algorithms, efficiently implemented and continuously tested, they allow to focus efforts on higher level aspects such as policy network architecture, features selection, and hyper-parameters tuning, they provide native solutions to parallelize environment execution, and, in some cases, they even support distributed training using multiple GPUs in a single workstation or in cluster contexts.

Suggested change
For training our model, we will rely on already implemented RL algorithms, leveraging state-of-the-art Reinforcement Learning libraries. There are multiple advantages in doing so: these libraries provide high quality algorithms, efficiently implemented and continuously tested, they allow to focus efforts on higher level aspects such as policy network architecture, features selection, and hyper-parameters tuning, they provide native solutions to parallelize environment execution, and, in some cases, they even support distributed training using multiple GPUs in a single workstation or in cluster contexts.
For training our model, we will rely on already implemented RL algorithms, leveraging state-of-the-art Reinforcement Learning libraries.
There are multiple advantages in doing so:
- These libraries provide **high quality algorithms**, efficiently implemented and continuously tested, they allow to focus efforts on higher level aspects such as policy network architecture, features selection, and hyper-parameters tuning.
- They provide **native solutions to parallelize environment execution**, and, in some cases, they even support distributed training using multiple GPUs in a single workstation or in cluster contexts.
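
A minimal training example could close this section, for example (a sketch based on the DIAMBRA `agents` repository examples: the `make_sb3_env` helper, its import path, and the game id are assumptions to verify against the unit's pinned versions):

```python
from diambra.arena import EnvironmentSettings, WrappersSettings
from diambra.arena.stable_baselines3.make_sb3_env import make_sb3_env
from stable_baselines3 import PPO

# Environment and wrapper settings (defaults here, tune as needed)
settings = EnvironmentSettings()
wrappers_settings = WrappersSettings()

# Create the (vectorized) environment and train a PPO agent on it
env, num_envs = make_sb3_env("sfiii3n", settings, wrappers_settings)
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)

model.save("ppo_sfiii3n")
env.close()
```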


### Getting Ready

We highly recommend using virtual environments to isolate your python installs, especially to avoid conflicts in dependencies. In what follows we use Conda but any other tool should work too.

Suggested change
We highly recommend using virtual environments to isolate your python installs, especially to avoid conflicts in dependencies. In what follows we use Conda but any other tool should work too.
We highly **recommend using virtual environments to isolate your python installs, especially to avoid conflicts in dependencies**. In what follows we use Conda but any other tool should work too.


DIAMBRA Competition Platform allows you to submit your agents and compete with other coders around the globe in epic video games tournaments!

It features a public global leaderboard where users are ranked by the best score achieved by their agents in the different environments. It also offers you the possibility to unlock cool achievements depending on the performances of your agent. Submitted agents are evaluated and their episodes are streamed on DIAMBRA Twitch channel.

Suggested change
It features a public global leaderboard where users are ranked by the best score achieved by their agents in the different environments. It also offers you the possibility to unlock cool achievements depending on the performances of your agent. Submitted agents are evaluated and their episodes are streamed on DIAMBRA Twitch channel.
It features a public global leaderboard where users are ranked by the best score achieved by their agents in the different environments. It also offers you the possibility to unlock cool achievements depending on the performances of your agent. **Submitted agents are evaluated and their episodes are streamed on DIAMBRA Twitch channel**.



Add Twitch channel link


This will automatically retrieve the Hugging Face token you saved earlier and will create a new submission on [DIAMBRA Competition Platform](https://diambra.ai).

You will be able to see it on your dashboard just logging in with your credentials, and watch it being streamed on Twitch!

Add a link to the Twitch channel.

simoninithomas (Member) left a comment

Hi, thanks for your work! I added some updates, requested changes, and advice 🤗
