This notebook presents the implementation of the Q-Learning and Expected SARSA algorithms to solve the Text Flappy Bird game.

The implementation of the environment can be found here: https://gitlab-research.centralesupelec.fr/stergios.christodoulidis/text-flappy-bird-gym

The final hyperparameter configurations used to measure the performance of the agents are listed below:
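As a minimal sketch of how the environment is used, the package registers a Gym environment that can be created with `gym.make`. The environment id `TextFlappyBird-v0`, the `height`/`width`/`pipe_gap` arguments, and the classic 4-tuple `step` return are assumptions based on typical usage of this package and should be checked against the repository's README:

```python
import gym
import text_flappy_bird_gym  # registers the Text Flappy Bird environments with Gym

# Environment id and constructor arguments are assumptions; see the repository README.
env = gym.make("TextFlappyBird-v0", height=15, width=20, pipe_gap=4)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy, as a quick smoke test
    obs, reward, done, info = env.step(action)  # assumes the classic Gym 4-tuple API
env.close()
```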
| Hyperparameter | Q-Learning | Expected SARSA |
|---|---|---|
| Step-size (α) | 0.5 | 0.5 |
| Step-size decay | 1.0 | 0.99999 |
| Epsilon (ε) | 0.05 | 0.05 |
| Epsilon decay | 0.99999 | 0.99999 |
| Discount (γ) | 1.0 | 0.9 |
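To make explicit how these hyperparameters enter the two algorithms, below is a hedged sketch of the tabular update rules (the function names and the `Q` table layout are illustrative, not taken from the notebook; the decay factors would simply multiply `alpha` and `epsilon` after each episode). Q-Learning bootstraps on the maximum action value of the next state, while Expected SARSA bootstraps on the expectation of the action values under the ε-greedy behaviour policy:

```python
import numpy as np
from collections import defaultdict

N_ACTIONS = 2  # idle or flap

# Q maps a (discretised) state to an array of action values.
Q = defaultdict(lambda: np.zeros(N_ACTIONS))
rng = np.random.default_rng(0)

def epsilon_greedy(state, epsilon):
    # With probability epsilon explore, otherwise act greedily w.r.t. Q.
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def q_learning_update(s, a, r, s_next, alpha, gamma):
    # Target uses the max over next-state action values (off-policy).
    target = r + gamma * np.max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def expected_sarsa_update(s, a, r, s_next, alpha, gamma, epsilon):
    # Target uses the expected next-state value under the epsilon-greedy policy.
    probs = np.full(N_ACTIONS, epsilon / N_ACTIONS)
    probs[np.argmax(Q[s_next])] += 1.0 - epsilon
    target = r + gamma * float(np.dot(probs, Q[s_next]))
    Q[s][a] += alpha * (target - Q[s][a])
```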
The total sum of rewards achieved by each agent:

- Q-Learning: 8,041,130
- Expected SARSA: 36,660