WandB: https://wandb.ai/arth-shukla/Mario-DDQN
I based the DDQN on the DQN Nature paper: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
Algorithms/Concepts: Double Deep Q Networks (DDQNs) building on Deep Q Networks (DQNs)
AI Development: PyTorch (Torch, CUDA)
The DDQN was able to learn to play one level with fairly regular success:
Note that the DDQN still fails occasionally, but in general it performs well.
The DDQN performs well because it uses a target network and a modified target computation to avoid the Q-value overestimation that plagues standard DQNs. However, it still shares the DQN's drawbacks: slow convergence, and needing to replay the same experience data points multiple times before it learns them properly.
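To illustrate the idea, here is a minimal sketch of the Double DQN target computation: the online network selects the next action, while the target network evaluates it. The names (`online_net`, `target_net`, the batch tensors) and the Huber loss are assumptions for illustration, not the exact code in this repo.

```python
import torch
import torch.nn.functional as F

def ddqn_loss(online_net, target_net, states, actions, rewards, next_states, dones, gamma=0.99):
    # Q-values for the actions actually taken, from the online network
    q_pred = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double DQN: the online network *selects* the next action...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...but the target network *evaluates* it, which curbs overestimation
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        target = rewards + gamma * (1.0 - dones.float()) * next_q

    return F.smooth_l1_loss(q_pred, target)
```

With a standard DQN, the max over the target network's own Q-values both selects and evaluates the action, which is what causes the upward bias that Double DQN avoids.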
Next, I will try a PPO algorithm I wrote earlier for CartPole (https://github.com/arth-shukla/ppo-gym-cartpole), altered to include the same convolutions (see the sketch below). PPO should not only be able to learn CartPole well, but it should also be able to learn multiple Mario levels, or even The Lost Levels (a version of Mario 2 not released in the West due to its difficulty), all of which are included in the Mario Gym API.
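For reference, the convolutional stack both networks could share follows the Nature DQN architecture linked above. This is a hedged sketch: the 4x84x84 stacked-grayscale-frame input and the class/parameter names are assumptions about the preprocessing, not the exact code in this repo.

```python
import torch.nn as nn

class ConvBackbone(nn.Module):
    """Nature-DQN-style convolutional feature extractor for stacked Mario frames."""

    def __init__(self, in_channels=4, feature_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, feature_dim), nn.ReLU(),  # 7x7 spatial size for 84x84 input
        )

    def forward(self, x):
        # Scale raw pixel frames to [0, 1] before the conv stack
        return self.conv(x / 255.0)
```

The DDQN's Q-head and the PPO policy/value heads would then sit on top of the same `feature_dim`-sized output.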