Application of deep reinforcement learning (DQN and PPO) to automated trading on an HPC system, comparing performance across CPU and GPU nodes

nabilshadman/deep-reinforcement-learning-automated-trading

Deep Reinforcement Learning for Automated Trading on HPC

This repository contains the implementation and evaluation of Deep Reinforcement Learning (DRL) models for automated trading on a High-Performance Computing (HPC) system. The project focuses on two widely used DRL algorithms: Deep Q-Network (DQN) and Proximal Policy Optimisation (PPO). Our goal is to compare these models in terms of both trading performance and computational efficiency across the CPU and GPU nodes of the Cirrus HPC system.

Key Features:

  • Implementation of DQN and PPO algorithms for multi-asset trading
  • Integration with HPC resources for scalable training and testing
  • Performance evaluation on CPU and GPU nodes
  • Comprehensive analysis of trading metrics and computational efficiency

This project is part of an MSc dissertation titled "Comparison of Deep Reinforcement Learning Models for Automated Trading on Heterogeneous HPC System".

Figure: DQN architecture [1] (left) and PPO architecture [2] (right).

Repository Structure

Here is a high-level overview of the repository's structure:

/
├── code/
│   ├── data/
│   ├── dqn/
│   ├── ppo/
│   ├── test/
│   ├── torchrl_dqn/
│   └── torchrl_ppo/
├── experiments/
├── feasibility/
├── report/
├── environment.yml
├── requirements.txt
  • code/ – Contains all code-related directories for the project.
    • data/ – Scripts to download and store datasets used in experiments (e.g., equity prices).
    • dqn/ – Scripts and implementation for the DQN model.
    • ppo/ – Scripts and implementation for the PPO model.
    • test/ – Test scripts to validate environments, agents, and other components.
    • torchrl_dqn/ – Contains DQN implementation using the TorchRL framework (in progress).
    • torchrl_ppo/ – Contains PPO implementation using the TorchRL framework (in progress).
  • experiments/ – Contains Excel files and charts documenting the results of experiments, including baseline comparisons, hyperparameter tuning, scaling tests, and transferability.
  • feasibility/ – Documents related to the feasibility study conducted in the initial stage of the project.
  • report/ – Includes project reports and presentations (PDF format).
  • environment.yml – Conda environment configuration file for setting up project dependencies.
  • requirements.txt – Lists Python packages required to run the project.

Hardware Environment

Cirrus is our primary HPC platform for testing our implementations, offering both CPU and GPU nodes to efficiently train and evaluate our DRL models.

Figure: Cirrus at EPCC's Advanced Computing Facility [3].

Software Environment

PyTorch is our primary machine learning framework for implementing DQN and PPO models. In addition to PyTorch, we explored other frameworks or libraries such as TensorFlow and TorchRL during the feasibility and prototyping phases to assess their suitability for the project. The environment is managed through Conda to ensure reproducibility across platforms.

Prerequisites

  • Python 3.10+
  • PyTorch and supporting libraries (see requirements.txt or environment.yml)
  • Conda or pip package manager
  • CUDA-capable GPU recommended
  • Cirrus HPC access credentials (for HPC usage)

How to Set Up (on Cirrus)

1. Set Up the Environment

To set up the project environment using Conda:

conda env create -f environment.yml
conda activate your-env-name  # replace with the name set in environment.yml

Alternatively, install Python packages using pip:

pip install -r requirements.txt

2. Run the Experiments

Use the SLURM scripts provided in the respective model's directory (code/dqn/ or code/ppo/) to run the programs on Cirrus:

  • For CPU nodes:

    # DQN
    sbatch dqn_cpu.slurm
    
    # PPO
    sbatch ppo_cpu.slurm
  • For GPU nodes:

    # DQN
    sbatch dqn_gpu.slurm
    
    # PPO
    sbatch ppo_gpu.slurm
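
The four submissions above follow a `<model>_<node>.slurm` naming pattern, so the command lines can be generated rather than typed. The following is a minimal Python sketch, not part of the repository: the helper names `slurm_script` and `sbatch_command` are hypothetical, and it only prints the commands instead of submitting them.

```python
MODELS = ("dqn", "ppo")
NODES = ("cpu", "gpu")


def slurm_script(model: str, node: str) -> str:
    """Return the script name following the <model>_<node>.slurm pattern."""
    if model not in MODELS or node not in NODES:
        raise ValueError(f"unknown combination: {model!r}, {node!r}")
    return f"{model}_{node}.slurm"


def sbatch_command(model: str, node: str) -> list[str]:
    """Build the sbatch command line without running it."""
    return ["sbatch", slurm_script(model, node)]


# Print every model/node combination; nothing is submitted here.
for model in MODELS:
    for node in NODES:
        print(" ".join(sbatch_command(model, node)))
```

On Cirrus, each list could be passed to `subprocess.run(cmd, check=True)` from within the matching model directory, since the scripts live alongside each model's code.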

How to Set Up (on local machine)

To run the DQN or PPO programs on a local machine (e.g., Windows or macOS):

1. Install dependencies:

# Using conda
conda env create -f environment.yml
conda activate your-env-name  # replace with the name set in environment.yml

# Or using pip
pip install -r requirements.txt

2. Train the model:

From the respective model's directory:

# DQN
python dqn_trader.py -m train -c config.yaml

# PPO
python ppo_trader.py -m train -c config.yaml

3. Test the model:

From the respective model's directory:

# DQN
python dqn_trader.py -m test -c config.yaml

# PPO
python ppo_trader.py -m test -c config.yaml

Ensure any necessary configuration changes are made in the config.yaml file before running.
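
For readers wondering how the `-m` and `-c` flags might be handled, here is a minimal sketch using `argparse`. It is not the repository's actual code: the long option names `--mode` and `--config` and the function name `parse_args` are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the mode (-m) and configuration path (-c) from the command line."""
    parser = argparse.ArgumentParser(description="Train or test a trading agent")
    parser.add_argument("-m", "--mode", choices=["train", "test"], required=True,
                        help="whether to train a model or test a trained one")
    parser.add_argument("-c", "--config", required=True,
                        help="path to the YAML configuration file")
    return parser.parse_args(argv)


# Equivalent to running: python dqn_trader.py -m train -c config.yaml
args = parse_args(["-m", "train", "-c", "config.yaml"])
print(args.mode, args.config)
```

Restricting `-m` with `choices` means a mistyped mode fails immediately with a usage message rather than partway through a run.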

Configuration

The models read their hyperparameters from YAML configuration files, which are located in the respective model directories. Edit the relevant file to change hyperparameters.

Outputs and Logs

Logs generated from SLURM jobs are stored in the logs/ folder for each model.

The models output several key metrics:

  • Portfolio value statistics (median, min, max)
  • Execution time
  • CPU/GPU memory usage

Profiling and performance metrics can be found within the log files.
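
The portfolio value statistics listed above can be reproduced with the standard library. This is a minimal sketch, assuming one final portfolio value is recorded per episode; the function name `summarise_portfolio_values` is hypothetical and the numbers are made up for illustration, not results from the project.

```python
from statistics import median


def summarise_portfolio_values(values):
    """Return the median, min and max of per-episode final portfolio values."""
    if not values:
        raise ValueError("values must not be empty")
    return {"median": median(values), "min": min(values), "max": max(values)}


# Made-up values purely for illustration.
summary = summarise_portfolio_values([10250.0, 9800.0, 11020.0, 10100.0])
print(summary)
```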

Data

The project uses daily close price data for equities and exchange-traded funds (ETFs), publicly available from the Yahoo Finance API. The data files are located in code/data/.
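
As a rough illustration of working with daily close prices, the sketch below reads a small CSV into a list of tickers and a list of price rows. The file layout (one column per asset, one row per trading day), the tickers `AAA` and `BBB`, the prices, and the function name `load_close_prices` are all assumptions; check the files in code/data/ for the actual format.

```python
import csv
import io

# Made-up sample in an assumed layout: one column per asset, one row per day.
SAMPLE_CSV = """AAA,BBB
100.0,50.0
101.0,49.5
102.5,50.5
"""


def load_close_prices(handle):
    """Read a header row of tickers followed by rows of daily close prices."""
    reader = csv.reader(handle)
    tickers = next(reader)
    prices = [[float(cell) for cell in row] for row in reader if row]
    return tickers, prices


tickers, prices = load_close_prices(io.StringIO(SAMPLE_CSV))
print(tickers, len(prices), "days")
```

For a file on disk, the same function can be called with a handle from `open(path, newline="")`.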

Profiling

Separate SLURM scripts for profiling are provided in the respective model's folder. These scripts use profiling tools such as torch.profiler to gather performance metrics like execution time and memory usage. Submit these profiling-specific scripts to enable profiling.

GPU Monitoring

For GPU runs, there's a commented-out section in the SLURM scripts to collect GPU monitoring data (e.g. power, utilisation) using nvidia-smi. Uncomment this section to enable GPU monitoring.
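
To give a sense of the data such monitoring produces, the sketch below builds an `nvidia-smi` query for power draw and utilisation and parses one line of its CSV output. It is a minimal sketch rather than the command used in the repository's SLURM scripts: the selected fields, the one-second interval, and the helper names are assumptions, and the sample line is made up.

```python
QUERY_FIELDS = ("timestamp", "power.draw", "utilization.gpu", "memory.used")


def nvidia_smi_command(interval_s: int = 1) -> list[str]:
    """Build an nvidia-smi command that logs the chosen fields as CSV."""
    return [
        "nvidia-smi",
        f"--query-gpu={','.join(QUERY_FIELDS)}",
        "--format=csv,noheader,nounits",
        f"--loop={interval_s}",
    ]


def parse_line(line: str) -> dict:
    """Parse one CSV line into a dict keyed by field name."""
    cells = [cell.strip() for cell in line.split(",")]
    if len(cells) != len(QUERY_FIELDS):
        raise ValueError(f"expected {len(QUERY_FIELDS)} fields, got {len(cells)}")
    record = dict(zip(QUERY_FIELDS, cells))
    # Every field except the timestamp is numeric when 'nounits' is set.
    for name in QUERY_FIELDS[1:]:
        record[name] = float(record[name])
    return record


# Made-up sample line in the format produced by the command above.
sample = "2024/08/24 12:00:00.000, 45.12, 87, 10240"
print(" ".join(nvidia_smi_command()))
print(parse_line(sample))
```

Note that `nvidia-smi` can report a non-numeric placeholder for fields a GPU does not support, which `parse_line` as written would reject with a `ValueError`.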

Open-Source Code and Enhancements

This project builds upon the open-source implementations of the DQN algorithm (developed by Lazy Programmer) and the PPO algorithm (developed by Phil Tabor). Both authors are experienced machine learning practitioners who promote experimentation with their implementations.

These algorithms were adapted and enhanced for a multi-asset trading environment and integrated with HPC resources. Some of our enhancements include GPU support, environment extensions, YAML-based configuration management, and model architecture improvements for automated trading tasks on HPC systems.

For more details on these enhancements and their impact, refer to the project report.

Project Wiki

For detailed notes, meeting summaries, experimental observations, and literature references, please refer to the Project Wiki. The wiki includes key information about the project's progress, experimental results, and resources used throughout the development process.

Contributors

Researcher: Nabil Shadman
Advisors: Dr Joseph Lee, Dr Michael Bareford

Contact

For questions or issues, feel free to contact Nabil Shadman at [email protected].

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use this work in your research, please cite:

@misc{drl-automated-trading-hpc,
  author = {Shadman, Nabil},
  title = {Comparison of Deep Reinforcement Learning Models for Automated Trading on Heterogeneous HPC System},
  year = {2024},
  month = {9},
  publisher = {GitHub},
  url = {https://github.com/nabilshadman/deep-reinforcement-learning-automated-trading},
  note = {Master's Dissertation},
  institution = {The University of Edinburgh}
}

References

[1] A. Nair et al., "Massively parallel methods for deep reinforcement learning," arXiv.org, 
    https://arxiv.org/abs/1507.04296 (accessed August 24, 2024).
[2] N. Firdous, N. Mohi Ud Din, and A. Assad, "An imbalanced classification approach for 
    establishment of cause-effect relationship between Heart-Failure and Pulmonary Embolism 
    using Deep Reinforcement Learning," Engineering Applications of Artificial Intelligence,
    Sept. 2023.
[3] The University of Edinburgh, "High Performance Computing services," 
    https://www.epcc.ed.ac.uk/high-performance-computing-services (accessed September 16, 2024).