
# Reinforcement Learning for load distribution in a decentralized Edge environment

Paper: https://dl.acm.org/doi/10.1145/3660319.3660331

## Description

The project implements the SAC (Soft Actor-Critic) and PPO (Proximal Policy Optimization) deep reinforcement learning algorithms, together with the evolutionary algorithm NEAT (NeuroEvolution of Augmenting Topologies), to optimize workload management in an Edge Computing system (DFaaS). The goal is to find the optimal policy for processing requests locally, forwarding them to other edge nodes, or rejecting them, based on system conditions. The current implementation still makes simplifying assumptions compared to the real scenario.

In the simulated environment, the agent receives a sequence of incoming requests over time. At each step, it must decide how many of these requests to process locally, how many to forward to another edge node, and how many to reject. The number of incoming requests varies over time.

The action space is a three-dimensional continuous box, where each dimension corresponds to the proportion of requests that are processed locally, forwarded, or rejected.
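A minimal sketch of how such an action space could be expressed with Gymnasium spaces is shown below. The Gymnasium interface and the normalization step are assumptions for illustration; the repository may handle this differently.

```python
# Sketch of the action space, assuming a Gymnasium-style interface: a
# three-dimensional continuous box whose entries are turned into the
# fractions of requests processed locally, forwarded, and rejected.
import numpy as np
import gymnasium as gym

action_space = gym.spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)

raw_action = action_space.sample()
# One common way to turn the raw box output into proportions that sum to 1.
local_frac, forward_frac, reject_frac = raw_action / max(raw_action.sum(), 1e-8)
```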

The observation space consists of four components (see the sketch after the list):

- The number of incoming requests
- The remaining queue capacity
- The remaining forwarding capacity
- A congestion flag, indicating whether the queue is congested
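Under the same Gymnasium-style assumption, the observation could be modelled as a dictionary space; the component names and the bounds below are hypothetical, not values from the repository.

```python
# Sketch of the observation space (Gymnasium-style assumption).
import numpy as np
import gymnasium as gym

MAX_REQUESTS = 150   # hypothetical cap on incoming requests per step
MAX_QUEUE = 100      # hypothetical local queue capacity
MAX_FORWARD = 100    # hypothetical forwarding capacity

observation_space = gym.spaces.Dict({
    "incoming_requests": gym.spaces.Box(0, MAX_REQUESTS, shape=(1,), dtype=np.float32),
    "queue_capacity":    gym.spaces.Box(0, MAX_QUEUE,    shape=(1,), dtype=np.float32),
    "forward_capacity":  gym.spaces.Box(0, MAX_FORWARD,  shape=(1,), dtype=np.float32),
    "congestion":        gym.spaces.Discrete(2),  # 1 if the queue is congested
})
```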

The reward function depends on the actions taken by the agent and on the system state. It provides more points for processing requests locally and fewer for forwarding them, and it heavily penalizes rejecting requests and causing congestion in the queue.
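The sketch below illustrates one reward shaping consistent with this description; all coefficients are hypothetical and purely for illustration.

```python
# Illustrative reward shaping: local processing earns the most, forwarding
# earns less, and rejections or queue congestion are heavily penalized.
# The coefficients are hypothetical, not taken from the repository.
def compute_reward(local, forwarded, rejected, congested):
    reward = 3.0 * local + 1.0 * forwarded   # prefer local processing over forwarding
    reward -= 10.0 * rejected                # strong penalty per rejected request
    if congested:
        reward -= 50.0                       # extra penalty for causing queue congestion
    return reward
```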

## Training and test settings

Three training scenarios were defined, differing in how the requests to be processed are generated and in how the forwarding capacity available towards other nodes is updated.

- Scenario 1 (`scenario_1`)
- Scenario 2 (`scenario_2`)
- Scenario 3 (`scenario_3`)

The idea is to evaluate the results obtained in different working contexts: testing an agent in scenarios other than the one it was trained in allows us to assess the generalization capabilities of the algorithms (overfitting evaluation).
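As an illustration of this train-in-one-scenario, test-in-another setup, here is a minimal sketch using stable-baselines3's PPO with default hyperparameters. `DFaaSEnv` and its `scenario` argument are hypothetical names, not classes from this repository, and the actual training framework may differ.

```python
# Minimal train/evaluate sketch with stable-baselines3 PPO (assumption: the
# environment is exposed as a Gymnasium env; DFaaSEnv is a hypothetical name).
from stable_baselines3 import PPO

train_env = DFaaSEnv(scenario=2)            # hypothetical: train in scenario 2
model = PPO("MultiInputPolicy", train_env)  # default hyperparameters; MultiInputPolicy handles dict observations
model.learn(total_timesteps=500_000)

# Evaluate generalization in a scenario different from the training one.
test_env = DFaaSEnv(scenario=3)
obs, _ = test_env.reset()
done, total_reward = False, 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = test_env.step(action)
    total_reward += reward
    done = terminated or truncated
```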

## Best experiment results

The highest reward scores and best generalization abilities have been achieved by PPO with standard hyperparameters, trained in scenario 2.

- Results achieved by testing PPO (trained in scenario 2) in scenario 3 (plots: `s2s3_reward`, `s2s3_rejected`)

- Results achieved by testing PPO (trained in scenario 2) in scenario 1 (plots: `s2s1_reward`, `s2s1_rejected`)