function Passive-ADP-Agent(percept) returns an action
   inputs: percept, a percept indicating the current state s' and reward signal r'
   persistent: π, a fixed policy
               mdp, an MDP with model P, rewards R, discount γ
               U, a table of utilities, initially empty
               Nsa, a table of frequencies for state-action pairs, initially zero
               Ns'|sa, a table of outcome frequencies given state-action pairs, initially zero
               s, a, the previous state and action, initially null
   if s' is new then U[s'] ← r'; R[s'] ← r'
   if s is not null then
       increment Nsa[s, a] and Ns'|sa[s', s, a]
       for each t such that Ns'|sa[t, s, a] is nonzero do
           P(t | s, a) ← Ns'|sa[t, s, a] / Nsa[s, a]
   U ← Policy-Evaluation(π, U, mdp)
   if s'.Terminal? then s, a ← null else s, a ← s', π[s']
   return a
Figure ?? A passive reinforcement learning agent based on adaptive dynamic programming. The Policy-Evaluation function solves the fixed-policy Bellman equations, as described on page ??.
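For concreteness, here is a minimal Python sketch of the agent above. It assumes a percept format of (state, reward) pairs; the class name, the terminals set, and the simplified policy_evaluation method are illustrative choices, not a canonical implementation. Policy evaluation here just iterates the fixed-policy Bellman update U(s) ← R(s) + γ Σt P(t | s, π(s)) U(t) a fixed number of times rather than solving the equations exactly.

from collections import defaultdict

class PassiveADPAgent:
    """Passive ADP agent: learns a transition model from observed
    state-action-outcome counts and evaluates a fixed policy pi."""

    def __init__(self, pi, gamma=0.9):
        self.pi = pi                     # fixed policy: state -> action
        self.gamma = gamma               # discount factor
        self.U = {}                      # utility estimates U[s]
        self.R = {}                      # observed rewards R[s]
        self.P = defaultdict(dict)       # learned model: P[(s, a)][t]
        self.N_sa = defaultdict(int)     # visit counts for (s, a)
        self.N_tsa = defaultdict(int)    # outcome counts for (t, s, a)
        self.s = None                    # previous state
        self.a = None                    # previous action
        self.terminals = set()           # states known to be terminal

    def __call__(self, percept):
        s1, r1 = percept                 # percept = (current state s', reward r')
        if s1 not in self.U:             # if s' is new: U[s'] <- r'; R[s'] <- r'
            self.U[s1] = r1
            self.R[s1] = r1
        if self.s is not None:
            # Update counts, then re-estimate the outgoing transition
            # probabilities for the previous (s, a) pair.
            self.N_sa[(self.s, self.a)] += 1
            self.N_tsa[(s1, self.s, self.a)] += 1
            n = self.N_sa[(self.s, self.a)]
            for (t, s, a), count in self.N_tsa.items():
                if (s, a) == (self.s, self.a):
                    self.P[(s, a)][t] = count / n
        self.policy_evaluation()
        if s1 in self.terminals:
            self.s = self.a = None       # episode over: forget previous step
        else:
            self.s, self.a = s1, self.pi[s1]
        return self.a

    def policy_evaluation(self, iterations=20):
        # Simplified stand-in for Policy-Evaluation: iterate the
        # fixed-policy Bellman update a fixed number of times.
        for _ in range(iterations):
            for s in self.U:
                if s in self.terminals:
                    self.U[s] = self.R[s]
                    continue
                trans = self.P.get((s, self.pi.get(s)), {})
                self.U[s] = self.R[s] + self.gamma * sum(
                    p * self.U.get(t, 0.0) for t, p in trans.items())

if __name__ == "__main__":
    # Hypothetical toy problem: state 'A' leads to terminal 'B' (+1 reward).
    pi = {'A': 'go', 'B': None}
    agent = PassiveADPAgent(pi, gamma=0.9)
    agent.terminals.add('B')
    for _ in range(20):                  # repeated trials of the same episode
        agent(('A', -0.04))
        agent(('B', 1.0))
    print(agent.U)                       # U['A'] -> -0.04 + 0.9 * 1.0 = 0.86

On this toy chain the learned model fixes P(B | A, go) = 1 after the first transition, so the utility estimate for 'A' converges to R(A) + γ·U(B) = 0.86 with γ = 0.9.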