
Use deterministic policies when evaluating PCN #75

Merged (2 commits) on Nov 3, 2023

Conversation

vaidas-sl (Contributor)

According to the PCN paper, "at execution time — i.e., after the training process — we use a deterministic policy by systematically selecting the action with the highest confidence". However, the current implementation always uses sampled actions.

At execution time, always select the action with the highest confidence.
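
For illustration, here is a minimal sketch (not the repository's actual code) of the two action-selection modes being discussed; the tensor of per-action confidences is assumed to come from the PCN network's forward pass:

```python
import torch

def select_action(logits: torch.Tensor, deterministic: bool) -> int:
    """Pick an action from a 1-D tensor of per-action confidences (logits)."""
    if deterministic:
        # Execution time: systematically take the highest-confidence action.
        return int(torch.argmax(logits).item())
    # During training: sample from the categorical distribution over actions.
    return int(torch.distributions.Categorical(logits=logits).sample().item())
```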
ffelten (Collaborator) commented on Nov 2, 2023

I'm letting Lucas review this since he's more knowledgeable about this algorithm. Makes sense to me but surely impacts performance.

LucasAlegre (Owner)

Hi @vaidas-sl, thanks for the PR!

Although in the original PCN code (https://github.com/mathieu-reymond/pareto-conditioned-networks/blob/main/pcn/pcn.py#L186C69-L186C69) the evaluation during training uses the stochastic policy, the authors evaluate with a deterministic policy after training.

I think it makes sense that the evaluation results we report are obtained with the deterministic policy, as it is the policy that matters at the end of training.
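
As a rough sketch of how that choice could be threaded through evaluation, a hypothetical helper might expose it as a flag; `agent.act` and the Gymnasium-style `env` interface below are assumptions for illustration, not the repository's actual API:

```python
import numpy as np

def evaluate_episode(agent, env, desired_return, desired_horizon, deterministic=True):
    """Roll out one episode; deterministic=True corresponds to the post-training policy."""
    obs, _ = env.reset()
    episode_return = np.zeros_like(desired_return, dtype=np.float64)  # vector return in MORL
    done = False
    while not done:
        # The hypothetical agent.act forwards the flag down to the action selection step.
        action = agent.act(obs, desired_return, desired_horizon, deterministic=deterministic)
        obs, reward, terminated, truncated, _ = env.step(action)
        episode_return += reward
        done = terminated or truncated
    return episode_return
```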

LucasAlegre changed the title from "PCN - at eval always select an action with the highest confidence" to "Use deterministic policies when evaluating PCN" on Nov 3, 2023
LucasAlegre merged commit f57168c into LucasAlegre:main on Nov 3, 2023 (2 checks passed)