
Use deterministic policies when evaluating PCN #75

Merged (2 commits) on Nov 3, 2023

Conversation

vaidas-sl (Contributor)

According to the PCN paper, "at execution time — i.e., after the training process — we use a deterministic policy by systematically selecting the action with the highest confidence". However, the current implementation always uses sampled actions.

At execution time, always select the action with the highest confidence.
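
For illustration, here is a minimal sketch (not the repository's actual code) of the two action-selection modes being discussed; the tensor of per-action confidences is assumed to come from the PCN network's forward pass:

```python
import torch

def select_action(logits: torch.Tensor, deterministic: bool) -> int:
    """Pick an action from a 1-D tensor of per-action confidences (logits)."""
    if deterministic:
        # Execution time: systematically take the highest-confidence action.
        return int(torch.argmax(logits).item())
    # During training: sample from the categorical distribution over actions.
    return int(torch.distributions.Categorical(logits=logits).sample().item())
```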
ffelten (Collaborator) commented on Nov 2, 2023

I'm letting Lucas review this since he's more knowledgeable about this algorithm. Makes sense to me but surely impacts performance.

LucasAlegre (Owner)

Hi @vaidas-sl, thanks for the PR!

Although in the original PCN code (https://github.com/mathieu-reymond/pareto-conditioned-networks/blob/main/pcn/pcn.py#L186C69-L186C69) the evaluation during training uses the stochastic policy, the authors evaluate with a deterministic policy after training.

I think it makes sense that the evaluation results we report are obtained with the deterministic policy, as it is the policy that matters at the end of training.
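
As a rough sketch of how that choice could be threaded through evaluation, a hypothetical helper might expose it as a flag; `agent.act` and the Gymnasium-style `env` interface below are assumptions for illustration, not the repository's actual API:

```python
import numpy as np

def evaluate_episode(agent, env, desired_return, desired_horizon, deterministic=True):
    """Roll out one episode; deterministic=True corresponds to the post-training policy."""
    obs, _ = env.reset()
    episode_return = np.zeros_like(desired_return, dtype=np.float64)  # vector return in MORL
    done = False
    while not done:
        # The hypothetical agent.act forwards the flag down to the action selection step.
        action = agent.act(obs, desired_return, desired_horizon, deterministic=deterministic)
        obs, reward, terminated, truncated, _ = env.step(action)
        episode_return += reward
        done = terminated or truncated
    return episode_return
```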

LucasAlegre changed the title from "PCN - at eval always select an action with the highest confidence" to "Use deterministic policies when evaluating PCN" on Nov 3, 2023
LucasAlegre merged commit f57168c into LucasAlegre:main on Nov 3, 2023 (2 checks passed)