Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An implementation of CC-POMCP #68

Open
wants to merge 32 commits into
base: main
Choose a base branch
from
Open

An implementation of CC-POMCP #68

wants to merge 32 commits into from

Conversation

troiwill
Copy link

This pull request implements the cost-constrained POMCP (CC-POMCP) algorithm and its dependencies. The algorithm is in algorithms/ccpomcp.p*, while the dependencies are in the framework/generalization.p* and utils/cvec.p* files. This pull request also proposes a generic model, called a ResponseModel, and a corresponding output, called a Response. The name "response" comes from the notion of independent and dependent variables, where a response (reward, cost, etc.) depends on the interaction with the real or simulated environment. Thus, a response model is a wrapper for more specific models, such as reward and cost models (and any others that will follow in the future). By extension, a response is a wrapper for the reward, cost, etc.

The framework/generalization.p* files contain

The pull request has the following:

  1. Implementation of the CCPOMCP and its dependencies,
  2. Some updates to pre-existing algorithms to reduce copied-and-pasted code,
  3. Some updates to the code for rocksample,
  4. Implementation of the rocksample from the CC-POMCP paper,
  5. Test script for Vector operations (in test_util_vector_ops.py), and
  6. Passes for the test_all.py script.

troiwill added 30 commits March 27, 2024 17:31

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@zkytony zkytony self-requested a review April 19, 2024 12:35
@zkytony
Copy link
Collaborator

zkytony commented Apr 19, 2024

Thanks for the effort. Much better organized than before. Will take a pass soon. One thing - the CIs are failing. Could you fix this first? I just merged this PR that will make CI actions trigger when you update the source branch of this PR. Please update your branch with the latest changes in main.

@troiwill
Copy link
Author

troiwill commented May 6, 2024

I merged the changes from main into ccpomcp. Let me know if I need to update anything.

@zkytony
Copy link
Collaborator

zkytony commented May 20, 2024

Great. Looks like CIs passed except for pre-commit. I will review the code soon. I should also enable CIs for all branches... It should be enabled, after you rebase/merge latest changes in main. @troiwill

Copy link
Collaborator

@zkytony zkytony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the super delay... I was planning to test this out locally and merge if there's no issue. But when I ran python -m pomdp_py -r ccrocksample, the program failed due to particle deprivation - I think you should have at least one working example of your algorithm. I was curious what advantage or difference CC-POMCP has over POMCP but this is not clear to me. I hope that your example could make that clear. Thanks.

*** Testing CC-POMCP ***
==== Step 1 ====
True state: State((0, 3) | ('good', 'good', 'bad', 'bad', 'good', 'good', 'good', 'good') | False)
Action: check-5
Observation: good
Response: reward=0, cost=1
Response (Cumulative): reward=0, cost=1
Response (Cumulative Discounted): reward=0, cost=1
__num_sims__: 10000
__plan_time__: 12.15742
World:

______ID______
4......>
7....0.>
3..2...>
R......>
..6....>
1......>
.....5.>
_____G/B_____
$......>
$....$.>
x..x...>
R......>
..$....>
$......>
.....$.>

==== Step 2 ====
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/kzheng/repo/pomdp-py/pomdp_py/__main__.py", line 51, in <module>
    main()
  File "/home/kzheng/repo/pomdp-py/pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py", line 215, in main
    total_response, total_discounted_response = test_planner(
  File "/home/kzheng/repo/pomdp-py/pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py", line 158, in test_planner
    ccpomcp.update(cc_rocksample.agent, action, real_observation)
  File "pomdp_py/algorithms/pomcp.pyx", line 96, in pomdp_py.algorithms.pomcp.POMCP.update
    cpdef update(self, Agent agent, Action real_action, Observation real_observation,
  File "pomdp_py/algorithms/pomcp.pyx", line 120, in pomdp_py.algorithms.pomcp.POMCP.update
    agent.set_belief(particle_reinvigoration(tree_belief,
  File "pomdp_py/representations/belief/particles.pyx", line 29, in pomdp_py.representations.belief.particles.particle_reinvigoration
    raise ValueError("Particle deprivation.")
ValueError: Particle deprivation.

Also, I noticed the added unit test for cvec; That's good. Could you add a test that involves calling CC-POMCP? It's true that such tests aren't there yet even for existing algorithms, but it'd be good for new algorithms to have tests still.

Left other comments, mostly minor and docstring-related. I hope the delay here didn't delay the progress of your work. Thanks again for the contribution!

cdef void vector_copy(double[:] src, double[:] dst)


cdef class Vector:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a docstring that describes what this is for?

Option
)
from typing import Optional

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of generalization? Please add docstring

raise NotImplementedError


cdef class ResponseModel:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does "response" differ from "observation" or "reward"? Please clarify. Please include relevant references since this is not common terminology. If it is a wrapper for "reward", why is this wrapper necessary?

Perhaps, it is best to point out the gist of CC-POMCP: Maximize cumulative reward while constraining cumulative cost.

print("Observation: %s" % str(real_observation))
print("Response: %s" % str(env_response))
print("Response (Cumulative): %s" % str(total_response))
print("Response (Cumulative Discounted): %s" % str(total_discounted_response))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To showcase the benefit of CC-POMCP, you should indicate what the cost constraint is, and demonstrate that the constraint isn't crossed.

State,
)


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be really nice to have some other domains; Rocksample is really old and kind of detached from reality. If cost-constraint is really useful it should find applications in other more realistic domains.

@@ -295,6 +295,20 @@ cdef class POUCT(Planner):
"""
self._rollout_policy = rollout_policy

cpdef QNode _create_qnode(
self,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm. Please add docstring though; Looks like it's necessary to create QNode with parameters that differ from default ones e.g. self._num_visits_init in CC-POMCP. Why?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants