Own POMDP model #1
Comments
Thanks @friedsela!
For example, taking the file 1d.POMDP from http://www.pomdp.org/examples/ gives the error

File "mtrand.pyx", line 1146, in mtrand.RandomState.choice
TypeError: 'NoneType' object is not iterable

If I add the line ...
Hi @friedsela, I've just pushed a batch of minor fixes addressing some issues in the POMDP parser and the utility-function creation. But that doesn't really solve the issue you encountered with the 1d.POMDP file. Basically, the current implementation does not yet allow querying the action reward given (s_i, s_j, o); see line 89 in model.py. The only supported form is (s_i), i.e., reward conditioned only on the current state, which was the only case studied during the development of this package.
So, if I understand correctly, if in my POMDP the reward depends only on s_i and a_i, then it should work?
Yes, you are correct. The only supported POMDP reward specification is one conditioned on the action and the current state, like the form sketched below. I can't remember exactly whether the parser already supports the full (s_i, s_j, o) spec... maybe it already does and I just chose not to handle it in the solver...
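For reference, in the standard Cassandra-style .POMDP file format used by http://www.pomdp.org/, a reward that depends only on the action and the current state can be written with wildcards for the end state and observation. The lines below are only a sketch of the format with made-up state, action, and observation names, not lines taken from 1d.POMDP:

```
# supported: reward conditioned on the action and current state only
R: move-east : s0 : * : * -1.0

# full (s_i, s_j, o) specification, which the solver does not yet handle
R: move-east : s0 : s1 : obs-goal 5.0
```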
Ok, great! I actually have my own implementation of POMCP, but as I need to solve many models, I wanted to see whether there is a faster implementation than my own. I also wanted to see whether the point-based solver is faster than POMCP. From your experience, which one is faster?
Hmmm, I don't really think they are comparable, as they solve different POMDP problems. POMCP is designed for online planning in POMDPs, whereas PBVI is for offline learning. It also depends on the model complexity: POMCP was designed for approximately solving very large POMDPs, so the model complexity should be made explicit if a comparison is a must... I haven't looked at comparing their speed before.
Thanks for your reply. Fortunately, in my model the rewards depend only on the action and the current state, so I'm now able to run your code. Could you please explain what is actually happening and what the output means? Why do you call pomdp.solve(T) in each iteration? Don't you solve the POMDP once and then use its policy? Why are the simulation numbers jumping around, and what do they mean? What does '30 games played. Toal reward = 10.0' mean? That my POMDP was solved, 30 games were played, and the mean reward is 10? If the horizon is 20 and 30 games were played, what happened then?

What I am looking for is to run a solver once, get a policy, and then calculate the mean reward of that policy by simulating many games. Is this possible with your code?
For an offline POMDP learning problem, you only need to solve the POMDP once to get a (maybe) optimal policy, then use that policy to decide all actions. But in an online setting (which is POMCP's case of application), you need to learn policies as you go along, because the agent doesn't have access to the full POMDP specification --- kind of like walking in a maze where you can only see what's nearby...
It doesn't have to be jumping. You could use a fixed number of MCTS simulations, but I just decided to use a fixed total simulation-time budget instead, lol.
That means it has completed the POMDP cycle (planning => action => observation => belief update) 30 times.
I reckon currently that is only possible with PBVI.
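To make that concrete, here is a rough Python sketch of the offline workflow you describe: solve once, then estimate the mean reward of the fixed policy by simulating many games. All names here (PBVISolver, solve, get_action, update_belief, simulator.reset/step) are hypothetical placeholders for illustration, not this package's actual API:

```python
def evaluate_policy(solver, simulator, n_games=100, horizon=20):
    """Estimate the mean total reward of a fixed policy by simulation.

    Assumes (hypothetically) that `solver` exposes get_action(belief) and
    update_belief(belief, action, obs), and that `simulator` exposes
    reset() -> belief and step(action) -> (obs, reward, done).
    """
    totals = []
    for _ in range(n_games):                        # one "game" = one episode
        belief = simulator.reset()                  # initial belief over states
        total = 0.0
        for _ in range(horizon):                    # fixed planning horizon
            action = solver.get_action(belief)      # policy lookup only, no re-solving
            obs, reward, done = simulator.step(action)
            belief = solver.update_belief(belief, action, obs)
            total += reward
            if done:
                break
        totals.append(total)
    return sum(totals) / len(totals)                # mean reward across games

# Offline use: solve the POMDP once up front, then only evaluate the fixed policy.
# solver = PBVISolver(model)        # hypothetical constructor
# solver.solve(horizon=20)          # single offline solve
# mean_reward = evaluate_policy(solver, simulator)
```

In the online POMCP setting, by contrast, the planning step sits inside the inner loop, i.e. the tree search re-runs before every action.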
Thanks for the reply. As far as I understand, every online solver can be made offline if we let it run on a simulator until it reaches a (maybe) optimal policy, and this is how I use POMCP. I guess that's not how it was meant to be applied. So suppose I want to use your PBVI solver. I hope it's fine that I'm asking so many questions and suggesting things; it's just that I think your project is very good and that the community lacks good Python implementations of POMDP solvers...
Very nice project. Thanks!
I wanted to check it on my own POMDP model, but I get errors. It seems that just changing the file name doesn't work. I've been spending some time trying to find the problem...