-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multi-arena episodes #35
Conversation
Adds the feature of multi-arena episodes, by looking for the mergeNextArena parameter in an arena's yaml. When a good (not timeout or death) episode end is reached TrainingAgent checks TrainingArena to see if it should end the episode or merge the next arena. If the latter it calls the new TrainingArena function loadNextArena.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however, you will upon inspection see that we have a duplicate code on lines 232, GetTotalArenas().
I think the original getMaxArenaID() had or meant to have something like this:
public int getMaxArenaID{
return _arenasConfigurations.configurations.sum;
}
which would return an integer of total arenas defined in the config. An argument would be presented here that both .sum and .count would technically return the same number as both are returning the number of arenas defined. However, having two seperate methods ensures encapsulation and allows for flexibility.
i think we should revert this back to the original name and change .count to .sum, which would then serve its purpose. I can do this on my end but in the meantime lets change the method name back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all in all a solid approach. But i think a few changes are to be made and some code to be reverted for effectiveness.
@@ -141,6 +141,7 @@ public class ArenaConfiguration | |||
public bool toUpdate = false; | |||
public string protoString = ""; | |||
public int randomSeed = 0; | |||
public bool mergeNextArena = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having this boolean applied locally (per arena definition in the config file) would be inferior to having it applied globally, like the canChangePerspective boolean. It would make sense to set this boolean once (TRUE = ALL ARENAS ARE MERGED / FALSE = NO MERGEDARENAS) and only once at the top of the config file which would then apply to all arenas, rather than specifiying it for each arena.
However, if the point of this parameter is to have the flexibility of choosing exactly which and what arenas are to be merged, then this makes sense having it set locally.
Can you elaborate on the actual requirements in this regard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the second of your points re flexibility is what's important for this change.
An example use case for this feature is to have a maze that the agent navigates several times during one episode. For example the first time might be to learn the maze structure and then the second time we've added a block, so that the most efficient route is no longer possible and the agent has to use its learning from the previous run to reroute. For this example if we had 3 mazes we wanted to test on, we would have 6 arenas we want to merge in pairs (0-1, 2-3, 4-5).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, perhaps an elaboration could be made on the actual and most probable use case for this feature, in the docs. Maybe it was just me but i did not understand why and how it would be used :)
all in all a solid approach. But i think a few changes are to be made and some code to be reverted for effectiveness. |
getArenaCount duplicated GetTotalArenas. GetConfiguration previously returned a bool and there was some special handling for the case it couldn't find a configuration. We've removed the special handling so removing the bool return to make using it a bit less clunky.
Sorry I'm a bit of a novice on github PRs - I've pushed a new commit to the feature branch with the changes discussed in the comments. Do I close this PR and open a new one on that? ref: 5050eb3 |
No need. simply see the commits you made after the comments are reflected on this PR. You can also see the "OUTDATED" marker for the scripts you made changes to which are now overwritten by the newer commits you made recently. Edit: WHEN you think you're ready for a re-PR, simply click the recycling button next to my name at the top right so I know you're ready :). This button simply asks for a review again. |
however, I do think that the important change is still not made, specifically keeping the modularity of It's important to keep the main |
1 similar comment
I don't think I agree. Moving a big block of code into a function with a clear interface can be useful for code clarity, but I view breaking that code into several functions that must be called in sequence as a bit of an antipattern where we need to do it multiple times. i.e. if I wanted to achieve A, and that involved doing A1, then A2, then A3. I could do it by defining A (where A1, A2 and A3 are clearly commented in the definition of A), or I could do it by defining A1, A2 and A3 separately. The issue is if I do it in the latter way now every time I want to do A it takes me 3 function calls instead of 1, and I don't seem to get much benefit over clearly marking the distinct stages of A in the body of my A function with comments. The case where it might be attractive to do it anyway is if A1, A2, A3 are useful elsewhere, but even then I would be tempted to define A() = A1(); A2(); A3(); In this case A is UpdateActiveArenaToCurrentArenaID: given that the current arena ID has been updated, we want to do all the steps needed to achieve that. We need to do that in ResetArena() and LoadNextArena(). |
Hey Ben, here are some points I'd like you to consider in general and not in necessarily specific for your PR only. -Maintainability + readability: Breaking down a large function like UpdateActiveArenaToCurrentArenaID (renamed) into smaller, logically separated functions (A1, A2, A3) can significantly enhance readability and maintainability. Each function can be focused on a single responsibility, making the code easier to understand at a glance. When functions are smaller and well-named, new developers or others unfamiliar with the code can more easily understand the workflow and purpose of the code. We need to think about future developers and collaborators in this regard for AAI. -Reusability: Although in your scenario Ben A1, A2, and A3 might primarily support the process A, encapsulating these steps into separate functions can unexpectedly benefit future developments. For instance, if a new feature or requirement emerges that only needs the functionality of A2, it can be reused without duplicating code or introducing unnecessary dependencies. -Tests: Smaller, well-defined functions are easier to test (recall what i mentioned earlier). When functions are broken down into smaller parts, it's easier to write tests for each part, ensuring that they work correctly in isolation. This can lead to a more robust and reliable codebase, as each part can be individually verifiied.. -Error Handling and debugging: It can be easier to handle errors and debug smaller functions. When an error occurs, if the code is split into smaller parts, it's often quicker to isolate the problem. Also, you can handle errors specific to the part of the process each function represents, which might be more difficult in a larger, monolithic functio -Flexibility in refactoring/extending: Future changes to the process can be more manageable when the code is modular. For instance, if you later need to add a new step A4 between A2 and A3, doing so is straightforward without disturbing the other parts of the code. i think this can be more complex and error-prone in a single large function where changes affect the entire function's flow (which was the case before my PR). -Design principles (important for any SD): breaking down functions follows well-established software design principles like SRP SoC. These principles advocate for structuring software so that each element of the code has a single responsibility and is independent of the other parts, which enhances modularity and scalability, which you already know I assume :) but a reminder. So the point im trtying to make is that it's imperative that we understand these points and implement them where we can. Again, these points do not only apply to your PR, but in general for the codebase. I would have to kindly insist on this matter Ben. |
Hey Ibrahim, thank you for these general points. I don't agree that factoring into functions in this way is an improvement from a software engineering perspective, but it's not likely to have a big impact either way so I've pushed a change returning the functions. |
Thanks you can merge when you're ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good to go!
Adds the feature of multi-arena episodes, by looking for the mergeNextArena parameter in an arena's yaml. When a good (not timeout or death) episode end is reached TrainingAgent checks TrainingArena to see if it should end the episode or merge the next arena. If the latter it calls the new TrainingArena function loadNextArena.
Discarded Alternative: Just call ResetArena() since it does basically the same thing. This would be more complicated to implement in the situation where arenas are being randomly cycled. Also I think the semantics of ResetArena in AAI are already a bit complicated and I didn't want to further the problem.
Discarded alternative: Add a
completeArena
function to TrainingArena with a boolean for whether the episode was successful or not. Benefit: The TrainingAgent doesn't have to care about whether to call EndEpisode or rotate the arenas Drawback: Would have to pass control of showing notifications to TrainingArena, as TrainingAgent doesn't know whether the completion of an arena means a notification should be shown (episode end) or not (arena has been merged)Proposed change(s)
Adds multi arena episode support
Additional changes:
Useful links (Github issues, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Testing