In this document, we briefly share strategies, guidelines, and heuristics for designing representative tests that exercise adaptations for preserving intents in the CP1 challenge problem. More specifically, we explain the selection of parameters, including any interdependence among them, and the selection of their values for constructing representative tests.
The tests generated by TH should ideally satisfy the following criteria:

- **Dynamic**: TH should exploit runtime information (e.g., the robot's position, the direction it is heading, its speed) as well as static data to produce interesting tests.
- **Inference**: to keep the harness reusable and independent, TH should infer information from the runtime status of the robot, much like a human tester who observes where the robot is going and pushes it. This makes TH an independent entity with respect to the system under test.
- **Customizable**: the test harness should provide configuration options that allow designing customizable tests.
- **Fairness**: dynamic perturbations produced by TH should be fair across baselines (i.e., the number and positions of obstacle placements, and battery perturbations and their values, should be comparable across baselines); otherwise comparisons between baselines, and therefore test outcomes, become unreliable.
- **Coverage**: the tests should vary in complexity. For example, two tests that are similar in terms of mission specification and perturbations are effectively the same and hit the same area of the test space, so running both is a waste of time (see the details on complexity below).
- **Validity**: the perturbations should be valid, i.e., they should not prevent the planner from coming up with any possible plan. Examples of invalid perturbations: obstructing the robot by placing obstacles both behind and in front of it, or placing obstacles on waypoints.
To evaluate intent discovery, we propose a set of test cases, each describing a mission as well as perturbations for the robot (e.g., navigating a simulated corridor, placing an obstacle, and changing the battery level once). Each test case is described by the following:
- Mission schema: Navigation
- Mission parameters: `A->T1->T2->...->Tn` (the waypoints or tasks that the robot needs to accomplish, provided by LL)
- Discharge functions: to be selected from a set of 100 predefined models with different difficulty levels (each with a different number of option interactions)
- Dynamic perturbations: obstacles + battery level change
- Possible adaptations: possible variations for path planning, robot reconfiguration, and battery charging
- Evaluation metrics: number of tasks accomplished (primary), mission accomplishment time (secondary)
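As an illustration, the fields above can be bundled into a single record. This is a minimal sketch: the class name, field names, and the `complexity` heuristic are our own assumptions, not part of the CP1 interfaces.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TestCase:
    """One CP1 test case: a mission plus the dynamic perturbations applied to it."""
    waypoints: List[str]            # e.g. ["l1", "l4", "l7"]; the start point is implicit
    discharge_model: int            # index into the 100 predefined power models, 0-99
    obstacle_placements: int = 0    # number of obstacle place/remove events
    battery_sets: int = 0           # number of battery-level perturbations

    def complexity(self) -> int:
        # A rough complexity score: more tasks and more perturbations
        # make a harder test (a heuristic of ours, not an official metric).
        return len(self.waypoints) + self.obstacle_placements + self.battery_sets

case = TestCase(waypoints=["l1", "l4", "l7"], discharge_model=12,
                obstacle_placements=1, battery_sets=1)
print(case.complexity())  # 5
```

A harness could sort or bucket generated cases by this score to ensure tests at several complexity levels are covered.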
Here is a list of options, and heuristics for selecting their values, for determining a representative collection of test cases:
- In the mission specification, different **numbers of tasks** (determined by the number of waypoints) should be selected, representing mission complexity. The tasks specify the distance that the robot will traverse and therefore represent the operational environment. Note that operational environments capture different varieties of *likelihood* and *consequences of a failure*. In other words, TH must exercise different numbers of tasks in the mission specifications of the tests. The map (and all the static data) is fixed for this challenge problem, but given the 10 waypoints (`l1`-`l10`) in the map, and given that these waypoints can be repeated in the mission specification, TH can technically select arbitrarily many tasks. However, more tasks means a longer time to finish the mission, so the recommended number of tasks is `[1-10]`.
- Discharge functions (representing power models of the simulated battery) come with different complexity. Recommended values: `[0-99]`.
- The budget can be varied in `[2-1048576]`. When varying the budget, we recommend using values from different orders of magnitude compared with other tests. For example, it is a waste of time to use `2` and `3` as budgets in different tests; however, if `2` was used in a previous test, `10` or `100` are advised for other tests.
- Obstacles can be placed and removed at any time. An obstacle should be placed in a corridor that the robot is likely to encounter; this can easily be inferred from the direction the robot is heading. Note that at any time in the map there are only two alternative paths to a target waypoint; therefore, it is representative to place the obstacle in a corridor that the robot is 50% likely to encounter. Although it is possible to place many obstacles, the robot must still be able to reach a target waypoint via an alternative path. For example, obstacles should not be placed in front of and behind the robot at the same time such that no possible path to a target location remains, and obstacles should not be placed on waypoints, since the robot must reach the target waypoints. In other words, the obstacles should not trap the robot; note that robots cannot displace obstacles in this challenge problem, and incorrectly placed obstacles cause the robot to fail to come up with any valid plan and stay in one location without any activity. We recommend placing `1 obstacle` at a time and, if needed, removing the current obstacle and placing it in another location to exercise the adaptations; again, TH should exclude certain points, such as the locations of the waypoints in the map. It is also possible to decide the positions of obstacles from the mission specification at design time. Note that the number of obstacle placements is also a determinant of the complexity of the test.
- Similar to obstacle placement, the number of battery sets, as well as how low the battery is set, also determines the complexity of the tests. However, the battery should be set to a value from which it is still possible for the robot to reach a charging station (with a conservative configuration). We advise setting the battery level in `[3,255-32,559]` in multiples of `3,255` (which is `1%` of the battery capacity).
- Obstacle placement and battery setting are dynamic perturbations, so TH should guarantee **fairness** across baselines. Note that combinations of these perturbations can be used in different tests.
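Several of the heuristics above are mechanical enough to sketch in code. The helpers below are our own illustrative assumptions, not part of the CP1 interfaces: they sample missions from the fixed waypoints, spread budgets across orders of magnitude, enumerate battery levels as whole-percent multiples of 3,255 (reading the `32,559` upper bound as ten such multiples, i.e. up to 32,550), reject obstacle placements that would trap the robot, and check that baselines saw comparable perturbations.

```python
import random
from collections import Counter

WAYPOINTS = [f"l{i}" for i in range(1, 11)]  # the 10 fixed map waypoints l1-l10

def sample_mission(num_tasks: int, rng: random.Random) -> list:
    """Pick num_tasks waypoints (repetition allowed); recommended range is 1-10."""
    if not 1 <= num_tasks <= 10:
        raise ValueError("recommended number of tasks is 1-10")
    return [rng.choice(WAYPOINTS) for _ in range(num_tasks)]

def spread_budgets(low: int = 2, high: int = 1048576) -> list:
    """Budgets roughly an order of magnitude apart (e.g. 2, 20, 200, ...)."""
    budgets, b = [], low
    while b <= high:
        budgets.append(b)
        b *= 10
    return budgets

ONE_PERCENT = 3255  # 1% of the battery capacity, per the guideline above

def battery_levels(min_pct: int = 1, max_pct: int = 10) -> list:
    """Battery-set values as whole-percent multiples of 3,255 (1%-10% here)."""
    return [ONE_PERCENT * p for p in range(min_pct, max_pct + 1)]

def is_valid_obstacle(pos, waypoint_cells, already_blocked) -> bool:
    """Never place an obstacle on a waypoint, and keep at most one obstacle
    at a time so one of the two alternative paths always stays open."""
    return pos not in waypoint_cells and len(already_blocked) == 0

def perturbations_comparable(runs: dict) -> bool:
    """Fairness: every baseline saw the same multiset of perturbation kinds."""
    counts = [Counter(kinds) for kinds in runs.values()]
    return all(c == counts[0] for c in counts)

print(spread_budgets())                           # [2, 20, 200, 2000, 20000, 200000]
print(battery_levels()[0], battery_levels()[-1])  # 3255 32550
```

Stricter fairness checks could also compare perturbation values and timings; the single-obstacle rule mirrors the one-obstacle-at-a-time recommendation above.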
Note that any two test cases are considered different if their difficulty levels differ. However, if two test cases are similar with respect to the difficulty of the test, we consider them identical.
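This distinctness rule can be sketched as comparing a difficulty summary; here we assume (our own simplification) that difficulty is captured by the number of tasks, the discharge-model difficulty, and the number of perturbations.

```python
def difficulty(test: dict) -> tuple:
    """Summarize a test's difficulty; tests with equal summaries count as identical."""
    return (test["num_tasks"], test["discharge_model"], test["num_perturbations"])

t1 = {"num_tasks": 5, "discharge_model": 12, "num_perturbations": 2}
t2 = {"num_tasks": 5, "discharge_model": 12, "num_perturbations": 2}
t3 = {"num_tasks": 8, "discharge_model": 12, "num_perturbations": 2}
print(difficulty(t1) == difficulty(t2))  # True  -> identical, run only one
print(difficulty(t1) == difficulty(t3))  # False -> distinct tests
```

A harness can deduplicate a candidate test suite by keeping one test per distinct summary.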
We have discussed some test strategies and ideas in separate discussion threads. The following commands are useful for running CP1 and exercising the perturbation API:
Launch CP1 without TH (change `docker-compose-no-th.yml` accordingly with the required baseline params):

```shell
TH_PORT=8081 TA_PORT=8080 docker-compose -f docker-compose-no-th.yml up
```

Launch CP1 with TH (change `docker-compose-mitll-harness.yml` accordingly with the required baseline params):

```shell
TA_PORT=5000 TH_PORT=5001 docker-compose -f docker-compose-mitll-harness.yml up
```
Get status:

```shell
curl -X GET "http://0.0.0.0:8080/observe" -H "accept: application/json"
```

Perturb (place obstacle):

```shell
curl -X POST "http://0.0.0.0:8080/place-obstacle" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"x\": 0, \"y\": 0}"
```

Perturb (remove obstacle):

```shell
curl -X POST "http://0.0.0.0:8080/remove-obstacle" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"obstacleid\": \"string\"}"
```

Perturb (battery set):

```shell
curl -X POST "http://0.0.0.0:8080/battery" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"charge\": 0}"
```

Start mission:

```shell
curl -X POST "http://0.0.0.0:8080/start" -H "accept: application/json"
```
Note that the mission specification is in the compose file: without a TH, it is specified in a ready message in a JSON file, while with a TH, the mission is specified in `run.sh`.
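The curl calls above can also be scripted. The sketch below only builds the request URLs and JSON payloads (the endpoint paths and field names come from the commands above; the base URL constant and function names are our own):

```python
import json

BASE = "http://0.0.0.0:8080"  # TA port from the compose examples above

def place_obstacle_request(x: float, y: float):
    """URL and body for POST /place-obstacle."""
    return f"{BASE}/place-obstacle", json.dumps({"x": x, "y": y})

def remove_obstacle_request(obstacle_id: str):
    """URL and body for POST /remove-obstacle."""
    return f"{BASE}/remove-obstacle", json.dumps({"obstacleid": obstacle_id})

def set_battery_request(charge: int):
    """URL and body for POST /battery."""
    return f"{BASE}/battery", json.dumps({"charge": charge})

url, body = place_obstacle_request(0, 0)
print(url)   # http://0.0.0.0:8080/place-obstacle
print(body)  # {"x": 0, "y": 0}
```

Sending these with an HTTP client of your choice (and a `Content-Type: application/json` header) reproduces the curl commands above.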