Problems concerning training data generation
So far, the biggest challenge is gathering appropriate training examples. The obvious disadvantage of composing data manually is that it is tedious. Changing the inputs or outputs usually requires composing or generating completely new data. It is therefore desirable to have an automated training scenario that also covers data generation. BWAPI already supports unsupervised execution of StarCraft bots. In detail, however, BWAPI lacks precise and valuable information in several places. This wiki page roughly describes the problems faced while implementing an approach to train individual marines in combat situations.
These inputs and outputs are implemented:
Inputs:
- Hit points of the individual unit
- Hit points of all friendly units
- Number of friendly units
- Closest enemy distance
- Number of enemy units
- Hit points of all enemy units
Outputs:
- Attack weakest (dynamic duration based on attack animation)
- Attack closest (dynamic duration based on attack animation)
- Run Away (duration: 7 frames)
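The inputs and outputs listed above could be represented roughly as follows. This is a minimal sketch: the names `UnitInputs`, `Action` and all field names are hypothetical and not taken from the project's code, and treating "hit points of all units" as a sum is an assumption.

```python
from dataclasses import dataclass, astuple
from enum import Enum
from typing import List


class Action(Enum):
    """The three outputs; the member names are illustrative."""
    ATTACK_WEAKEST = 0
    ATTACK_CLOSEST = 1
    RUN_AWAY = 2


# Only the run-away duration is stated on this page. Attack durations
# depend on the attack animation and are therefore not fixed here.
RUN_AWAY_FRAMES = 7


@dataclass(frozen=True)
class UnitInputs:
    own_hit_points: float
    friendly_hit_points: float     # assumed: summed over all friendly units
    friendly_count: int
    closest_enemy_distance: float
    enemy_count: int
    enemy_hit_points: float        # assumed: summed over all enemy units

    def as_vector(self) -> List[float]:
        """Flatten the six inputs into a plain feature vector."""
        return [float(v) for v in astuple(self)]


# Example values for a full 10 vs 10 fight at the start of a match.
example = UnitInputs(40, 400, 10, 128.0, 10, 400)
print(example.as_vector())
```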
The map contains 10 friendly and 10 enemy marines. A fitness function determines how well or how badly the situation turned out after an action. It does so by comparing the inputs before and after the action taken. That information makes up one training example, which is used for training right after the match concludes.
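A before/after comparison of this kind might look like the sketch below. The page does not give the actual formula, so the scoring rule used here (enemy hit points lost minus friendly hit points lost) and all names are assumptions for illustration.

```python
def fitness(before: dict, after: dict) -> float:
    """Score an action by comparing the inputs before and after it.

    Assumed rule (not from the original project): reward damage dealt
    to the enemy, penalise damage taken by friendly units.
    """
    enemy_loss = before["enemy_hp"] - after["enemy_hp"]
    friendly_loss = before["friendly_hp"] - after["friendly_hp"]
    return enemy_loss - friendly_loss


def make_training_example(before: dict, action: str, after: dict) -> dict:
    """Bundle inputs, chosen action and resulting score into one example."""
    return {"inputs": before, "action": action, "fitness": fitness(before, after)}


before = {"friendly_hp": 400, "enemy_hp": 400}
after = {"friendly_hp": 388, "enemy_hp": 370}
example = make_training_example(before, "attack_weakest", after)
print(example["fitness"])  # 30 - 12 = 18
```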
The focus is on individual units. If one unit just runs away while all its fellow units deal lots of damage, the unit running away would rate running as good, because the outcome was bad for the enemy and good for the friendly units. The performance of individual units thus gets completely blurred, resulting in conflicting and unreliable training data. A 1 vs 1 scenario, on the other hand, would mean that most inputs and outputs are of no interest. With only one enemy, the decision between attacking the closest or the weakest unit is irrelevant.
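The conflict can be illustrated with a small sketch. It assumes a team-level score (enemy hit points lost minus friendly hit points lost), which is an assumption for illustration and not the project's formula.

```python
def team_score(before, after):
    # Assumed team-level score: enemy damage minus friendly damage.
    return ((before["enemy_hp"] - after["enemy_hp"])
            - (before["friendly_hp"] - after["friendly_hp"]))


# One shared observation window for the whole squad.
before = {"friendly_hp": 400, "enemy_hp": 400}
after = {"friendly_hp": 394, "enemy_hp": 340}

# Nine marines attack, one runs away; all see the same change in inputs.
actions = ["attack_closest"] * 9 + ["run_away"]
labels = [(action, team_score(before, after)) for action in actions]

runner = labels[-1]
attacker = labels[0]
print(runner, attacker)  # both carry the same positive score
```

Because the score is derived from squad-wide inputs, the runner's example is labelled exactly as favourably as the attackers' examples.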
The fitness function approach is only applicable to attack actions. While moving, a unit can't deal damage, but it can receive damage. So the situation either stays the same or gets worse. So far I couldn't think of an approach that gives meaning to movement actions. It might be worthwhile to wait for a certain amount of time before evaluating a movement action.
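One way to read the idea of waiting before evaluating a movement action is to defer the "after" snapshot by a fixed number of frames. The sketch below assumes a hypothetical delay of 24 frames, a hypothetical `DelayedEvaluator` class and an assumed scoring rule; none of these come from the project.

```python
from collections import deque


class DelayedEvaluator:
    """Holds actions until `delay` frames have passed, then scores them."""

    def __init__(self, delay, score_fn):
        self.delay = delay
        self.score_fn = score_fn
        self.pending = deque()   # entries: (due_frame, action, inputs_before)
        self.examples = []

    def record(self, frame, action, inputs_before):
        self.pending.append((frame + self.delay, action, inputs_before))

    def on_frame(self, frame, inputs_now):
        # Entries are queued in frame order, so only the head can be due.
        while self.pending and self.pending[0][0] <= frame:
            _, action, before = self.pending.popleft()
            self.examples.append((before, action, self.score_fn(before, inputs_now)))


def score(before, after):
    # Assumed rule: enemy hit points lost minus own hit points lost.
    return (before["enemy_hp"] - after["enemy_hp"]) - (before["own_hp"] - after["own_hp"])


evaluator = DelayedEvaluator(delay=24, score_fn=score)
evaluator.record(frame=100, action="run_away", inputs_before={"own_hp": 40, "enemy_hp": 400})
evaluator.on_frame(107, {"own_hp": 34, "enemy_hp": 400})   # too early, nothing is scored
evaluator.on_frame(124, {"own_hp": 34, "enemy_hp": 380})   # scored once the delay has passed
print(evaluator.examples)
```

A longer delay lets follow-up actions show their effect, but it also mixes in more of what other units did in the meantime.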
Frame-perfect attack actions can't be implemented because BWAPI provides too little information about attacks. It only reports whether a unit is under attack, is attacking, or is executing an attack animation. In this context, frame-perfect means that a unit can change its action without interrupting the essential part of an attack action. For example, an attack is successful as soon as the strike hits the enemy or a projectile is fired. Right after that moment, a move command could be issued without interrupting the previous action. Waiting for the attack animation to end wastes several frames that could be used for other actions, such as moving as part of a kiting strategy. BWAPI doesn't report the point in time at which a projectile is launched, and the impact on the enemy can't be related to its cause. The attacker could inform its target about an incoming attack, but the target can't tell for sure whether the impact was caused by that attacker, since many other attackers could be responsible. Distance and orientation also matter a lot and influence the duration of the attack animation. It is therefore not possible to reliably predict the launch of a projectile and its impact, which leads to mismatching training data.
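The attribution problem can be sketched as follows. The registry and all names are hypothetical; the only information assumed to be available is which units are currently attacking a given target, in line with the limited attack state described above.

```python
from collections import defaultdict


class AttackRegistry:
    """Attackers announce their target; the target later looks up suspects."""

    def __init__(self):
        self.attackers_by_target = defaultdict(set)

    def announce(self, attacker_id, target_id):
        self.attackers_by_target[target_id].add(attacker_id)

    def withdraw(self, attacker_id, target_id):
        self.attackers_by_target[target_id].discard(attacker_id)

    def suspects(self, target_id):
        """Every unit that could have caused a hit on `target_id`."""
        return sorted(self.attackers_by_target[target_id])


registry = AttackRegistry()
registry.announce(attacker_id=1, target_id=99)
registry.announce(attacker_id=2, target_id=99)
registry.announce(attacker_id=3, target_id=99)

# The target loses hit points this frame, but the hit carries no sender.
suspects = registry.suspects(99)
print(suspects)  # [1, 2, 3]: the hit cannot be pinned on a single attacker
```

With more than one announced attacker, the target can only narrow the cause down to a set of candidates, so any damage credited to a single unit's action is a guess.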