Supporting vectorized operations #6
Comments
I like the idea of supporting vectors out of the box, but I don't think units or containers should behave as positions. A unit is not a position, neither is a container. To make it clear that accessing this position is not zero cost (requires a sensor "call"), we might want to do something like Although if operators like |
Yeah, I think that mlog is so low-level and assembly-like that there shouldn't be any expectation of 1:1 correspondence between Python code and underlying instruction count. The goal should be to have Python expressions match up with intuitive ways of thinking about the stuff, which often may (and should!) require numerous low-level instructions to implement. As for requiring .pos, I think it's pretty intuitive to ask for the distance between two units, or to order one unit to approach another, rather than pedantically insisting that the coder ask for the distance between their positions, or command the one to approach the other's position.
So, it looks to me like there are two main hurdles on the path to vectorized operations. (1) First, we need to detect which values need to be treated as vectors and which need to be treated as scalars. E.g. when we encounter the expression The two main options here are (1A) to use explicit cues from the coder, like using (2) The second hurdle is figuring out when to ask the game for .xy coordinates of an object (typically using a pair of sensor instructions) and when to trust that we already have up-to-date coordinates stored in relevant variables (e.g. |
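To make hurdle (2) concrete, here is a minimal sketch of a compiler helper that emits the sensor pair only when coordinates are missing or when the coder explicitly asks for a refresh. The helper name `read_xy`, the `fresh` set, and the `name_x` / `name_y` variable naming are assumptions made for illustration, not anything the project defines; the emitted lines follow the general shape of mlog `sensor` instructions.

```python
def read_xy(obj, fresh, refresh=False):
    """Return the mlog-style lines needed before using obj's coordinates.

    `fresh` is a hypothetical set of object names whose x/y variables
    are believed to be up to date. `refresh=True` models an explicit cue
    from the coder that current values must be looked up again.
    """
    if refresh or obj not in fresh:
        fresh.add(obj)
        return [
            f"sensor {obj}_x {obj} @x",
            f"sensor {obj}_y {obj} @y",
        ]
    return []  # trust the stored coordinates, emit nothing


fresh = set()
first = read_xy("unit", fresh)                  # no stored coordinates yet
second = read_xy("unit", fresh)                 # reuse stored coordinates
forced = read_xy("unit", fresh, refresh=True)   # explicit refresh cue
```

The hard part discussed in this thread is deciding when the stored values have gone stale; this sketch sidesteps that by leaving the decision to an explicit cue.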
It seems likely that we won't be able to do everything with (B) autodetection, so we'll need to (A) allow at least some explicit cues, both (1A) that an expression is a vector, and (2A) that we need to look up current values for it. So I'm tempted to implement this just with explicit cuing first, as a sort of proof-of-concept and useful working model, and then maybe add whatever layers of autodetection we feasibly can later. So that faces us with the question of what explicit cueing to use. I'm currently inclined (1A) to compile I'm currently inclined to use Under the hood, I think I'll make |
I've lost track of why We treat vectors as immutable, much like single values, meaning they can't be changed in-place, and every change to a vector produces a new value. Thus this:
a.xy = vec(1, 2)
b.xy = a
a.x = 3
print(b.xy)
…shows Basically Instead, the option of tracking types should be fine, but I think we should raise an error when the same variable is used as two different types if it occurs in any condition or loop (because the type "may or may not change" and we can't know). Supporting the jumps from Python you suggest also makes type tracking at compile time impossible.
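The value semantics described here can be simulated in plain Python. This is only a sketch under the assumption that each vector compiles to a pair of scalar variables named `name_x` and `name_y`; the helper names are hypothetical. Copying a vector then copies two scalars, so a later change to `a.x` cannot reach `b`.

```python
def assign_vec(env, dest, x, y):
    """Model `dest.xy = vec(x, y)` as two scalar assignments."""
    env[f"{dest}_x"] = x
    env[f"{dest}_y"] = y


def copy_vec(env, dest, src):
    """Model `dest.xy = src` by copying both components by value."""
    env[f"{dest}_x"] = env[f"{src}_x"]
    env[f"{dest}_y"] = env[f"{src}_y"]


def read_vec(env, name):
    """Return the current (x, y) pair stored for `name`."""
    return (env[f"{name}_x"], env[f"{name}_y"])


env = {}
assign_vec(env, "a", 1, 2)   # a.xy = vec(1, 2)
copy_vec(env, "b", "a")      # b.xy = a
env["a_x"] = 3               # a.x = 3 rebinds only a's x component
a_now = read_vec(env, "a")
b_now = read_vec(env, "b")
```

In this model `b_now` still holds the pair that was copied, while `a_now` reflects the change, which is why no reference tracking is needed.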
It sounds like we're leaning in the same direction for now, of including lots of I guess I've used NumPy (and other operator-overloaded NumPy-like setups) enough that I'd rather not have to include all the Good point about flow-tracing becoming even trickier when arbitrary jump-destinations are allowed. Glad I'm not planning to do flow-tracing! I agree that, if we start doing much automated detection imposing a "once a vector, always a vector" rule is probably a good idea, including enforcing it with compiler errors. This helps with (task 1) determining what is a vector, but unfortunately doesn't help much with (task 2) determining when a vector needs to be updated from sensors. I don't think "immutable" is quite what you mean here. E.g., if |
Python has the luxury of doing all those operations on different types because the checks occur at runtime, but we need to do it at compile time. I think tracking the types would be worthwhile to save from all With regard to immutable, I mean the following. When you do Similarly, changing Thus the value is immutable, meaning we don't need to track and update all references, and the snippet I described in #6 (comment) is valid. If vectors were mutable, it should've printed
Stepping back from this, auto-detection is quite tricky if we continue doing everything in one pass, esp., if we continue to compile function definitions prior to seeing what will call them, so won't yet have any clue what the arguments will be. E.g., suppose we're compiling the following function: def halve(v): return v/2 If you look just at this function, there's no way of telling whether the argument will be vector or scalar, so no way to tell whether this should compile to a single division op or a pair of division ops. The only ways to answer that question are (A) to demand that the user somehow explicitly indicate which this is, e.g., by making the content Another wrinkle is that for macro-like inline functions, it could be perfectly sensible to make some inline instances vectorized and others not, depending upon each instance's arguments. E.g., Part of me is suspecting that there are two quite different approaches to this that we'll need to choose between: (A) require that every use of a vectorized op somehow be unambiguous on the first pass, often due to explicit type hints, esp. within function defs, so that we can correctly choose vectorized or unvectorized compilation on first pass, or (B) treat vectorized ops much as we now treat line numbers, and function bodies, by initially compiling them into some precursor that will be transformed into completed code in some later pass. Toying with an idea here: let's imagine that during the first pass, we compile all vectorized ops into their scalar version, so e.g. |
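As a rough illustration of the `halve` problem, the sketch below shows how the same function body would compile to one division or to a pair of divisions depending on whether the argument is known to be a vector. The function `compile_halve`, its `arg_is_vector` flag, and the `_x` / `_y` naming are assumptions for illustration; the output lines imitate mlog `op` instructions.

```python
def compile_halve(arg, result, arg_is_vector):
    """Emit mlog-style code for `result = arg / 2`.

    `arg_is_vector` stands in for whatever the compiler learns from an
    explicit hint or from a later pass; it cannot be derived from the
    function body alone.
    """
    if arg_is_vector:
        # One division per component.
        return [
            f"op div {result}_x {arg}_x 2",
            f"op div {result}_y {arg}_y 2",
        ]
    return [f"op div {result} {arg} 2"]


scalar_code = compile_halve("speed", "half", arg_is_vector=False)
vector_code = compile_halve("pos", "half", arg_is_vector=True)
```

For inline (macro-like) functions, a helper like this could be called once per call site, so some instances come out vectorized and others do not.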
One wrinkle for automatic type-checking is values read from memory cells, which can be scalar or can be references to vector-like units/buildings. Other than this, I think it would be fairly straightforward to infer types from the operations that assign values to variables, assuming/enforcing that the same variable must have the same type in all occurrences, and that never-assigned variables must have been vector-like blocks like Still, I think automated type-tracking may be a bigger enterprise than I'm willing to take on right now (perhaps ever), so I'm leaning towards a useful enough, even if somewhat cluttered, approach where every element that will be vectorized should be explicitly marked as such, with I'm tempted to have the first pass just create single commands with lists in place of vector elements: e.g., |
IIRC you can only store numbers in memory cells, not strings or references (it would be good if you confirmed this).
Ah, you're right. I thought I had done this, but what I'd actually done was store the @x and @y of a unit, rather than the unit itself. (I was trying to divide labor between a "watcher" processor whose job was to keep finding the player, and "follower" processors whose job was to keep units from straying too far from the player, so the watcher needed to signal the player location to the followers via memory block.)

Anyway, that makes things a bit easier for type tracking. In fact, so long as we can assume (a) that each variable always has the same type, (b) that special @variables and constants are of known types, (c) that all other never-assigned variables are of type Block, (d) that out-of-line function arguments are always the type of any value passed to the argument, and (e) that the type of each instruction's output is fully determined by the type(s) of its inputs, then there will be no ambiguity at all about types, though discovering them may take repeated passes through the code. (E.g., in a convoluted program, it might not emerge until near the end of the first pass that I'll have to set much of this machinery up regardless, just to derive types for temporary variables in complex expressions. Now that I have a clear picture of how this machinery could be used to determine all types with no need for explicit type-hinting, that's making me lean again in the direction of doing this without any explicit

(BTW, when I tested this, I found that writing a unit to a memory cell apparently stores 1 there, which is not very useful. You can print units though, which outputs their type (apparently a string).)
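The repeated-pass idea can be sketched as a small fixed-point loop. Everything here is a simplification for illustration: the type names, the `infer_types` function, the `(target, inputs)` representation of assignments, and especially the output rule (any vector-like input makes the output a vector) are assumptions, not the compiler's actual design. Assumption (d) about function arguments is not modeled.

```python
SCALAR, VECTOR, BLOCK = "scalar", "vector", "block"


def is_number(token):
    """True if the token is a numeric literal."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def infer_types(assignments, known):
    """Infer a type for every name in `assignments`.

    assignments: list of (target, [input names]) pairs, in program order.
    known: types of special variables and constants, per assumption (b).
    """
    types = dict(known)
    assigned = {target for target, _ in assignments}
    for _, inputs in assignments:
        for name in inputs:
            if name in types or name in assigned:
                continue
            # Assumption (c): never-assigned, non-literal names are blocks.
            types[name] = SCALAR if is_number(name) else BLOCK
    changed = True
    while changed:  # repeat passes until nothing new is learned
        changed = False
        for target, inputs in assignments:
            in_types = [types.get(name) for name in inputs]
            if None in in_types:
                continue  # an input is not resolved yet; retry next pass
            # Hypothetical rule standing in for assumption (e).
            out = VECTOR if any(t in (VECTOR, BLOCK) for t in in_types) else SCALAR
            if target not in types:
                types[target] = out
                changed = True
            elif types[target] != out:
                # Assumption (a): one type per variable, enforced as an error.
                raise TypeError(f"{target} used as both {types[target]} and {out}")
    return types


program = [
    ("c", ["a", "b"]),        # c depends on names resolved only later
    ("a", ["player", "2"]),   # player is never assigned, so it is a block
    ("b", ["1", "2"]),
]
types = infer_types(program, known={})
```

Here `c` cannot be typed on the first pass because `a` and `b` are assigned later in the list; the second pass resolves it.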
Here's the notation I've tentatively settled upon. For most ordinary purposes, you can refer to any object/vector by name, and it will behave as a vector in vectorized numerical operations (much as NumPy arrays do) and as an object in non-numerical operations. E.g., The following details will be irrelevant for most purposes, but occasionally could be useful, e.g. to shave off a few instructions. A "vector" is anything with x and y coordinates, including mindustry blocks, mindustry units, and "pure vectors" which have no in-game analog, but also store an x,y pair. Vectorized operations typically produce pure vectors as their output. For any mindustry object (block or unit) For any vector (block, unit, or pure vector), The suffix For objects (units/blocks), the suffix |
I had been assuming that encountering However, I'm now thinking that this may be too restrictive. E.g. it should be fine to have a The key things that typing is supposed to do for us are (1) letting us know which ops need to be vectorized, and (2) provide some helpful error messages when things won't work. As far as (1) is concerned, for numerical ops, all that matters is vector vs scalar, not which flavor of vector we have. I think there are only two places where the compiler really cares about pure vs object. One is Duck-typing makes entailment relations between atoms' classes be asymmetric. Encountering Thus far, I've been implementing implicit type hints as specifying the most general class that an atom must belong to. E.g. |
Stepping back a bit, the only proposed exception to the general rule that " This special case is easy to handle when I already know |
Mindustry is a 2D game, and much Mindustry logic involves manipulating 2D x,y vectors, where the same thing is done with both the x and y components. Python coders familiar with NumPy will find such redundant coding tedious and potentially error-prone, and would prefer vectorized operations, which combine these parallel operations together. E.g., if you want the midpoint between a unit and a target, it would be nice to be able to use something like (unit.pos + target.pos) / 2, rather than needing to compute .x and .y components separately. (Note: related to issue #2, it'd also be much better to be able to put this all on one line, rather than on multiple lines.)

In pure Python (like NumPy), such operations are quite easy to support by overloading the relevant operators, like + and *. Things will be quite a bit trickier though when compiling to mlog, as we won't be able to count on Python to automatically keep track of which variables are supposed to use overloaded operations and which ones aren't. Still, I think it should be relatively straightforward to have the compiler track which variables are meant to hold 2D vectors, and to compile them into paired lines of code (one for each of x,y). To a first approximation, this would require maintaining a set listing such variables, adding things to this set when created by a method that should produce a vector (like retrieving an object's .pos, or using a vectorized operation that outputs another vector), and then compiling later uses of such variables into appropriate de-vectorized pairs.

Relatedly, it may make sense to allow units and blocks themselves to participate in such vectorized operations. So, e.g., dst(unit-player) would be equivalent to dst(unit.pos - player.pos), which would itself be equivalent to dst(unit.x - player.x, unit.y - player.y) (which of course would be equivalent to even more long-winded things using sensors to retrieve these attributes, and/or breaking this all apart into a long sequence of single-operation lines).
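The set-tracking approach described in this issue could look roughly like the following. This is a sketch under stated assumptions: the function `devectorize`, the flat variable names such as `unit_pos`, and the `_x` / `_y` suffix convention are all hypothetical, and the emitted lines only imitate the shape of mlog `op` instructions.

```python
def devectorize(op, dest, left, right, vectors):
    """Emit mlog-style lines for `dest = left <op> right`.

    `vectors` is the set of variable names the compiler currently treats
    as 2D vectors. If either operand is in that set, the result is a
    vector too, and one instruction is emitted per component.
    """
    if left not in vectors and right not in vectors:
        return [f"op {op} {dest} {left} {right}"]
    lines = []
    for axis in ("x", "y"):
        # Scalars are broadcast: the same operand is used for both axes.
        lhs = f"{left}_{axis}" if left in vectors else left
        rhs = f"{right}_{axis}" if right in vectors else right
        lines.append(f"op {op} {dest}_{axis} {lhs} {rhs}")
    vectors.add(dest)  # later uses of dest are vectorized as well
    return lines


# (unit.pos + target.pos) / 2, split into two steps via a temporary.
vectors = {"unit_pos", "target_pos"}
code = devectorize("add", "total", "unit_pos", "target_pos", vectors)
code += devectorize("div", "mid", "total", "2", vectors)
```

The two source-level operations expand into four instructions, and both `total` and `mid` join the tracked set so that any later expression using them is paired up the same way.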