Replies: 1 comment
-
So, onto discussion. My opinions:
TL;DR
`ak.from_regular` calls to ensure the index is jagged.

Motivation
NumPy Compatibility
Awkward Array supports both `ak.Array`s and `np.ndarray`s in its high-level API, and in many of the array/layout methods. To maintain compatibility with NumPy, parts of the Awkward API that overload the NumPy API (e.g. `ak.sum`) behave differently according to whether they are given "NumPy-like" (regular) arrays or jagged arrays. There are several places that this happens:
- `ak.Array.__getitem__`
- `ak.broadcast_arrays`

Indexing mechanisms (`array[...]` subscript operator)
`ak.Array.__getitem__` supports both the NumPy advanced indexing mechanism and an Awkward-specific one. Although not strictly part of the NumPy API (it isn't a ufunc, or overloaded by `__array_function__`), the behaviour of the array subscript operator is constrained by the social contract that `Array`s behave like `ndarray`s. So, to support both kinds of indexing, we switch between the implementations according to the kind of array that is passed into the index, i.e. whether it is entirely jagged or entirely regular.

Awkward-like Indexing vs NumPy-like Indexing
I will subsequently refer to indexing with jagged (`var` dimension) arrays as "Awkward-like indexing", whilst indexing with regular arrays is "NumPy-like indexing".

Whilst it is reasonable for Awkward `Array`s to behave differently to NumPy `ndarray`s, a regular Awkward (non-`var`) `Array` is conceptually a (stricter) sub-type of a jagged Awkward (`var`) `Array`. By a hand-wavy application of Liskov's substitution principle, it should be possible to substitute the former for the latter. However, because we use jaggedness to request Awkward-like semantics, code that interacts with regular arrays nearly always does not work for irregular ones.

Intent vs Structure
Part of the problem here is that we mix the intent of the operation (e.g. perform advanced Awkward indexing) with the structure (e.g. this array has a dimension of fixed size). Whilst this satisfies some notion of compatibility (indexing with NumPy-like arrays should behave like NumPy), it makes it harder to reason about what code will do without actually running it.
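To make the two semantics concrete, here is a toy model on plain nested lists. It is only a sketch: `is_regular`, `numpy_like`, `awkward_like` and `getitem` are hypothetical names, and "regular" is approximated by "all rows have the same length", whereas in Awkward it is a property of the array type.

```python
def is_regular(index):
    """Toy stand-in for 'regular': every row of the index has the same length."""
    return len({len(row) for row in index}) <= 1


def numpy_like(data, index):
    """NumPy-style advanced indexing: index values select along the first axis."""
    return [[data[i] for i in row] for row in index]


def awkward_like(data, index):
    """Awkward-style indexing: row k of the index picks within row k of the data."""
    return [[row[i] for i in picks] for row, picks in zip(data, index)]


def getitem(data, index):
    """The structure of the index decides the semantics, not the caller's intent."""
    if is_regular(index):
        return numpy_like(data, index)
    return awkward_like(data, index)


data = [[10, 30, 20], [50, 40]]
print(getitem(data, [[1], [0]]))     # regular index: rows are selected
print(getitem(data, [[1], [0, 1]]))  # jagged index: elements within each row
```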
For example, consider using `argmax` of one array to index another. The first code one might write computes an index `ix` from `x` with `keepdims=True`, and then uses it to index `y`.

Now, from the rules of indexing, we cannot know if this will succeed for any arrays `x` and `y`, because if `x` is regular, then `keepdims=True` will give `ix` a constant `* 1` dimension. If `x` is jagged in the final dimension, `ix` will be given `* var` in the last dimension. Not only that, we need to be careful that the leading dimensions of `x` are all jagged, because otherwise the index will still fail. Thus, the "safe" code needs extra calls that make the index jagged before it is used.

This is a bit of a contrived example, but it reflects the fact that just by wanting to index using Awkward-like semantics, we have to add boilerplate to assert our intent (ahead of time).
Proposals
The take-away of the above section is that I would like to separate the different kinds of advanced indexing such that there cannot be any ambiguity. In an ideal world, we wouldn't change the meaning of an operation according to the array dimension types at all but, as mentioned above, we do this for compatibility.
Making a change to solve this will allow the reader to reason about the code, and remove the need for so many guards. @jpivarski and I have had a few conversations on this already, and the following ideas were discussed. For completeness, I include ideas that aren't really strongly supported!
1. `ak.Array.var` that "only" performs Awkward-like indexing:
   - `.loc` in Pandas
   - `ak.Array.__getitem__`?
2. `y = ak.with_awkward_semantics(y)` such that `y[ix]` always performs Awkward-like indexing. This would propagate through most operations (but what happens with `ak.zip`?)
   - `.var` in (1)!
3. `y[ak.jagged[ix]]` or `y[ix.jagged]`
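As a rough illustration of the accessor idea (a `var` attribute in the spirit of `.loc` in Pandas), here is a toy version on nested lists. `ToyArray` and `VarIndexer` are hypothetical and not part of Awkward; the sketch only shows how an accessor lets the caller state the intent, whatever the structure of the index.

```python
class VarIndexer:
    """Hypothetical accessor: always applies Awkward-like (per-row) indexing."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, index):
        # Row k of the index picks elements from row k of the data
        return [[row[i] for i in picks] for row, picks in zip(self._rows, index)]


class ToyArray:
    """Toy stand-in for an array of lists; this is not ak.Array."""

    def __init__(self, rows):
        self.rows = rows

    @property
    def var(self):
        return VarIndexer(self.rows)


y = ToyArray([[10, 30, 20], [50, 40]])
# The index below is regular (every row has length 1), yet the pick is per row
print(y.var[[[1], [0]]])
```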
Pinging @nsmith who, as @jpivarski mentioned, had thoughts on an `ak.Array.pick`-like function in the past.