-
I would like to follow up on a discussion on Gitter a while ago. Since then I have understood a bit better what I actually need and, while I'm still only scratching the surface, how Awkward Arrays work. What I need is a function like this:

```python
def dim_len(array, dim):
    """
    Get the length of the array in the given dimension.

    Returns a non-negative integer if the array is regular in the given
    dimension and None otherwise.

    Examples
    --------
    >>> dim_len(ak.Array(np.array([[1,2,3],[3,4,5]])), 1)
    3
    >>> dim_len(ak.Array([[1,2,3],[3,4,5]]), 1)
    None
    >>> dim_len(ak.Array([{'a': 1}, {'b': 1}]), 0)
    2
    """
    raise NotImplementedError
```

In the chat on Gitter, it was suggested to recurse through the array types. Is this the way to go? It feels a bit odd that no such function is available as part of the library; I would have expected this to be a rather common use case.

If using the array types is the way to go, could I please get your feedback on whether I understood the implications of each type correctly:
CC @ivirshup
-
Your table is mostly correct. Whilst I was thinking about this, I noticed that I initially had some confusion as to how types compare with layouts. The notable difference between types and layouts is that in the type system, […]. Meanwhile, the list-types, e.g. […].

I wrote out an example implementation:

```python
import typing as t

import awkward._v2 as ak  # v2 preview namespace; on Awkward 2.x releases: import awkward as ak

SizeType = t.Union[int, None]


def _dim_size_next(node: ak.types.Type, depth: int) -> SizeType:
    """Walk into the contents of the given node.

    :param node: type node
    :param depth: depth above the intended type node
    :returns: size of type, or None if irregular
    """
    # Do we have a container?
    if isinstance(node, (ak.types.ListType, ak.types.RegularType, ak.types.ArrayType)):
        return _dim_size_impl(node.content, depth - 1)
    else:
        raise TypeError(f"Unexpected node type: {node!r}")


def _dim_size_at(node: ak.types.Type) -> SizeType:
    """Compute the size of the given type node.

    :param node: type node
    :returns: size of type, or None if irregular
    """
    if isinstance(node, ak.types.RegularType):
        return node.size
    elif isinstance(node, ak.types.ListType):
        return None
    elif isinstance(node, ak.types.ArrayType):
        return node.length
    else:
        raise TypeError(f"Unexpected node type: {node!r}")


def _dim_size_transform(node: ak.types.Type) -> ak.types.Type:
    """Transform the given type node to ignore superfluous types.

    :param node: type node
    :returns: important type
    """
    # These types are "atoms", and do not correspond to a dimension.
    # As atoms, they have no length, so if we see them, it's an error.
    if isinstance(node, (ak.types.NumpyType, ak.types.UnknownType)):
        raise ValueError(f"Cannot compute size of atom: {node!r}")
    # Meanwhile, unions *could* have an interpretable size, but we choose
    # not to handle them. Records are never considered to have a size.
    elif isinstance(node, (ak.types.UnionType, ak.types.RecordType)):
        raise TypeError(f"Cannot compute size through branching types: {node!r}")

    # Whether recursing through, or looking at this dimension,
    # we don't care about options, so move through them.
    if isinstance(node, ak.types.OptionType):
        return _dim_size_transform(node.content)
    else:
        return node


def _dim_size_impl(node: ak.types.Type, depth: int) -> SizeType:
    """Dispatcher to compute the size of the given type node.

    :param node: type node
    :param depth: depth above the intended type node
    :returns: size of type, or None if irregular
    """
    # Depth-agnostic transforms
    node = _dim_size_transform(node)

    # Now consider whether we're recursing or at the appropriate depth
    if depth > 0:
        return _dim_size_next(node, depth)
    else:
        return _dim_size_at(node)


def dim_size(array: ak.Array, dim: int) -> SizeType:
    """Compute the size of a particular dimension of the given array.

    :param array: Awkward Array
    :param dim: dimension of which to compute the size
    :returns: size of dimension, or None if irregular
    """
    return _dim_size_impl(array.type, depth=dim)
```

Awkward Array operations generally stop at records, so it doesn't make sense to compute the size of these types. However, it is technically possible to compute the size of unions; you just need to resolve the branches at the end. I opted to just make this an error, but you could extend it.

Note that this doesn't handle negative axis (dim) values. You could do this by first computing the depth of the array, e.g. via […].

You might use this like:
```python
arr = ak.Array([
    [
        [
            [1, 2, 3],
            [4, 5, 6, 7]
        ],
        [
            [1, 2, 3],
            [4, 5, 6, 7]
        ]
    ]
])
arr = ak.to_regular(arr, axis=2)

print(dim_size(arr, 0))
print(dim_size(arr, 1))
print(dim_size(arr, 2))
print(dim_size(arr, 3))
```
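The negative-axis extension mentioned above could look roughly like the following. This is only a minimal sketch and not part of the answer's code: `Leaf`, `ListNode`, `type_depth`, and `normalize_axis` are hypothetical stand-ins for a type tree so that the snippet runs without Awkward installed. With a real array, the depth would come from its type or layout instead.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    """Hypothetical stand-in for an atom such as a primitive type (no dimension)."""
    name: str = "int64"


@dataclass
class ListNode:
    """Hypothetical stand-in for one dimension.

    size is an int if the dimension is regular and None if it is irregular.
    """
    content: Union["ListNode", Leaf]
    size: Optional[int] = None


def type_depth(node: Union[ListNode, Leaf]) -> int:
    """Count the dimensions from this node down to the leaf."""
    depth = 0
    while isinstance(node, ListNode):
        depth += 1
        node = node.content
    return depth


def normalize_axis(axis: int, ndim: int) -> int:
    """Map a possibly negative axis onto the range 0 .. ndim - 1."""
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis={axis} is out of range for {ndim} dimensions")
    return axis + ndim if axis < 0 else axis


# Same nesting as the example above, assuming its type is 1 * var * 2 * var * int64
tree = ListNode(ListNode(ListNode(ListNode(Leaf()), size=2)), size=1)
ndim = type_depth(tree)
print(ndim, normalize_axis(-1, ndim), normalize_axis(-4, ndim))
```

A wrapper would call `normalize_axis` once up front and then pass the non-negative result to the depth-counting recursion unchanged.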
-
As usual, @agoose77 beat me to an answer again! (Sarcasm: I don't mind!)

I was writing an answer that uses the new ak.transform (PR #1610), which I think was inspired, or at least got bumped up in priority, by your problem. (It fulfills an old issue #516.) The goal of this interface was to provide a public API for the function we use internally to define functions like these, which would streamline the process of absorbing it into the Awkward codebase, once we have a good idea of what the general API should look like.

Here's what I came up with, and most of the complication is dealing with the fact that union types introduce branches.

```python
import awkward._v2 as ak  # same v2 preview namespace as in the answer above


def dim_len(array, axis):
    if axis < 0:  # negative axis is another can of worms... maybe later
        raise NotImplementedError
    elif axis == 0:
        # axis 0 is the outermost dimension: the length of the array itself
        return len(array)
    else:

        def size_at_depth(layout, depth, lateral_context, **kwargs):
            if layout.is_NumpyType or layout.is_UnknownType:
                # if it's an embedded rectilinear array, we have to deal with its shape
                # which might not be 1-dimensional
                if layout.is_UnknownType:
                    shape = (0,)
                else:
                    shape = layout.shape
                numpy_axis = lateral_context["axis"] - depth + 1
                if not (1 <= numpy_axis < len(shape)):
                    raise TypeError(f"axis={lateral_context['axis']} is too deep")
                lateral_context["out"] = shape[numpy_axis]
                return layout.nplike.empty(1)

            elif layout.is_ListType and depth == lateral_context["axis"]:
                if layout.is_RegularType:
                    # if it's a regular list, you want the size
                    lateral_context["out"] = layout.size
                else:
                    # if it's an irregular list, you want a null token
                    lateral_context["out"] = -1
                return layout.nplike.empty(1)

            elif layout.is_RecordType:
                # if it's a record, you want to stop descent with an error
                raise TypeError(f"axis={lateral_context['axis']} is too deep, reaches record")

            elif layout.is_UnionType:
                # if it's a union, you could get the result of each union branch
                # separately and see if they're all the same; if not, it's an error
                result = None
                for content in layout.contents:
                    context = {"axis": lateral_context["axis"]}
                    ak.transform(size_at_depth, content, lateral_context=context, return_array=False)
                    if result is None:
                        result = context["out"]
                    elif result != context["out"]:
                        raise TypeError(f"union results in different values at axis={lateral_context['axis']}")
                lateral_context["out"] = result
                return layout.nplike.empty(1)

        # communicate with the recursive function using a context (lateral)
        context = {"axis": axis}
        # "transform" but we don't care what kind of array it returns
        ak.transform(size_at_depth, array, lateral_context=context, return_array=False)
        # you wanted the null token to be None
        return None if context["out"] == -1 else context["out"]
```

The primary purpose of […]

We care about NumpyType arrays that are the leaves of a tree, as they might have their own rectilinear `shape`.

We care about ListType arrays because that's where the answer will usually come from. If it's regular, you get the `size` […]

We care about RecordType because you want the recursion to stop there (which it wouldn't normally do).

We care about UnionType because you either want to stop there, as in @agoose77's solution, or you want to unify the results of each branch, which is possible. This is why I made the null token temporarily `-1`.

Stepping back, it's a fairly complex function (a page), but it works for every possible Awkward Array.

Edit: Oh, I forgot to mention that we didn't have to mention option-types at any point, because they're irrelevant for this problem.
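The union unification described above can be illustrated in isolation. What follows is a minimal sketch under stated assumptions, not the code from this answer: `unify_branch_sizes` and `IRREGULAR` are hypothetical names, and the per-branch results are passed in as a plain list rather than collected with ak.transform. It mirrors the loop above, where `None` marks "no branch inspected yet" and a separate token marks an irregular list.

```python
from typing import Iterable, Optional

IRREGULAR = -1  # temporary null token, mirroring the -1 used in the code above


def unify_branch_sizes(branch_sizes: Iterable[int]) -> Optional[int]:
    """Combine the per-branch sizes of a union into one answer.

    Each entry is a non-negative size, or IRREGULAR for a variable-length list.
    Returns the common size, or None if all branches are irregular.
    Raises TypeError if the branches disagree.
    """
    result = None  # None means "no branch inspected yet"
    for size in branch_sizes:
        if result is None:
            result = size
        elif result != size:
            raise TypeError(f"union branches disagree: {result} != {size}")
    if result is None:
        raise ValueError("a union needs at least one branch")
    # Only at the very end is the temporary token converted to None
    return None if result == IRREGULAR else result


print(unify_branch_sizes([3, 3]))
print(unify_branch_sizes([IRREGULAR, IRREGULAR]))
```

Because the comparison uses `is None` rather than truthiness, a legitimate size of 0 is still treated as a real result.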