-
I have some problems understanding how to "flatten" nested arrays/records when applying masks. I think it's best explained with an actual use case (with reduced complexity). Given a set of events (in this case 3), each event contains a variable number of particle tracks, from which I want to pick exactly one: the one with the highest likelihood (`lik`).

    from skhep_testdata import data_path
    import uproot
    import awkward as ak

    f = uproot.open(data_path("uproot-issue431b.root"))
    # grab the tracks of the first three events and only the lik-parameter for simplicity
    tracks = f["E/Evt/trks"].arrays(["lik"], aliases={"lik": "trks.lik"})[:3]

    >>> tracks.tolist()  # truncated output
    [{'lik': [294.64, 294.64, ..., 67.77, 67.77]},
     {'lik': [96.75, 96.75, ..., 39.18, 38.87]},
     {'lik': [560.27, 560.27, ..., 118.72, 117.80]}]

I create an integer mask with `ak.argmax`:

    mask = ak.argmax(tracks.lik, axis=-1, keepdims=True)
    # <Array [[0], [0], [0]] type='3 * var * ?int64'>

and use that mask to actually select the tracks:

    selected_tracks = tracks[mask]
    # <Array [{lik: [295]}, ... 96.8]}, {lik: [560]}] type='3 * {"lik": var * ?float64}'>

As seen above, the masking works fine but the result is nested. This is expected, since the mask is nested too, but every field access then carries the extra dimension:

    selected_tracks.lik
    # <Array [[295], [96.8], [560]] type='3 * var * ?float64'>
    ak.flatten(selected_tracks.lik)  # this is the desired output but there are dozens of fields
    # <Array [295, 96.8, 560] type='3 * ?float64'>
    # or when accessing single entries:
    selected_tracks[0].lik
    # <Array [295] type='1 * ?float64'>
    selected_tracks[0].lik[0]
    # 295

Is there a better approach to reduce the dimensions of the fields of nested records, so that each field comes out flat, as in the `ak.flatten` output above?

I am pretty sure it's kind of a corner case, but unfortunately many of our files are structured like this, and I am trying to wrap all this into a low-level user library which needs to be performant and user-friendly at the same time.
Replies: 3 comments 3 replies
-
It's not such a corner case: other users might not encounter exactly this case, but probably a similar enough one.

Since you know each list returned by `ak.argmax` with `keepdims=True` has exactly one element (even if that element is None), you can use `[:, 0]` as a slice to remove the dimension by picking the first (and only) element:

    >>> tracks[mask].tolist()
    [{'lik': [294.6407542676734]}, {'lik': [96.75133289411137]}, {'lik': [560.2775306614813]}]
    >>> tracks[mask][:, 0].tolist()
    [{'lik': 294.6407542676734}, {'lik': 96.75133289411137}, {'lik': 560.2775306614813}]

I did a quick check to verify that this also works with empty lists (missing values in the `mask`):

    >>> fewer_tracks = tracks[tracks.lik > 100]
    >>> ak.num(fewer_tracks)
    <Array [{lik: 20}, {lik: 0}, {lik: 55}] type='3 * {"lik": int64}'>
    >>> mask = ak.argmax(fewer_tracks.lik, axis=1, keepdims=True)
    >>> mask
    <Array [[0], [None], [0]] type='3 * var * ?int64'>
    >>> fewer_tracks[mask].tolist()
    [{'lik': [294.6407542676734]}, {'lik': [None]}, {'lik': [560.2775306614813]}]
    >>> fewer_tracks[mask][:, 0].tolist()
    [{'lik': 294.6407542676734}, {'lik': None}, {'lik': 560.2775306614813}]

What the following examples show is that a field name (string) can sit at different positions in a slice:

    >>> deep = ak.Array([[{"x": {"a": 1}, "y": [{"b": 2}]}]])
    >>> deep.type
    1 * var * {"x": {"a": int64}, "y": var * {"b": int64}}
    >>> deep[:, 0].tolist()
    [{'x': {'a': 1}, 'y': [{'b': 2}]}]
    >>> deep[0, 0].tolist()
    {'x': {'a': 1}, 'y': [{'b': 2}]}
    >>> deep[0, 0, "x"].tolist()
    {'a': 1}
    >>> deep[0, "x", 0].tolist()
    {'a': 1}
    >>> deep["x", 0, 0].tolist()
    {'a': 1}
    >>> deep[0, 0, "y"].tolist()
    [{'b': 2}]
    >>> deep[0, 0, "y", 0].tolist()
    {'b': 2}
    >>> deep[0, "y", 0, 0].tolist()
    {'b': 2}
    >>> deep["y", 0, 0, 0].tolist()
    {'b': 2}

To be fully correct, this is commutativity with a caveat (I wish I had a better word): column-string selections can move anywhere to the left of (shallower than) their deepest position, but not to the right, as illustrated by this failure:

    >>> deep[0, 0, 0, "y"].tolist()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/jpivarski/irishep/awkward-1.0/awkward/highlevel.py", line 943, in __getitem__
        return ak._util.wrap(self._layout[where], self._behavior)
    ValueError: in NumpyArray, too many dimensions in slice

because the first ...
-
Ah dear, of course I know the ... Yes, this all makes sense, I can wrap the ...

By the way, it's a bit funny, since I am pretty sure that this behaviour was a bit different in the past. At least I have some unit tests which failed at some point, but I can no longer reconstruct the exact uproot/awkward versions, so I have no idea where the slicing behaviour changed; at some point it was returning single elements. I know that it was a combination of uproot3 and an earlier awkward1 version, probably around ...
-
This is the first job which failed: https://git.km3net.de/km3py/km3io/-/jobs/104059 with ...
And the last which worked: https://git.km3net.de/km3py/km3io/-/jobs/101157 with ...
So somewhere between these two, things suddenly failed which were using ...
It's not such a corner case: other users might not encounter exactly this case, but probably a similar enough one.

Since you know each list returned by `ak.argmax` with `keepdims=True` has exactly one element (even if that element is None), you can use `[:, 0]` as a slice to remove the dimension by picking the first (and only) element.

I did a quick check to verify that this also works with empty lists (missing values in the `mask`).