Broadcasting the results of two ak.zips #486
Replies: 6 comments
-
You're not going to like this answer, but "You can zip two zips unless they're unbroadcastable." (My assessment of the error is the same as the error message.) It's up to you to understand what's broadcastable and what's not.

Your reproducer is not minimal; even if I managed to access the file on EOS, there's a lot of analysis code to pick through to find the actual error. And anyway, I think it would be more useful to look at this in a generic case. Consider two zips of a shallow array and two different deep arrays:

>>> import awkward1 as ak
>>> shallow = ak.Array([1.1, 2.2, 3.3, 4.4, 5.5])
>>> deep1 = ak.Array([[1, 2, 3], [], [4, 5], [6, 7, 8], [9]])
>>> deep2 = ak.Array([["a", "b"], ["c"], ["d", "e", "f"], [], ["g"]])
>>> x = ak.zip({"x1": shallow, "x2": deep1})
>>> y = ak.zip({"y1": shallow, "y2": deep2})

The shallow array is like your event-level variables, "nMuLoose" etc., and the deep arrays are like your collections of particles. Be aware of what ak.zip is doing: it's duplicating each shallow value to match all of the corresponding deep values. To see that, let's print them out. Notice how each single value of shallow is repeated for every item in the corresponding deep list:

>>> x.type
5 * var * {"x1": float64, "x2": int64}
>>> y.type
5 * var * {"y1": float64, "y2": string}
>>> x.tolist()
[[{'x1': 1.1, 'x2': 1}, {'x1': 1.1, 'x2': 2}, {'x1': 1.1, 'x2': 3}],
[],
[{'x1': 3.3, 'x2': 4}, {'x1': 3.3, 'x2': 5}],
[{'x1': 4.4, 'x2': 6}, {'x1': 4.4, 'x2': 7}, {'x1': 4.4, 'x2': 8}],
[{'x1': 5.5, 'x2': 9}]]
>>> y.tolist()
[[{'y1': 1.1, 'y2': 'a'}, {'y1': 1.1, 'y2': 'b'}],
[{'y1': 2.2, 'y2': 'c'}],
[{'y1': 3.3, 'y2': 'd'}, {'y1': 3.3, 'y2': 'e'}, {'y1': 3.3, 'y2': 'f'}],
[],
[{'y1': 5.5, 'y2': 'g'}]]

Maybe you didn't want that. Maybe you wanted the event-level variables to stay separate from the particles by limiting how deeply they get zipped (with depth_limit):

>>> maybe = ak.zip({"x1": shallow, "x2": deep1}, depth_limit=1)
>>> maybe.type
5 * {"x1": float64, "x2": var * int64}
>>> maybe.tolist()
[{'x1': 1.1, 'x2': [1, 2, 3]},
{'x1': 2.2, 'x2': []},
{'x1': 3.3, 'x2': [4, 5]},
{'x1': 4.4, 'x2': [6, 7, 8]},
{'x1': 5.5, 'x2': [9]}]

In general, the results of a zip can be used in another zip, though it makes records of records.

>>> zip_of_zip = ak.zip({"x": x, "deep1": deep1})
>>> zip_of_zip.type
5 * var * {"x": {"x1": float64, "x2": int64}, "deep1": int64}
>>> zip_of_zip.tolist()
[[{'x': {'x1': 1.1, 'x2': 1}, 'deep1': 1}, {'x': {'x1': 1.1, 'x2': 2}, 'deep1': 2}, {'x': {'x1': 1.1, 'x2': 3}, 'deep1': 3}],
[],
[{'x': {'x1': 3.3, 'x2': 4}, 'deep1': 4}, {'x': {'x1': 3.3, 'x2': 5}, 'deep1': 5}],
[{'x': {'x1': 4.4, 'x2': 6}, 'deep1': 6}, {'x': {'x1': 4.4, 'x2': 7}, 'deep1': 7}, {'x': {'x1': 4.4, 'x2': 8}, 'deep1': 8}],
[{'x': {'x1': 5.5, 'x2': 9}, 'deep1': 9}]]

But the two zips x and y can't be zipped together:

>>> ak.zip({"x": x, "y": y})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/operations/structure.py", line 348, in zip
    out = awkward1._util.broadcast_and_apply(layouts, getfunction, behavior)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 975, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 745, in apply
    outcontent = apply(nextinputs, depth + 1)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 786, in apply
    nextinputs.append(x.broadcast_tooffsets64(offsets).content)
ValueError: in ListOffsetArray64, cannot broadcast nested list
(https://github.com/scikit-hep/awkward-1.0/blob/0.3.2/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)

This is not because they're zips of zips, but because the original deep1 and deep2 can't be broadcast to each other:

>>> ak.zip({"deep1": deep1, "deep2": deep2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/operations/structure.py", line 348, in zip
    out = awkward1._util.broadcast_and_apply(layouts, getfunction, behavior)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 975, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 745, in apply
    outcontent = apply(nextinputs, depth + 1)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 786, in apply
    nextinputs.append(x.broadcast_tooffsets64(offsets).content)
ValueError: in ListOffsetArray64, cannot broadcast nested list
(https://github.com/scikit-hep/awkward-1.0/blob/0.3.2/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)

You can see that with ak.num, which tells you the number of elements in each list. If they differ, like the different numbers of electrons, muons, and jets in each event, then fields from one can't be put into fields of another because there are too many of some values and not enough of others.

>>> ak.num(deep1), ak.num(deep2)
(<Array [3, 0, 2, 3, 1] type='5 * int64'>,
<Array [2, 1, 3, 0, 1] type='5 * int64'>)
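
The same comparison can be turned into a list of the offending events. This is a minimal plain-Python sketch, assuming the arrays have been converted to nested lists (for example with tolist()); the helper name mismatched_events is hypothetical and not part of Awkward Array:

```python
# Plain nested lists standing in for deep1.tolist() and deep2.tolist().
deep1 = [[1, 2, 3], [], [4, 5], [6, 7, 8], [9]]
deep2 = [["a", "b"], ["c"], ["d", "e", "f"], [], ["g"]]


def mismatched_events(first, second):
    """Return (index, len_first, len_second) for every event whose inner
    lists differ in length, i.e. where a deep zip cannot line them up."""
    return [
        (i, len(a), len(b))
        for i, (a, b) in enumerate(zip(first, second))
        if len(a) != len(b)
    ]


bad = mismatched_events(deep1, deep2)
print(bad)
```

An empty result would mean that the two collections have the same number of items in every event.
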
>>> ak.num(x), ak.num(y)
(<Array [3, 0, 2, 3, 1] type='5 * int64'>,
<Array [2, 1, 3, 0, 1] type='5 * int64'>)

But maybe you never meant to try to put them all in common lists. Maybe you only wanted to shallowly zip them:

>>> maybe2 = ak.zip({"x": x, "y": y}, depth_limit=1)
>>> maybe2.type
5 * {"x": var * {"x1": float64, "x2": int64}, "y": var * {"y1": float64, "y2": string}}
>>> maybe2.tolist()
[{'x': [{'x1': 1.1, 'x2': 1}, {'x1': 1.1, 'x2': 2}, {'x1': 1.1, 'x2': 3}],
'y': [{'y1': 1.1, 'y2': 'a'}, {'y1': 1.1, 'y2': 'b'}]},
{'x': [],
'y': [{'y1': 2.2, 'y2': 'c'}]},
{'x': [{'x1': 3.3, 'x2': 4}, {'x1': 3.3, 'x2': 5}],
'y': [{'y1': 3.3, 'y2': 'd'}, {'y1': 3.3, 'y2': 'e'}, {'y1': 3.3, 'y2': 'f'}]},
{'x': [{'x1': 4.4, 'x2': 6}, {'x1': 4.4, 'x2': 7}, {'x1': 4.4, 'x2': 8}],
'y': []},
{'x': [{'x1': 5.5, 'x2': 9}],
'y': [{'y1': 5.5, 'y2': 'g'}]}]

Or maybe you never meant to broadcast the event-level variables ("shallow" things like "nMuLoose") in the first zip, anyway.

Basically, you need to be aware of the data structures you're making: the difference between zipping "all the way down" to the particle attributes, duplicating any event-level variables to match, and zipping down to some depth with depth_limit.

It's much easier to get this awareness if you develop your analysis interactively on a Python prompt, in IPython, or in Jupyter. As you see above, it was instructive to print out the type of each object and the first few samples with tolist().

By the end, you may want everything wrapped up in a script that you can run in a push-button way, not a notebook that you have to engage interactively. For that, I usually experiment in an interactive terminal/IPython/Jupyter and copy each bit into a script as soon as it works. The notebooks are not final products, but an essential step.
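
The difference between the two zipping modes can be modeled with plain Python lists. This is only a sketch of the resulting structures, not how ak.zip is implemented; the function names are hypothetical, and the field names "x1" and "x2" are taken from the example above:

```python
# Plain-Python stand-ins for the shallow and deep1 arrays used above.
shallow = [1.1, 2.2, 3.3, 4.4, 5.5]
deep1 = [[1, 2, 3], [], [4, 5], [6, 7, 8], [9]]


def zip_all_the_way_down(event_values, particle_lists):
    """Model of ak.zip without depth_limit: repeat each event-level value
    once per particle, giving one list of records per event."""
    return [
        [{"x1": value, "x2": particle} for particle in particles]
        for value, particles in zip(event_values, particle_lists)
    ]


def zip_one_level(event_values, particle_lists):
    """Model of ak.zip with depth_limit=1: one record per event, with the
    particle list kept intact as a field of that record."""
    return [
        {"x1": value, "x2": particles}
        for value, particles in zip(event_values, particle_lists)
    ]


deep_records = zip_all_the_way_down(shallow, deep1)
event_records = zip_one_level(shallow, deep1)
print(deep_records[0])
print(event_records[0])
```

In the first form an event with no particles produces an empty list, so its event-level value disappears from the output; in the second form every event keeps its value regardless of how many particles it has.
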
-
Thanks a lot, Jim, for the quick and detailed reply. In the meantime I was following this documentation [1] again, and it seems I did not understand the zipping properly. I am reading your reply and following the tutorial to check what suits my case. Thanks again for the advice to use Jupyter; I will try that for quick debugging. So far I was using [:10] to print a few events. Of course Jupyter will help speed up the debugging. Regards.
-
For my part, I need to translate explanations like the above into formal documentation on https://awkward-array.org. The trouble is that everybody needs to know something different: in your case it was the depth_limit argument of ak.zip.
-
Indeed, it was an issue of depth_limit. I was aware of depth_limit and did it right at the first step, and then forgot it because I was using awkward1 for the first time. It was so silly to spend so much time debugging, first mine and then yours. What should I use next time to ask a question if I don't find the answer in the documentation?
-
For a "how do I?" question, the best place is StackOverflow.
-
I understand; I will use StackOverflow in the future.
-
Dear Experts,
I tried to make an ak.zip of two existing zips and ran into a broadcasting error.
Looking at the structure/dimensions of these two input zips, everything seems consistent, and there should be no issue while creating these zips.
I attached my code to this page for reference.
Do you have some suggestions to debug or fix this?
https://github.com/ramankhurana/test/blob/main/test.py
Error:
Traceback (most recent call last):
  File "test.py", line 87, in <module>
    print ("ele_mu", ak.zip ({ "ele":ele_, "mu":mu_ }))
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/operations/structure.py", line 348, in zip
    out = awkward1._util.broadcast_and_apply(layouts, getfunction, behavior)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 972, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 745, in apply
    outcontent = apply(nextinputs, depth + 1)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 786, in apply
    nextinputs.append(x.broadcast_tooffsets64(offsets).content)
ValueError: in ListOffsetArray64, cannot broadcast nested list
(https://github.com/scikit-hep/awkward-1.0/blob/0.3.1/src/cpu-kernels/operations.cpp#L778)