Broadcasting the results of two ak.zips #486

ramankhurana · 2020-10-08T09:29:54Z

Dear Experts

I tried to make an ak.zip of two existing zips and ran into a broadcasting error.

Looking at the structure and dimensions of the two input zips, everything seems consistent, so I would expect no issue when zipping them.

I have linked my code below for reference.

Do you have any suggestions for debugging or fixing this?

https://github.com/ramankhurana/test/blob/main/test.py

Error:

Traceback (most recent call last):
  File "test.py", line 87, in <module>
    print ("ele_mu", ak.zip ({ "ele":ele_, "mu":mu_ }))
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/operations/structure.py", line 348, in zip
    out = awkward1._util.broadcast_and_apply(layouts, getfunction, behavior)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 972, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 745, in apply
    outcontent = apply(nextinputs, depth + 1)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 786, in apply
    nextinputs.append(x.broadcast_tooffsets64(offsets).content)
ValueError: in ListOffsetArray64, cannot broadcast nested list
(https://github.com/scikit-hep/awkward-1.0/blob/0.3.1/src/cpu-kernels/operations.cpp#L778)

jpivarski (Maintainer) · 2020-10-08T13:24:00Z

You're not going to like this answer, but "You can zip two zips unless they're unbroadcastable." (My assessment of the error is the same as the error message.) It's up to you to understand what's broadcastable and what's not.

Your reproducer is not minimal; even if I managed to access the file on EOS, there's a lot of analysis code to pick through to find the actual error. And anyway, I think it would be more useful to look at this in a generic case. Consider two zips of a shallow array and two different deep arrays:

>>> import awkward1 as ak
>>> shallow = ak.Array([1.1, 2.2, 3.3, 4.4, 5.5])
>>> deep1 = ak.Array([[1, 2, 3], [], [4, 5], [6, 7, 8], [9]])
>>> deep2 = ak.Array([["a", "b"], ["c"], ["d", "e", "f"], [], ["g"]])
>>> x = ak.zip({"x1": shallow, "x2": deep1})
>>> y = ak.zip({"y1": shallow, "y2": deep2})

The shallow array is like your event-level variables, "nMuLoose" etc., and the deep arrays are like your collections of particles. Be aware of what ak.zip is doing: it's duplicating each shallow value to match all of the corresponding deep values. To see that, let's print them out. Notice how the single 1.1 appears in three records of x and two records of y. Also notice that if you have any empty lists, the corresponding shallow values are lost entirely.

>>> x.type
5 * var * {"x1": float64, "x2": int64}
>>> y.type
5 * var * {"y1": float64, "y2": string}
>>> x.tolist()
[[{'x1': 1.1, 'x2': 1}, {'x1': 1.1, 'x2': 2}, {'x1': 1.1, 'x2': 3}],
 [],
 [{'x1': 3.3, 'x2': 4}, {'x1': 3.3, 'x2': 5}],
 [{'x1': 4.4, 'x2': 6}, {'x1': 4.4, 'x2': 7}, {'x1': 4.4, 'x2': 8}],
 [{'x1': 5.5, 'x2': 9}]]
>>> y.tolist()
[[{'y1': 1.1, 'y2': 'a'}, {'y1': 1.1, 'y2': 'b'}],
 [{'y1': 2.2, 'y2': 'c'}],
 [{'y1': 3.3, 'y2': 'd'}, {'y1': 3.3, 'y2': 'e'}, {'y1': 3.3, 'y2': 'f'}],
 [],
 [{'y1': 5.5, 'y2': 'g'}]]

Maybe you didn't want that. Maybe you wanted the event-level variables to stay separate from the particles by limiting how deeply they get zipped (depth_limit):

>>> maybe = ak.zip({"x1": shallow, "x2": deep1}, depth_limit=1)
>>> maybe.type
5 * {"x1": float64, "x2": var * int64}
>>> maybe.tolist()
[{'x1': 1.1, 'x2': [1, 2, 3]},
 {'x1': 2.2, 'x2': []},
 {'x1': 3.3, 'x2': [4, 5]},
 {'x1': 4.4, 'x2': [6, 7, 8]},
 {'x1': 5.5, 'x2': [9]}]
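
To make the two behaviours concrete, here is a plain-Python model of what the two zips above produce. It is a sketch for building intuition, not how Awkward is implemented: `zip_deep` and `zip_shallow` are hypothetical names, ordinary lists and dicts stand in for the arrays, and only one level of nesting is handled.

```python
def zip_deep(shallow, deep, names=("x1", "x2")):
    """Model of a full-depth zip: each event-level value is repeated once
    per item in that event's list, so it vanishes when the list is empty."""
    s_name, d_name = names
    return [
        [{s_name: s, d_name: d} for d in items]
        for s, items in zip(shallow, deep)
    ]


def zip_shallow(shallow, deep, names=("x1", "x2")):
    """Model of depth_limit=1: one record per event, inner lists left intact."""
    s_name, d_name = names
    return [{s_name: s, d_name: list(items)} for s, items in zip(shallow, deep)]


shallow = [1.1, 2.2, 3.3, 4.4, 5.5]
deep1 = [[1, 2, 3], [], [4, 5], [6, 7, 8], [9]]

deep_result = zip_deep(shallow, deep1)
shallow_result = zip_shallow(shallow, deep1)

print(deep_result[1])     # [] : the 2.2 is gone because the list was empty
print(shallow_result[1])  # {'x1': 2.2, 'x2': []} : the 2.2 survives
```

The second event is the one to watch: the full-depth model has nowhere to put 2.2, while the depth-limited model keeps it next to the empty list.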

In general, the results of a zip can be used in another zip, though it makes records of records.

>>> zip_of_zip = ak.zip({"x": x, "deep1": deep1})
>>> zip_of_zip.type
5 * var * {"x": {"x1": float64, "x2": int64}, "deep1": int64}
>>> zip_of_zip.tolist()
[[{'x': {'x1': 1.1, 'x2': 1}, 'deep1': 1}, {'x': {'x1': 1.1, 'x2': 2}, 'deep1': 2}, {'x': {'x1': 1.1, 'x2': 3}, 'deep1': 3}],
 [],
 [{'x': {'x1': 3.3, 'x2': 4}, 'deep1': 4}, {'x': {'x1': 3.3, 'x2': 5}, 'deep1': 5}],
 [{'x': {'x1': 4.4, 'x2': 6}, 'deep1': 6}, {'x': {'x1': 4.4, 'x2': 7}, 'deep1': 7}, {'x': {'x1': 4.4, 'x2': 8}, 'deep1': 8}],
 [{'x': {'x1': 5.5, 'x2': 9}, 'deep1': 9}]]

But the x and the y that we've just created can't be zipped because they can't be broadcast:

>>> ak.zip({"x": x, "y": y})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/operations/structure.py", line 348, in zip
    out = awkward1._util.broadcast_and_apply(layouts, getfunction, behavior)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 975, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 745, in apply
    outcontent = apply(nextinputs, depth + 1)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 786, in apply
    nextinputs.append(x.broadcast_tooffsets64(offsets).content)
ValueError: in ListOffsetArray64, cannot broadcast nested list

(https://github.com/scikit-hep/awkward-1.0/blob/0.3.2/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)

This is not because they're zips of zips, but because the original deep1 and deep2 are unbroadcastable due to having different numbers of elements in their lists.

>>> ak.zip({"deep1": deep1, "deep2": deep2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/operations/structure.py", line 348, in zip
    out = awkward1._util.broadcast_and_apply(layouts, getfunction, behavior)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 975, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 745, in apply
    outcontent = apply(nextinputs, depth + 1)
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/_util.py", line 786, in apply
    nextinputs.append(x.broadcast_tooffsets64(offsets).content)
ValueError: in ListOffsetArray64, cannot broadcast nested list

(https://github.com/scikit-hep/awkward-1.0/blob/0.3.2/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)

You can see that with ak.num, which tells you the number of elements in each list. If they differ, like the different numbers of electrons, muons, and jets in each event, then fields from one can't be put into fields of another because there are too many of some values and not enough of others.

>>> ak.num(deep1), ak.num(deep2)
(<Array [3, 0, 2, 3, 1] type='5 * int64'>,
 <Array [2, 1, 3, 0, 1] type='5 * int64'>)
>>> ak.num(x), ak.num(y)
(<Array [3, 0, 2, 3, 1] type='5 * int64'>,
 <Array [2, 1, 3, 0, 1] type='5 * int64'>)
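
The same comparison can be written as a small pre-flight check that reports which events would block a full-depth zip. The sketch below uses plain Python lists in place of the arrays, and `mismatched_events` is a hypothetical helper name, not part of Awkward.

```python
def mismatched_events(a, b):
    """Return the indices of events whose inner lists differ in length.

    Plain-Python stand-in for comparing ak.num(a) with ak.num(b); an empty
    result means the two collections line up item by item in every event.
    """
    if len(a) != len(b):
        raise ValueError(
            "different numbers of events: %d vs %d" % (len(a), len(b))
        )
    return [i for i, (x, y) in enumerate(zip(a, b)) if len(x) != len(y)]


deep1 = [[1, 2, 3], [], [4, 5], [6, 7, 8], [9]]
deep2 = [["a", "b"], ["c"], ["d", "e", "f"], [], ["g"]]

print(mismatched_events(deep1, deep2))  # [0, 1, 2, 3]
print(mismatched_events(deep1, deep1))  # []
```

Only the last event has the same count in both collections, so four of the five events are unbroadcastable. On real arrays, the equivalent would presumably be an elementwise comparison of the two ak.num results.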

But maybe you never meant to try to put them all in common lists. Maybe you only wanted to shallowly zip them:

>>> maybe2 = ak.zip({"x": x, "y": y}, depth_limit=1)
>>> maybe2.type
5 * {"x": var * {"x1": float64, "x2": int64}, "y": var * {"y1": float64, "y2": string}}
>>> maybe2.tolist()
[{'x': [{'x1': 1.1, 'x2': 1}, {'x1': 1.1, 'x2': 2}, {'x1': 1.1, 'x2': 3}],
  'y': [{'y1': 1.1, 'y2': 'a'}, {'y1': 1.1, 'y2': 'b'}]},
 {'x': [],
  'y': [{'y1': 2.2, 'y2': 'c'}]},
 {'x': [{'x1': 3.3, 'x2': 4}, {'x1': 3.3, 'x2': 5}],
  'y': [{'y1': 3.3, 'y2': 'd'}, {'y1': 3.3, 'y2': 'e'}, {'y1': 3.3, 'y2': 'f'}]},
 {'x': [{'x1': 4.4, 'x2': 6}, {'x1': 4.4, 'x2': 7}, {'x1': 4.4, 'x2': 8}],
  'y': []},
 {'x': [{'x1': 5.5, 'x2': 9}],
  'y': [{'y1': 5.5, 'y2': 'g'}]}]

Or maybe you never meant to broadcast the event-level variables ("shallow" things like "nMuLoose") in the first zip, anyway.

Basically, you need to be aware of the data structures you're making: the difference between zipping "all the way down" to the particle attributes (which duplicates any event-level variables to match) and zipping only down to some depth with depth_limit. You might also not want to make records of records, since that determines how you extract the fields later. Or maybe you do; that's a personal choice.
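
As a small illustration of how the nesting shapes later access, the snippet below works on plain Python data copied from the first event of maybe2.tolist() above. It is only a sketch on ordinary dicts and lists; Awkward arrays have their own field-access syntax, which is not shown here.

```python
# First event of maybe2.tolist(), copied from the output above.
event0 = {
    "x": [{"x1": 1.1, "x2": 1}, {"x1": 1.1, "x2": 2}, {"x1": 1.1, "x2": 3}],
    "y": [{"y1": 1.1, "y2": "a"}, {"y1": 1.1, "y2": "b"}],
}

# With depth_limit=1 the two collections keep their own lengths, so each
# is reached through its own field before looping over its items.
x2_values = [item["x2"] for item in event0["x"]]
y2_values = [item["y2"] for item in event0["y"]]

print(x2_values)  # [1, 2, 3]
print(y2_values)  # ['a', 'b']
print(len(event0["x"]), len(event0["y"]))  # 3 2
```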

It's much easier to get this awareness if you develop your analysis interactively on a Python prompt, in IPython, or in Jupyter. As you see above, it was instructive to print out the types of each object and the first few samples with tolist (slice your data with events[:2] so that you don't print too much!). Awkward, like NumPy, was made for interactive development, and as soon as it gets moderately complex, you have to develop interactively (like NumPy: indexing can be complicated). From your test.py, I'm guessing that you are developing by adding some code to the script and re-running the script, like an edit-compile-run cycle in C++. That's an uphill battle: as your script gets longer, debugging will get slower and slower—it's unproductive in the long run.

By the end, you may want everything wrapped up in a script that you can run in a push-button way, not a notebook that you have to engage interactively. For that, I usually experiment in an interactive terminal/IPython/Jupyter and copy each bit into a script as soon as it works. The notebooks are not final products, but an essential step.

ramankhurana (Author) · 2020-10-08T13:55:05Z

Thanks a lot, Jim, for the quick and detailed reply. In the meantime I was going through this tutorial [1] again, and it seems I did not understand the zipping properly. I am reading your reply alongside the tutorial and will check what suits my case.

Thanks again for the advice to use Jupyter; I will try that for quick debugging. So far I was using [:10] to print a few events. Of course, Jupyter will help speed up the debugging.

Regards.

[1] https://mybinder.org/v2/gh/jpivarski/2020-07-13-pyhep2020-tutorial.git/1.1?urlpath=lab/tree/tutorial.ipynb

jpivarski (Maintainer) · 2020-10-08T14:17:39Z

For my part, I need to translate explanations like the above into formal documentation on https://awkward-array.org. The trouble is that everybody needs to know something different: in your case it was the depth_limit parameter. If this has been helpful, go ahead and close the issue now. (Though that means it will be harder for others to find! That's the trouble with GitHub Issues for "how do I?" type questions.)

ramankhurana (Author) · 2020-10-08T14:42:26Z

Indeed, it was an issue of depth_limit. I was aware of depth_limit and used it correctly in the first step, but then forgot it because I was using awkward1 for the first time. It was silly to spend so much of my own time debugging, and then yours.

What should I use next time to ask a question if I can't find the answer in the documentation?

jpivarski (Maintainer) · 2020-10-08T14:57:59Z

> What should I use next time to ask a question if I can't find the answer in the documentation?

For a "how do I?" question, the best place is StackOverflow with the [awkward-array] tag. Then I can provide an answer like the above and when it's done, it will continue to be visible. In GitHub Issues, when something's done, it has to be closed so that I can tell what still needs to be fixed. (If everything was left open, I'd have to keep track of issues elsewhere.)

ramankhurana (Author) · 2020-10-08T15:02:51Z

I understand. I will use Stack Overflow in the future.
