Replies: 5 comments 9 replies
-
Setting the top-level
|
Beta Was this translation helpful? Give feedback.
-
Thanks for the quick answer! I should have mentioned that i already tried the "non-simplifying" variant with only the first call of lazy = ak.from_parquet("array.parquet", lazy=True)
named_lazy_A = with_name_lazy(lazy.A, "AClass") # using the function that does not have the simplify call
lazy._caches
|
Beta Was this translation helpful? Give feedback.
-
I also attempted to rebuild the import awkward as ak
import json
lazy = ak.from_parquet("array.parquet", lazy=True)
def rebuild_virtual_with_name(layout, name):
generator = layout.generator
args_list = list(generator.args)
form = args_list[3]
form_dict = json.loads(form.tojson())
form_dict["content"]["parameters"]["__record__"] = "AClass"
new_form = ak.forms.Form.fromjson(json.dumps(form_dict))
args_list[3] = new_form
return ak.Array(
ak.layout.VirtualArray(
ak.layout.ArrayGenerator(
generator.callable,
args=tuple(args_list),
form=new_form,
)
)
)
lazy_A = rebuild_virtual_with_name(lazy.A.layout, "AClass") if i now try to access fields of |
Beta Was this translation helpful? Give feedback.
-
@nikoladze, maybe you can avoid https://awkward-array.org/how-to-create-constructors.html#content-recordarray Since your array contains VirtualArrays, you need to keep the original ak.Array in scope until you've made a new ak.Array, because the VirtualArrays only keto weak reference to any attached caches: https://awkward-array.org/how-to-create-constructors.html#content-virtualarray If your use of |
Beta Was this translation helpful? Give feedback.
-
@jpivarski, @agoose77, thanks a lot for your detailed answers! I did not really think about using the existing virtual array to generate a new virtual array, but maybe that is the way to go here. But i think i haven't quite understood how the nesting of virtual arrays works. I tried the following: lazy = ak.from_parquet("array.parquet", lazy=True)
leaf_form = {
"class": "NumpyArray",
"itemsize": 8,
"format": "l",
"primitive": "int64",
}
def get_virtual_content(virtual_col):
def generate():
return ak.materialized(virtual_col).layout.content
return ak.layout.VirtualArray(
ak.layout.ArrayGenerator(
generate,
form=ak.forms.Form.fromjson(json.dumps(leaf_form))
)
)
def get_virtual_record(lazy_col, with_name):
form = ak.forms.Form.fromjson(json.dumps({
"class": "ListOffsetArray64",
"offsets": "i64",
"content": {
"class": "RecordArray",
"contents": {
"x": leaf_form,
"y": leaf_form,
}
},
"parameters": {
"__record__": with_name
}
}))
def generate():
return ak.layout.ListOffsetArray64(
ak.materialized(lazy_col.x).layout.offsets,
ak.layout.RecordArray(
[
get_virtual_content(lazy_col.x.layout),
get_virtual_content(lazy_col.y.layout)
],
["x", "y"]
),
parameters={"__record__": with_name}
)
return ak.Array(
ak.layout.VirtualArray(
ak.layout.ArrayGenerator(generate, length=len(lazy_col), form=form)
)
)
lazy_A = get_virtual_record(lazy.A, "AClass") However if i now access e.g In general i was hoping to get a more straightforward solution to read a Tagging @nsmith- maybe he has some ideas as well. |
Beta Was this translation helpful? Give feedback.
-
I'd like to read a nested array lazily from parquet files and attach behavior to some of the record fields. For example if i would have
i want to load this with
Now, for example if i want to set the
__record__
parameter of the "A" record i could do something likebut that will materialize the virtual array under "A". Skimming over the code for
ak.from_parquet
it seems manually reconstructing that VirtualArray will be quite involved.Is there a straightforward solution for this?
Beta Was this translation helpful? Give feedback.
All reactions