-
There's also a distinction on the Arrow side between `list`, which uses 32-bit offsets, and `large_list`, which uses 64-bit offsets. Since it came later, Awkward Array's lists use 64-bit signed indexes by default, but can also use 32-bit signed or unsigned. Since 64-bit is the default, many operations will end up giving you 64-bit indexes (for simplicity in implementation, actually). When converting an Awkward Array to Arrow with ak.to_arrow or Parquet with ak.to_parquet, the 64-bit Awkward lists are converted into Arrow `large_list`. However, both of these functions have a `list_to32` option that converts the offsets to 32-bit on the fly, so the data come out as ordinary Arrow `list` instead (there's a short sketch of this at the end of this reply).

If PyTorch is not actually objecting to the `large_list` but to the `not null`, then what you need is option-type (nullable) values, `?float64`, instead of non-nullable `float64`, where the `?` in the type means the values are allowed to be missing. I don't know if we have a nice function for promoting non-nullable data into nullable data. A hacky way to do it would be to concatenate a missing value at the right depth and then slice it off, like this:

>>> array = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
>>> array
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>
>>> ak.concatenate((array, [[None]]))
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5], [None]] type='4 * var * ?float64'>
>>> ak.concatenate((array, [[None]]))[:-1]
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * ?float64'>

There would be faster-for-the-computer ways of doing this by inserting an `UnmaskedArray` node into the layout. Well, maybe this:

>>> empty_missingness = ak.Array(ak.contents.UnmaskedArray(ak.contents.EmptyArray()))[np.newaxis][:0]
>>> empty_missingness
<Array [] type='0 * 0 * ?unknown'>
>>> ak.concatenate((array, empty_missingness))
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * ?float64'>

The use of an empty, option-type array in the concatenation promotes the type to `?float64` without introducing any placeholder values that would have to be sliced off afterward.
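To make the `list_to32` point above concrete, here is a small sketch (not from the original thread; the file names are invented): write the same array with and without `list_to32=True` and compare the schemas that pyarrow reads back.

```python
import awkward as ak
import pyarrow.parquet as pq

array = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

# Default: Awkward's 64-bit list offsets become Arrow/Parquet large_list.
ak.to_parquet(array, "events64.parquet")

# list_to32=True converts the offsets to 32-bit on the fly, giving ordinary Arrow list.
ak.to_parquet(array, "events32.parquet", list_to32=True)

# The first schema should show large_list<...>, the second list<...>.
print(pq.read_schema("events64.parquet"))
print(pq.read_schema("events32.parquet"))
```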
-
After some time, I was able to load the data in two ways using the torchdata API by creating two custom IterDataPipes. Here is my code implementing the two datapipes - one for torch_geometric and one for vanilla PyTorch.
And this is the output on my machine:
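(The code and output referred to above are not reproduced in this excerpt. As a rough, hypothetical sketch of the vanilla-PyTorch side only, with class name, file layout, and tensor shapes all assumed rather than taken from the original post, such a datapipe might look like this:)

```python
# Hypothetical sketch, not the author's actual datapipe: stream events out of
# Parquet files written with ak.to_parquet and yield one tensor per event.
import awkward as ak
import torch
from torch.utils.data import DataLoader
from torchdata.datapipes.iter import FileLister, IterDataPipe

class AwkwardEventPipe(IterDataPipe):
    """Yields each event as a float32 tensor of shape (n_particles, 4)."""

    def __init__(self, file_dp):
        super().__init__()
        self.file_dp = file_dp  # upstream datapipe that yields Parquet file paths

    def __iter__(self):
        for path in self.file_dp:
            events = ak.from_parquet(path)      # jagged array: events * var * 4
            for event in events:
                # ak.to_list gives nested Python lists that torch.tensor can consume
                yield torch.tensor(ak.to_list(event), dtype=torch.float32)

# batch_size=None because events have different lengths; batching would need
# padding or a custom collate_fn.
pipe = AwkwardEventPipe(FileLister(".", masks="*.parquet"))
loader = DataLoader(pipe, batch_size=None)
```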
-
I am looking for the most natural way of creating a PyTorch DataLoader from Awkward Arrays when the collection of those arrays does not fit into memory. The arrays I'm working with contain events with a variable number of 4-vectors.
I tried saving the files to Parquet and then using the torchdata API to load them (as shown here); however, I get an error:
NotImplementedError: Unsupported Arrow type: large_list<item: float not null>
This exception is thrown by __iter__ of ParquetDFLoaderIterDataPipe(columns=None, device='', dtype=None, source_dp=FileListerIterDataPipe, use_threads=False)
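(A minimal way to reproduce this, not taken from the original post and with invented file names, might be the following; torcharrow has to be installed for ParquetDataFrameLoader to run at all:)

```python
# Rough reproduction sketch of the failure described above.
import awkward as ak
from torchdata.datapipes.iter import FileLister, ParquetDataFrameLoader

events = ak.Array([[[1.1, 2.2, 3.3, 4.4]], [], [[5.5, 6.6, 7.7, 8.8]]])
ak.to_parquet(events, "events.parquet")   # default offsets are 64-bit -> large_list

dp = ParquetDataFrameLoader(FileLister(".", masks="*.parquet"))
next(iter(dp))   # expected to raise NotImplementedError: Unsupported Arrow type: large_list<...>
```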
The nvidia-merlin library looks like it's made for this exact purpose but there isn't a lot of documentation.