docs: add better code-docs on buffer optimisation #564

agoose77 · 2024-12-18T14:59:12Z

Although we are soon to rework this, I'm updating the docs in case I run out of time in my other PR reviews.

This is a good opportunity to explain how buffer optimisation currently works.

In dask-awkward, there are two separate concepts:

buffer -- a 1D array of data that Awkward Array consumes in ak.from_buffers
column -- a possibly-structured "array-like" whose structure / type depends upon the IO source.

Each IO source has its own kind of column in this language: uproot has TTree keys, whilst Parquet has fields. The distinction matters because remapped arrays may have a convoluted column-buffer relationship, e.g. several arrays that share the offsets buffer taken from a single IO source column.
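To make the column/buffer distinction concrete, here is a minimal sketch in plain Python. The buffer names, column names, and the `columns_for_buffers` helper are all hypothetical and are not part of dask-awkward or uproot; the point is only that a remapped array can borrow its offsets buffer from one column while its data comes from another.

```python
# Hypothetical mapping from buffer name to the IO-source column that provides it.
# Here a remapped "muon" array takes its offsets from the Muon_pt column, so the
# offsets buffer and the pt data buffer both originate from that one column.
BUFFER_TO_COLUMN = {
    "muon-offsets": "Muon_pt",
    "muon-pt-data": "Muon_pt",
    "muon-eta-data": "Muon_eta",
    "run-data": "run",
}


def columns_for_buffers(needed_buffers):
    """Return the sorted IO-source columns required to load the given buffers."""
    return sorted({BUFFER_TO_COLUMN[name] for name in needed_buffers})


# Touching only the eta values still requires the Muon_pt column,
# because the shared offsets buffer is read from it.
needed = {"muon-offsets", "muon-eta-data"}
print(columns_for_buffers(needed))  # ['Muon_eta', 'Muon_pt']
```

In this toy picture, users reason about the right-hand side (columns), whilst the optimisation itself only ever deals with the left-hand side (buffers).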

Because the details of buffer projection must be defined by the IO sources (e.g. Parquet has to perform form unprojection, whilst uproot does not), it is conceptually simpler to treat these two things as separate external and internal concerns: users want to know which high-level columns are needed, whilst dask-awkward needs to know which buffers are needed.

As such, the optimisation is best understood as the following conversation:

dask-awkward:
Hello uproot, can you "prepare" for buffer optimisation by giving me something to replace your input layer with, a special typetracer report, and any opaque state you might need later?
uproot:
Sure. Here's a new input layer that doesn't require any compute, here's the typetracer report you asked for, and can you please hold on to this state for me?
dask-awkward:
Sure! OK, I've now built a full graph by repeating this exchange for each input. Now I will compute it, and collect the reports and states.
dask-awkward:
Hello uproot, I have determined which buffers I need. Can you give me a new input layer that only loads these buffers? Here's the state that you gave me earlier!

In this conversation, dask-awkward does not need to talk about columns at all. It also does not need any special buffer naming convention beyond requiring that each buffer name be unique.
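The conversation above can be sketched as a two-method protocol. This is a toy model under stated assumptions, not the dask-awkward implementation: `Report`, `ToySource`, `prepare`, `project`, and `optimise` are hypothetical names chosen for illustration, and the "typetracer report" is reduced to a set of touched buffer names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Report:
    """Hypothetical stand-in for a typetracer report: records touched buffer names."""

    touched: set[str] = field(default_factory=set)


class ToySource:
    """Hypothetical IO source that owns a dict of uniquely named buffers."""

    def __init__(self, buffers: dict[str, list]):
        self.buffers = dict(buffers)

    def prepare(self) -> tuple[Callable[[str], None], Report, Any]:
        """Return a compute-free replacement layer, a report, and opaque state."""
        report = Report()

        def mock_layer(name: str) -> None:
            # No data is loaded; we only record that this buffer was requested.
            report.touched.add(name)

        state = {"all_buffers": sorted(self.buffers)}  # opaque to the caller
        return mock_layer, report, state

    def project(self, report: Report, state: Any) -> Callable[[], dict[str, list]]:
        """Return a new input layer that loads only the buffers in the report."""
        keep = [name for name in state["all_buffers"] if name in report.touched]
        return lambda: {name: self.buffers[name] for name in keep}


def optimise(source: ToySource, compute: Callable[[Callable[[str], None]], None]):
    """Play the dask-awkward side: prepare, trace the graph, then project."""
    mock_layer, report, state = source.prepare()
    compute(mock_layer)  # run the "graph" against the compute-free layer
    return source.project(report, state)  # hand the state back unchanged


source = ToySource({"x-data": [1, 2, 3], "y-data": [4, 5, 6]})
layer = optimise(source, lambda read: read("x-data"))
print(layer())  # {'x-data': [1, 2, 3]}
```

Note that `optimise` never inspects `state` and never mentions columns; it only relies on buffer names being unique, which mirrors the division of responsibilities described above.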

agoose77 and others added 15 commits December 18, 2024 14:33

docs: remove incorrect docstring

2b610ba

refactor: remove for loop

aa76539

docs: more comments

49983e4

[pre-commit.ci] auto fixes from pre-commit.com hooks

ac4e8f4

for more information, see https://pre-commit.ci

fix: restore import

114d452

Merge remote-tracking branch 'refs/remotes/origin/agoose77/docs-column-opt' into agoose77/docs-column-opt

4c29f14

[pre-commit.ci] auto fixes from pre-commit.com hooks

cf41cae

docs: note on columns

0ccfab4

Merge remote-tracking branch 'refs/remotes/origin/agoose77/docs-column-opt' into agoose77/docs-column-opt

af3bc74

[pre-commit.ci] auto fixes from pre-commit.com hooks

0ed22c2

docs: more work

fd3239f

Merge remote-tracking branch 'refs/remotes/origin/agoose77/docs-column-opt' into agoose77/docs-column-opt

cef785f

[pre-commit.ci] auto fixes from pre-commit.com hooks

3796fce

fix: remove unused import

87faf67

Merge remote-tracking branch 'refs/remotes/origin/agoose77/docs-column-opt' into agoose77/docs-column-opt

cdabd26

agoose77 changed the title ~~docs: add better docs on column optimisation~~ docs: add better code-docs on column optimisation Dec 18, 2024

agoose77 changed the title ~~docs: add better code-docs on column optimisation~~ docs: add better code-docs on buffer optimisation Dec 18, 2024

agoose77 and others added 2 commits December 18, 2024 15:36

fix: appease mypy

be83bc9

[pre-commit.ci] auto fixes from pre-commit.com hooks

fe25be4

agoose77 merged commit 5e431bc into main Dec 18, 2024
25 checks passed
