docs: add better code-docs on buffer optimisation #564
Merged
agoose77
changed the title
docs: add better docs on column optimisation
docs: add better code-docs on column optimisation
Dec 18, 2024
agoose77
changed the title
docs: add better code-docs on column optimisation
docs: add better code-docs on buffer optimisation
Dec 18, 2024
Although we are soon to rework this, I'm updating the docs in case I run out of time in my other PR reviews.
This is a good opportunity to explain how buffer optimisation currently works.
In dask-awkward, there are two separate concepts:

- `buffer` -- a 1D array of data that Awkward Array consumes in `ak.from_buffers`
- `column` -- a possibly-structured "array-like" whose structure / type depends upon the IO source.

Each IO source has its own type of `column` in this language -- uproot has `TTree` keys, whilst Parquet has fields. The distinction matters because remapped arrays may have a convoluted "column-buffer" relationship, e.g. arrays which share the offsets buffer from a single IO source `column`.

Given that the details of buffer projection need to be defined by the IO sources (e.g. Parquet has to perform form unprojection, whilst uproot does not), it is conceptually simplest to think about these two things as separate internal-external concerns: users want to know which high-level columns are needed, whilst dask-awkward needs to know which buffers are needed.
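To make the column/buffer distinction concrete, here is a minimal pure-Python sketch. All names in it (`Jet_pt`, `Jet.eta`, the buffer keys, `columns_for`) are hypothetical and chosen for illustration; this is not how uproot or dask-awkward actually name or store anything. It assumes a remapped array `Jet.eta` that reuses the offsets buffer of the `Jet_pt` column.

```python
# Hypothetical IO-source columns and the uniquely named buffers each one provides.
COLUMN_TO_BUFFERS = {
    "Jet_pt": {"Jet_pt-offsets", "Jet_pt-data"},
    "Jet_eta": {"Jet_eta-data"},
    "MET_pt": {"MET_pt-data"},
}

# Hypothetical remapped arrays: Jet.eta shares the offsets buffer read from Jet_pt.
ARRAY_TO_BUFFERS = {
    "Jet.pt": ["Jet_pt-offsets", "Jet_pt-data"],
    "Jet.eta": ["Jet_pt-offsets", "Jet_eta-data"],
    "MET.pt": ["MET_pt-data"],
}


def columns_for(buffers):
    """Return the sorted IO-source columns that provide any of the given buffers."""
    wanted = set(buffers)
    return sorted(col for col, bufs in COLUMN_TO_BUFFERS.items() if bufs & wanted)


# A computation that only touches Jet.eta needs two buffers...
needed_buffers = ARRAY_TO_BUFFERS["Jet.eta"]
# ...which in turn come from two different columns of the IO source.
needed_columns = columns_for(needed_buffers)
print(needed_buffers, needed_columns)
```

In this toy setup, only the IO source knows how to translate between the two views, which is why the translation is left to it rather than to dask-awkward.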
As such, the optimisation is really more of the following conversation:

1. dask-awkward: Hello uproot, can you "prepare" for buffer optimisation by giving me something to replace your input layer with, a special typetracer report, and any opaque state you might later need?
   uproot: Sure. Here's a new input layer that doesn't require any compute, here's the typetracer report you asked for, and can you please hold on to this state for me?
2. dask-awkward: Sure! OK, I've now built a full graph by repeating (1) for each input. Now, I will compute it, and collect the reports and states.
3. dask-awkward: Hello uproot, I have determined which buffers I need. Can you give me a new input layer that only loads these buffers? Here's the state that you gave me earlier!

In this conversation, dask-awkward does not need to talk about columns at all. It also does not need any special buffer name convention beyond requiring that each buffer name is unique.
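The conversation above can be sketched as a small two-method protocol. This is a toy model under stated assumptions, not the dask-awkward or uproot implementation: `ToySource`, `ToyReport`, `optimise`, and the method names `prepare_for_projection` / `project` are illustrative, and the "typetracer report" is reduced to a set of touched buffer names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReport:
    """Stand-in for a typetracer report: records which buffers were touched."""
    touched: set = field(default_factory=set)


class ToySource:
    """Hypothetical IO source whose buffers only need to have unique names."""

    def __init__(self):
        self.buffers = {
            "x-offsets": [0, 2, 3],
            "x-data": [1.0, 2.0, 3.0],
            "y-data": [10, 20],
        }

    def prepare_for_projection(self):
        """Step 1: hand back a compute-free layer, a report, and opaque state."""
        report = ToyReport()
        state = {"known_buffers": sorted(self.buffers)}

        def mock_layer(name):
            # Record the access instead of reading any data.
            report.touched.add(name)

        return mock_layer, report, state

    def project(self, report, state):
        """Step 3: build a layer that loads only the buffers that were touched."""
        wanted = [n for n in state["known_buffers"] if n in report.touched]

        def projected_layer():
            return {n: self.buffers[n] for n in wanted}

        return projected_layer


def optimise(source, graph):
    """The caller's side: it never mentions columns, only reports and state."""
    mock_layer, report, state = source.prepare_for_projection()
    graph(mock_layer)  # step 2: "compute" the graph against the mock layer
    return source.project(report, state)


def graph(read):
    # A toy computation that only needs the "x" buffers.
    read("x-offsets")
    read("x-data")


loaded = optimise(ToySource(), graph)()
print(sorted(loaded))
```

Note that `optimise` treats `state` as opaque and simply passes it back, mirroring the "hold on to this state for me" step; how buffers map to columns stays entirely inside the source.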