Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement collect_similar like collect for DiskGenerators #198

Merged
merged 2 commits into from
Oct 23, 2024

Conversation

asinghvi17
Copy link
Member

@asinghvi17 asinghvi17 commented Oct 23, 2024

Not sure if this is the best approach, but it works for now 🤷

  • Copies the implementation of collect to an overload on Base.collect_similar
  • Adds a test to make sure that the result of map always works

I can confirm that the test fails on current master of DiskArrays and works on this PR.

@rafaqz
Copy link
Collaborator

rafaqz commented Oct 23, 2024

Iterating like that is super slow... Can we loop over eachchunk indices and work in whole blocks?

@asinghvi17
Copy link
Member Author

It's already doing that via _iterate_disk it seems. From the access logs on an AccessCountDiskArray,

julia> da.getindex_log
200-element Vector{Any}:
 (1:10, 1:10)
 (11:20, 1:10)
 (21:30, 1:10)
 (31:40, 1:10)
 (41:50, 1:10)
 (51:60, 1:10)
 (61:70, 1:10)
 (71:80, 1:10)
 (81:90, 1:10)
 (91:100, 1:10)
 (101:110, 1:10)
 (111:120, 1:10)
 ⋮
 (91:100, 91:100)
 (101:110, 91:100)
 (111:120, 91:100)
 (121:130, 91:100)
 (131:140, 91:100)
 (141:150, 91:100)
 (151:160, 91:100)
 (161:170, 91:100)
 (171:180, 91:100)
 (181:190, 91:100)
 (191:200, 91:100)

@asinghvi17
Copy link
Member Author

Should we get this in as is, as a bugfix, and then work on improving performance? It's no worse than the current state of affairs, and at least map isn't incorrect.

@meggart
Copy link
Collaborator

meggart commented Oct 23, 2024

I agree that it would be good to have a not so performant fix for now. Do you think it is worth tagging a new release or should we wait for #200 ?

@meggart meggart merged commit 1f1f075 into JuliaIO:main Oct 23, 2024
10 checks passed
@rafaqz
Copy link
Collaborator

rafaqz commented Oct 23, 2024

Yes good to at least fix it.

@asinghvi17 there is a kind of double iteration check on every call to iterate as sometimes we have to start new chunks sometimes not. It may also be type unstable sometimes because of the nesting, unless we have fixed that

@asinghvi17
Copy link
Member Author

I think a new patch release is probably worth it - not sure how tough it will be to fix the problem!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants