Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iteration is broken with chunking #44

Closed
rafaqz opened this issue Aug 29, 2021 · 4 comments
Closed

Iteration is broken with chunking #44

rafaqz opened this issue Aug 29, 2021 · 4 comments

Comments

@rafaqz
Copy link
Collaborator

rafaqz commented Aug 29, 2021

Iteration seems to be broken with chunking, so when fallbacks are used by base methods you just get the wrong answer, without an error. So e.g. Array(diskarray) currently uses iteration, and can give the wrong result:

julia>   a = collect(reshape(1:90,10,9))
10×9 Matrix{Int64}:
  1  11  21  31  41  51  61  71  81
  2  12  22  32  42  52  62  72  82
  3  13  23  33  43  53  63  73  83
  4  14  24  34  44  54  64  74  84
  5  15  25  35  45  55  65  75  85
  6  16  26  36  46  56  66  76  86
  7  17  27  37  47  57  67  77  87
  8  18  28  38  48  58  68  78  88
  9  19  29  39  49  59  69  79  89
 10  20  30  40  50  60  70  80  90

julia>   a_disk = _DiskArray(a, chunksize=(5,3))
Disk Array with size 10 x 9

julia>   Array(a_disk)
10×9 Matrix{Int64}:
  1  21  16  31  51  46  61  81  76
  2  22  17  32  52  47  62  82  77
  3  23  18  33  53  48  63  83  78
  4  24  19  34  54  49  64  84  79
  5  25  20  35  55  50  65  85  80
 11   6  26  41  36  56  71  66  86
 12   7  27  42  37  57  72  67  87
 13   8  28  43  38  58  73  68  88
 14   9  29  44  39  59  74  69  89
 15  10  30  45  40  60  75  70  90

#37 may be a partial solution to this. But maybe also good to use other methods as much as possible so the iteration fallback isn't used.

@meggart
Copy link
Collaborator

meggart commented Jan 13, 2022

Ah, ok this iteration fallback is annoying. The main problem is that we should actually never have to fall back to iteration, but you are right, getting a wrong answer is bad. I will see what is possible. Maybe just throwing an error for iterate(::AbstractDiskArray) would be a solution for now? Then we would always know when a process is using some AbstractArray iteration fallback and we can fix by adding appropriate methods

@rafaqz
Copy link
Collaborator Author

rafaqz commented Jan 13, 2022

We should at least throw errors to force the use if another method. If we make a branch I can test against Rasters.jl and see if anything breaks

@meggart
Copy link
Collaborator

meggart commented Jan 20, 2022

I dived a bit into the code and first found out that we have already implemented iteration with stateful iterators (see iterators.jl). The main reason that collect fails is that eachindex and does not fit the iteration order so that values are stored into the wrong parts of the array. So all we need to do is adding a method for eachindex that fits the order in which we iterate through the array. I am currently working exactly this.

@rafaqz
Copy link
Collaborator Author

rafaqz commented Jan 20, 2022

Ahh fixing eachindex is a very neat trick! I didn't think of that.

I also realised Rasters.jl does use DiskArrrays.jl iteration, such as in SkipMissingVal for e.g. reducing over non-nodata values. Its really fast to do reductions from disk like this, so good to keep.

@rafaqz rafaqz closed this as completed Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants