-
In short I'm looking for something like serchsorted in numpy. The problem. They are represented similar to such array - And have serious of numbers which represent position of some points on the lines, like this What comes to my mind is using binary search with Python loops, but they might be too slow. Probably there is another solution? |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 5 replies
-
Hi @Durman! Awkward Array does not yet have a direct function to compute positions within ragged arrays, which is what you appear to be looking for. However, we do have tools that let you build this yourself. There are two ways of achieving this that I can think of. One is to use Numba, and the other is to flatten, search over the flattened list, and rebuild. I'll show both, but I would recommend the use of Numba for readability 1. NumbaThe simplest approach that I can think of is to use import numba as nb
import awkward as ak
import numpy as np
haystack = ak.Array([[0.0, 1.86, 2.93, 3.62, 4.9], [0.0, 0.83, 1.52, 3.28, 4.08]])
needles = ak.Array([[1.5, 4.0], [2.0]])
@nb.njit
def searchsorted_2d(builder, array, needle):
for array_i, needle_i in zip(array, needle):
result_i = np.searchsorted(np.asarray(array_i), np.asarray(needle_i))
builder.begin_list()
for x in result_i:
builder.integer(x)
builder.end_list()
return builder
index = searchsorted_2d(ak.ArrayBuilder(), haystack, needles).snapshot() However, we know exactly how big our result is, so it is wasteful to use import numba as nb
import awkward as ak
import numpy as np
haystack = ak.Array([[0.0, 1.86, 2.93, 3.62, 4.9], [0.0, 0.83, 1.52, 3.28, 4.08]])
needles = ak.Array([[1.5, 4.0], [2.0]])
@nb.njit
def searchsorted_2d(flat_result, array, needle):
j = 0
for array_i, needle_i in zip(array, needle):
result_i = np.searchsorted(np.asarray(array_i), np.asarray(needle_i))
k = j + len(result_i)
flat_result[j:k] = result_i
j = k
num = ak.num(needles)
result_length = ak.sum(num)
flat_result = np.zeros(result_length, dtype=np.int64)
# Fill the result
searchsorted_2d(flat_result, haystack, needles)
index = ak.unflatten(flat_result, num) 2. Flattened searchIn this example, I'm assuming that you have a two dimensional array, and that none of the sublists are empty. If those assumptions aren't true, then we would need to update this example. The manual way of doing this is to flatten both the "needle" and the "haystack" into one-dimensional arrays, and keep track of the "list boundaries" that separate elements from different sublists. We can then adjust these two arrays so that the haystack is monotonically increasing, i.e. for every input there is exactly one output (or, there are multiple outputs but they're all adjacent on the number line): # From
[0, 1.86, 2.93, 3.62, 4.9, [boundary], 0.0, 0.83, 1.52, 3.28, 4.08]
# To
[0, 1.86, 2.93, 3.62, 4.9, [boundary], 4.9, 5.73, 6.42, 8.18, 8.98] To adjust the haystack, we can take the end values and find their cumulative sum. By assuming that each sublist in offset = 0
for sublist in lists:
sublist += offset
offset = sublist[-1] In NumPy arrays, we can do this using import awkward as ak
import numpy as np
haystack = ak.Array([[0.0, 1.86, 2.93, 3.62, 4.9], [0.0, 0.83, 1.52, 3.28, 4.08]])
needles = ak.Array([[1.5, 4.0], [2.0]]) This would give us the # To flatten this list so that it is monotonic increasing,
# we assume each list is monotonic increasing, and cumulatively shift
# each sublist by the final value
value_starts = np.zeros(len(haystack))
value_starts[1:] = np.cumsum(haystack[1:, -1]) When we've computed the result, it is going to be an array of indices into the flat search result. We'll need to adjust these indices to account for the fact that they came from smaller sublists. We can do this by building an array of the cumulative sum of sublist lengths: # What are the start positions of each sublist in the flattened list?
starts = np.zeros(len(haystack), dtype=np.int64)
starts[1:] = np.cumsum(ak.num(haystack))[:-1] Now we actually perform the search over the adjusted, flattened array # Compute search over flattened, adjusted array
flat_index = np.searchsorted(
ak.flatten(haystack + value_starts),
ak.flatten(needles + value_starts),
) Then we unflatten this result into sublists # Adjust the indices back
unadjusted_index = ak.unflatten(flat_index, ak.num(needles)) And adjust the indices # Adjust the indices back
index = unadjusted_index - starts |
Beta Was this translation helpful? Give feedback.
Hi @Durman!
Awkward Array does not yet have a direct function to compute positions within ragged arrays, which is what you appear to be looking for. However, we do have tools that let you build this yourself.
There are two ways of achieving this that I can think of. One is to use Numba, and the other is to flatten, search over the flattened list, and rebuild. I'll show both, but I would recommend the use of Numba for readability
1. Numba
The simplest approach that I can think of is to use
np.searchsorted
in Numba, and then employArrayBuilder
to re-build the ragged result: