Hello, I'm creating something like visual programming for procedural modeling with the help of the library. I've implemented several functions and ran into the following problem: no matter how much data comes into a function or how much data it produces, the execution time is nearly the same. In the video, the performance of the Subdivide Polyline node is always about 7 milliseconds.

AwkwardTest.mp4

At first I thought that this was a bug, but after reading some answers here I came to the conclusion that this is the so-called initial cost. So the question is: is there a way to reduce it? In my case it is quite significant, because users can create trees with thousands of nodes, and if each of them carries this initial cost, the tree will soon lose its ability to recalculate in real time. Here is the function, in case I'm doing something wrong.

import numpy as np
import awkward as ak
def subdivide_polyline(verts, cuts):
    """Only works with a list of polylines."""
    lines_num = len(verts)
    segment_num = ak.num(verts, axis=-2) - 1
    segment_shape = ak.num(verts, axis=-1)[..., :-1]
    _, count_per_seg = ak.broadcast_arrays(segment_shape, cuts)
    new_verts_num = segment_num * cuts + ak.num(verts)
    total_num = ak.sum(new_verts_num)
    # (cuts + 1) interpolation weights per segment:
    # 0, 1/(cuts+1), ..., cuts/(cuts+1)
    seg_weight0 = np.zeros(total_num - lines_num)
    seg_weight1 = ak.unflatten(seg_weight0, ak.flatten(count_per_seg) + 1)
    seg_weight2 = ak.unflatten(seg_weight1, segment_num)
    seg_weight3 = ak.local_index(seg_weight2)
    seg_weight = seg_weight3 / (count_per_seg + 1)
    vert1 = verts[..., :-1, :]
    vert1 = vert1[..., np.newaxis, :]
    vert2 = verts[..., 1:, :]
    vert2 = vert2[..., np.newaxis, :]
    new_verts = linear_interpolation(vert1, vert2, seg_weight)
    new_verts = ak.flatten(new_verts, axis=-2)
    return new_verts

def linear_interpolation(v1, v2, factor):
    return v1 * (1 - factor) + v2 * factor
if __name__ == '__main__':
    p_line = ak.Array([[[0, 0, 0], [1.5, 0, 0], [2, 0.5, 0]],
                       [[0, 0, 0], [0, 1.5, 0], [0, 2, 0.5], [0, 3, 0]]])

    from timeit import timeit
    t = timeit('subdivide_polyline(p_line, ak.Array([1]))',
               'from __main__ import ak, subdivide_polyline, p_line', number=100)
    print(f"{t*10:.1f} ms")  # total seconds for 100 runs -> milliseconds per run
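A quick way to see that this is per-call overhead rather than data-dependent work is to time the same function on inputs of very different sizes. A minimal sketch, assuming the subdivide_polyline defined above is in scope:

import awkward as ak
from timeit import timeit

small = ak.Array([[[0, 0, 0], [1.5, 0, 0], [2, 0.5, 0]]])
big = ak.concatenate([small] * 1000)  # the same polyline repeated 1000x

for name, data in [("small", small), ("big", big)]:
    t = timeit(lambda: subdivide_polyline(data, ak.Array([1])), number=100)
    print(f"{name}: {t * 10:.1f} ms per call")

If the per-call times come out nearly identical, the cost is dominated by the fixed overhead of each operation, not by the amount of data flowing through.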
As you observe, we anticipate a certain "constant overhead" that is independent of the size of the data being operated upon (and scales linearly with the number of steps that you take). Whilst there is room for some performance optimisation here, real-time performance needs (i.e. <16 ms) are somewhat out of scope. You might find for your application that writing a Numba routine to perform this logic (without Awkward) performs better; Numba-jitted functions don't have a significant per-call overhead, IIRC. I will update this conversation with some pointers.
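To illustrate the point, here's a minimal sketch of my own (not a rigorous benchmark, and count_rows is just a hypothetical trivial helper): even a small Awkward operation pays the same dispatch cost on tiny data, whereas a warmed-up @nb.njit function has very little per-call cost.

import awkward as ak
import numba as nb
from timeit import timeit

@nb.njit
def count_rows(arr):
    # hypothetical helper: trivially iterate the array and count rows
    n = 0
    for _ in arr:
        n += 1
    return n

data = ak.Array([[1.0, 2.0], [3.0]])
count_rows(data)  # warm up the JIT once, outside the timed region

t_ak = timeit(lambda: ak.sum(data), number=1000)
t_nb = timeit(lambda: count_rows(data), number=1000)
print(f"awkward: {t_ak:.4f} s, numba: {t_nb:.4f} s (per 1000 calls on tiny data)")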
We don't yet have a section on this in our docs, but I'll get round to that at some point! The best solution for your needs (low overhead) is probably Numba. The trade-off of using Numba is primarily expressiveness; some things are more cumbersome or sometimes impossible to write. However, in this case, it should be fairly straightforward. Here's a function doing what I think you're hoping to achieve: subdividing a set of polylines N times. This function evaluates much, much faster than the pure-Awkward variant (because the data are so small), which is beneficial for your use case. I get 0.02 seconds per thousand iterations, which compares with 8 seconds per thousand for the pure-Awkward case. I'm assuming that you want to take a ragged array of polylines, i.e. variable-length lists of vertices:

import numpy as np
import awkward as ak
import numba as nb
from timeit import timeit
@nb.njit
def subdivide_polylines(poly_lines, n_line_subdivisions):
    # Pre-pass over the data to determine how large our arrays need to be
    final_vertex_count = 0
    for line in poly_lines:
        n_segments = len(line) - 1
        final_vertex_count += len(line) + n_line_subdivisions * n_segments
    flat_result = np.empty((final_vertex_count, 3), dtype=np.float64)
    flat_count = np.zeros(len(poly_lines), dtype=np.int64)
    # New number of vertices per segment
    n_final_segment_vertices = 2 + n_line_subdivisions
    # Keep track of the vertex index in the flat result
    l_vertex_index = 0
    # For each polyline
    for i_line, line in enumerate(poly_lines):
        assert len(line) >= 2
        # Given that stop(segment j) == start(segment j+1), we skip the first
        # vertex of each segment to avoid double counting.
        # Hence, let's write the line's very first vertex here
        start = np.asarray(line[0])
        flat_result[l_vertex_index] = start
        flat_count[i_line] += 1
        l_vertex_index += 1
        # For each segment (three vertices = two segments)
        n_segments = len(line) - 1
        for j_segment in range(n_segments):
            # Get these as NumPy arrays
            start = np.asarray(line[j_segment])
            stop = np.asarray(line[j_segment + 1])
            # For each vertex in the result, skipping the first (as explained above)
            for k_segment_vertex in range(1, n_final_segment_vertices):
                # Interpolate between start and stop
                t = k_segment_vertex / (n_final_segment_vertices - 1)
                flat_result[l_vertex_index] = start * (1 - t) + stop * t
                l_vertex_index += 1
        flat_count[i_line] = len(line) + n_line_subdivisions * n_segments
    return flat_result, flat_count
if __name__ == "__main__":
p_line = ak.Array(
[
[[0, 0, 0], [1.5, 0, 0], [2, 0.5, 0]],
[[0, 0, 0], [0, 1.5, 0], [0, 2, 0.5], [0, 3, 0]],
]
)
# Trigger the jit to compile (ensure this function is only defined once, or you'll pay the JIT cost each time)
subdivide_polylines(p_line, 1)
t = timeit("subdivide_polylines(p_line, 1)", number=1000, globals=globals())
print(f"{t:.3f} s per 1000")
flat_result, flat_count = subdivide_polylines(p_line, 1)
result = ak.unflatten(flat_result, flat_count, axis=0)
print(result.tolist()) |
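If you want to reproduce the comparison quoted above, the pure-Awkward version from the question can be timed the same way. A sketch, assuming the subdivide_polyline from the question is also in scope:

t_ak = timeit("subdivide_polyline(p_line, ak.Array([1]))", number=1000, globals=globals())
print(f"{t_ak:.3f} s per 1000 (pure Awkward)")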