Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WiP] Bitwise versions of gist_cmp & common_ancestor functions for GiST index #40

Closed

Conversation

AbelVM
Copy link

@AbelVM AbelVM commented Aug 14, 2020

Dropping the calls to high level H3 functions to avoid redundant operations

This is the first time in like, 20 years I play with C, so please take it more as pseudocode than actual code. These functions have been migrated from my own PL/pgSQL functions, so I am sure I've lost a lot of C optimizations on the way or the syntax is as ugly as it can be. I am not ashamed of that ¯\(ツ)/¯, just word to the wise.

I just took into account the corner cases to avoid wasting CPU cycles and turn all the high-level H3 functions calls into simple bitwise operations that lower the redundant operations

I have benchmarked the PL/pgSQL versions of these functions and they take ~ 0.006 ms per call, avg

image

So, the GiST related operations should be way faster with this approach

Rebased with tag 3.6.5

No conflicts, but....

Not tested at all!!

Friday. Lunchtime. No testing environment available right now. This is just a comment to this issue that looked better as a PR 😝

@zachasme
Copy link
Owner

Thanks for working on this @AbelVM 😃

Using the bitwise versions is definitely the way to go!

I think the failing tests are mainly due to not importing h3index.h in opclass_gist.c, and also it seems PG_RETURN_H3INDEX didn't make the rebase.

@zachasme zachasme force-pushed the feature-more-operators branch from 6c9acd7 to 6db18d2 Compare August 24, 2020 09:43
@zachasme
Copy link
Owner

I've rebased the feature-more-operators branch on master, which fixes some of the errors you had.

@AbelVM
Copy link
Author

AbelVM commented Aug 24, 2020

Rebased with your rebase 😝

@zachasme zachasme mentioned this pull request Aug 24, 2020
@zachasme
Copy link
Owner

I've fixed the warnings/errors and merged your code into #42.

I've split the old branch into two, so we can get GiST working and merged before moving on to SP-GiST. We still need tests and benchmarks.

Thank you for this PR 👍

@zachasme zachasme closed this Aug 24, 2020
@AbelVM
Copy link
Author

AbelVM commented Aug 24, 2020

For reference, for testing and benchmarking my sql function (no smoke or unit tests here :) ), I created synthetic datasets with randon locations and several depths kins upwards and downwards for each location ({number_of_places} and {number_of_depths}), and then checked my custom h3_contains function

CREATE TABLE random_cells AS
WITH pre AS (
    SELECT
        t.i,
        random() * 360 - 180 AS x,
        random() * 180 - 90 AS y,
        (round(random() * (15 - 1 -{number_of_depths})) + {number_of_depths}))::int AS r
    FROM
        generate_series(1, {number_of_places}) AS t (i)
),
a AS (
    SELECT
        h3_geo_to_h3 (st_setsrid (st_point (x, y), 4326), r) AS h3_id,
        i::text AS place,
        r
    FROM
        pre
)
SELECT
    *
FROM
    a
UNION
SELECT
    h3_to_parent (h3_id, r - {number_of_depths}) AS h3_id,
    place,
    r - {number_of_depths} AS r
FROM
    a
UNION
SELECT
    h3_to_children (h3_id, r + {number_of_depths}) AS h3_id,
    place,
    r + {number_of_depths} AS r
FROM
    a;

-- GiST-indexing stuff  should be here:)

-- Validity check and benchmark
SELECT
    c1.place || '-' || c2.place AS a_b,
    c1.r || '-' || c2.r AS resa_resb,
    h3_contains (c1.h3_id, c2.h3_id) AS a_contains_b
FROM
    b AS c1
CROSS JOIN 
    b AS c2
ORDER BY
    1,2;

My results, time in ms
image

But to test that the index actually kicks in, I'd go with a big dataset. Something like, using HD geometries

  • [dataset A] polyfilling a country at high resolution
  • indexing A and so :)
  • [dataset B] polyfilling a province of the same country at same resolution and then h3_compact the results
  • find the cells in A contained in the cells of B

Results validity can be easily verified with QGIS and the query time should be, imho, in the same or lower order than ST_Intersection(country, province)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants