From ace232de68074a88c96a6d60c0f6ca2e101ce0ed Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tilmann=20Z=C3=A4schke?= Date: Wed, 26 Jan 2022 12:08:05 +0000 Subject: [PATCH] Release 1.1.0 (#25) --- CHANGELOG.md | 20 ++- CMakeLists.txt | 2 +- README.md | 356 +++++++++++++++++++++++++------------------------ 3 files changed, 196 insertions(+), 182 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 499c5fcd..cf93b08b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,17 +5,23 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). ## [Unreleased] +Nothing yet. + +## [1.1.0] - 2022-01-25 ### Added -- FilterSphere for filtering by sphere constraint (by ctbur) +- FilterSphere for filtering by sphere constraint (by ctbur) [#16](https://github.com/improbable-eng/phtree-cpp/pull/16) +- IEEE converter for 32bit float, see `distance.h` (by ctbur) [#18](https://github.com/improbable-eng/phtree-cpp/pull/18) + ### Changed -- Fixed imports `` -> `` (by ctbur) -- Cleaned up build scripts -- Fixed warnings: +- Performance improvement for updates and queries: removed use of `std::variant`. [#23](https://github.com/improbable-eng/phtree-cpp/pull/23) +- Fixed imports `` -> `` (by ctbur) [#15](https://github.com/improbable-eng/phtree-cpp/pull/15) +- Cleaned up build scripts [#21](https://github.com/improbable-eng/phtree-cpp/pull/21) +- Fixed warnings: [#20](https://github.com/improbable-eng/phtree-cpp/pull/20) - "unused function argument" warnings - gcc/clang warnings - MSVC warnings - reserved identifier warnings (identifiers starting with `_`) -- typos in README.md +- typos in README.md [#22](https://github.com/improbable-eng/phtree-cpp/pull/22) ## [1.0.1] - 2021-05-06 ### Changed @@ -60,7 +66,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. - Nothing. 
-[Unreleased]: https://github.com/improbable-eng/phtree-cpp/compare/v1.0.1...HEAD +[Unreleased]: https://github.com/improbable-eng/phtree-cpp/compare/v1.1.0...HEAD +[1.1.0]: https://github.com/improbable-eng/phtree-cpp/compare/v1.0.0...v1.1.0 [1.0.1]: https://github.com/improbable-eng/phtree-cpp/compare/v1.0.0...v1.0.1 [1.0.0]: https://github.com/improbable-eng/phtree-cpp/compare/v0.1.0...v1.0.0 -[0.2.0]: https://github.com/improbable-eng/phtree-cpp/compare/v0.1.0...v0.2.0 diff --git a/CMakeLists.txt b/CMakeLists.txt index 78192148..699dacd4 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -1,7 +1,7 @@ cmake_minimum_required(VERSION 3.14) # set the project name -project(PH_Tree_Main VERSION 1.0.1 +project(PH_Tree_Main VERSION 1.1.0 DESCRIPTION "PH-Tree C++" LANGUAGES CXX) diff --git a/README.md b/README.md index 6fe05605..fad24140 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,21 @@ +**Note: for updates please also check the [fork](https://github.com/tzaeschke/phtree-cpp) by the original PH-Tree developer.** + # PH-Tree C++ -The PH-Tree is an ordered index on an n-dimensional space (quad-/oct-/2^n-tree) where each -dimension is (by default) indexed by a 64bit integer. The index order follows z-order / Morton -order. The default implementation is effectively a 'map', i.e. each key is associated with at most one value. +The PH-Tree is an ordered index on an n-dimensional space (quad-/oct-/2^n-tree) where each dimension is (by default) +indexed by a 64bit integer. The index order follows z-order / Morton order. The default implementation is effectively +a 'map', i.e. *each key is associated with at most one value.* Keys are points or boxes in n-dimensional space. - - -Two strengths of PH-Trees are fast insert/removal operations and scalability with large datasets. -It also provides fast window queries and _k_-nearest neighbor queries, and it scales well with higher dimensions. -The default implementation is limited to 63 dimensions. 
+Two strengths of PH-Trees are fast insert/removal operations and scalability with large datasets. It also provides fast +window queries and _k_-nearest neighbor queries, and it scales well with higher dimensions. The default implementation +is limited to 63 dimensions. The API is mostly analogous to STL's `std::map`, see function descriptions for details. -Theoretical background is listed [here](#research). +Theoretical background is listed [here](#theory). -More information about PH-Trees (including a Java implementation) is available [here](http://www.phtree.org). +More information about PH-Trees (including a Java implementation) is available [here](http://www.phtree.org). ---------------------------------- @@ -29,13 +29,13 @@ More information about PH-Trees (including a Java implementation) is available [ [Queries](#queries) - * [for_each](#for-each-example) +* [for_each](#for-each-example) - * [Iterators](#iterator-examples) +* [Iterators](#iterator-examples) - * [Filters](#filters) +* [Filters](#filters) - * [Distance Functions](#distance-functions) +* [Distance Functions](#distance-functions) [Converters](#converters) @@ -59,42 +59,43 @@ More information about PH-Trees (including a Java implementation) is available [ [cmake](#cmake) - ## Further Resources -[Theory](#research) +[Theory](#theory) ---------------------------------- -## API Usage +## API Usage - + #### Key Types The **PH-Tree Map** supports out of the box five types: -- `PhTreeD` uses `PhPointD` keys, which are vectors/points of 64 bit `double`. -- `PhTreeF` uses `PhPointF` keys, which are vectors/points of 32 bit `float`. + +- `PhTreeD` uses `PhPointD` keys, which are vectors/points of 64 bit `double`. +- `PhTreeF` uses `PhPointF` keys, which are vectors/points of 32 bit `float`. - `PhTreeBoxD` uses `PhBoxD` keys, which consist of two `PhPointD` that define an axis-aligned rectangle/box. 
- `PhTreeBoxF` uses `PhBoxF` keys, which consist of two `PhPointF` that define an axis-aligned rectangle/box. - `PhTree` uses `PhPoint` keys, which are vectors/points of `std::int64` The **PH-Tree MultiMap** supports out of the box three types: -- `PhTreeMultiMapD` uses `PhPointD` keys, which are vectors/points of 64 bit `double`. + +- `PhTreeMultiMapD` uses `PhPointD` keys, which are vectors/points of 64 bit `double`. - `PhTreeMultiMapBoxD` uses `PhBoxD` keys, which consist of two `PhPointD` that define an axis-aligned rectangle/box. - `PhTreeMultiMap` uses `PhPoint` keys, which are vectors/points of `std::int64` -Additional tree types can be defined easily analogous to the types above, please refer to the declaration of the tree types -for an example. -Support for custom key classes (points and boxes) as well as custom coordinate mappings can be implemented using custom `Converter` classes, see below. -The `PhTreeMultiMap` is by default backed by `std::unordered_set` but this can be changed via a template parameter. +Additional tree types can be defined easily analogous to the types above, please refer to the declaration of the tree +types for an example. Support for custom key classes (points and boxes) as well as custom coordinate mappings can be +implemented using custom `Converter` classes, see below. The `PhTreeMultiMap` is by default backed +by `std::unordered_set` but this can be changed via a template parameter. -The `PhTree` and `PhTreeMultiMap` types are available from `phtree.h` and `phtree_multimap.h`. +The `PhTree` and `PhTreeMultiMap` types are available from `phtree.h` and `phtree_multimap.h`. - - + #### Basic Operations + ```C++ class MyData { ... 
}; MyData my_data; @@ -123,7 +124,7 @@ tree.relocate(p_old, p_new, value); tree.estimate_count(query); ``` - + #### Queries @@ -132,11 +133,12 @@ tree.estimate_count(query); * For-each with box shaped window queries: `tree.for_each(PhBoxD(min, max), callback);` * Iterator for box shaped window queries: `auto q = tree.begin_query(PhBoxD(min, max));` * Iterator for _k_ nearest neighbor queries: `auto q = tree.begin_knn_query(k, center_point, distance_function);` -* Custom query shapes, such as spheres: `tree.for_each(callback, FilterSphere(center, radius, tree.converter()));` +* Custom query shapes, such as spheres: `tree.for_each(callback, FilterSphere(center, radius, tree.converter()));` - + ##### For-each example + ```C++ // Callback for counting entries struct Counter { @@ -152,9 +154,10 @@ tree.for_each({{1, 1, 1}, {3, 3, 3}}, callback); // callback.n_ is now the number of entries in the box. ``` - + ##### Iterator examples + ```C++ // Iterate over all entries for (auto it : tree) { @@ -172,14 +175,16 @@ for (auto it = tree.begin_knn_query(5, {1, 1, 1}); it != tree.end(); ++it) { } ``` - + ##### Filters -All queries allow specifying an additional filter. The filter is called for every key/value pair that would -normally be returned (subject to query constraints) and to every node in the tree that the query decides to -traverse (also subject to query constraints). Returning `true` in the filter does not change query behaviour, -returning `false` means that the current value or child node is not returned or traversed. -An example of a geometric filter can be found in `phtree/common/filter.h` in `FilterAABB`. + +All queries allow specifying an additional filter. The filter is called for every key/value pair that would normally be +returned (subject to query constraints) and to every node in the tree that the query decides to traverse (also subject +to query constraints). 
Returning `true` in the filter does not change query behaviour, returning `false` means that the +current value or child node is not returned or traversed. An example of a geometric filter can be found +in `phtree/common/filter.h` in `FilterAABB`. + ```C++ template struct FilterByValueId { @@ -200,12 +205,13 @@ for (auto it = tree.begin_query({1, 1, 1}, {3, 3, 3}, FilterByValueId<3, T>())); } ``` - + ##### Distance function -Nearest neighbor queries can also use custom distance metrics, such as L1 distance. -Note that this returns a special iterator that provides a function to get the distance of the -current entry: + +Nearest neighbor queries can also use custom distance metrics, such as L1 distance. Note that this returns a special +iterator that provides a function to get the distance of the current entry: + ```C++ #include "phtree/phtree.h" @@ -217,20 +223,21 @@ for (auto it = tree.begin_knn_query(5, {1, 1, 1}, DistanceL1<3>())); it != tree. ``` - + #### Converters -The PH-Tree can internally only process integer keys. In order to use floating point coordinates, the floating point -coordinates must be converted to integer coordinates. The `PhTreeD` and `PhTreeBoxD` use by default the -`PreprocessIEEE` & `PostProcessIEEE` functions. The `IEEE` processor is a loss-less converter (in terms of numeric -precision) that simply takes the 64bits of a double value and treats them as if they were a 64bit integer -(it is slightly more complicated than that, see discussion in the papers referenced above). -In other words, it treats the IEEE 754 representation of the double value as integer, hence the name `IEEE` converter. - -The `IEEE` conversion is fast and reversible without loss of precision. However, it has been shown that other -converters can result in indexes that are up to 20% faster. -One useful alternative is a `Multiply` converter that convert floating point to integer by multiplication -and casting: + +The PH-Tree can internally only process integer keys. 
In order to use floating point coordinates, the floating point +coordinates must be converted to integer coordinates. The `PhTreeD` and `PhTreeBoxD` use by default the +`PreprocessIEEE` & `PostProcessIEEE` functions. The `IEEE` processor is a loss-less converter (in terms of numeric +precision) that simply takes the 64bits of a double value and treats them as if they were a 64bit integer +(it is slightly more complicated than that, see discussion in the papers referenced above). In other words, it treats +the IEEE 754 representation of the double value as integer, hence the name `IEEE` converter. + +The `IEEE` conversion is fast and reversible without loss of precision. However, it has been shown that other converters +can result in indexes that are up to 20% faster. One useful alternative is a `Multiply` converter that convert floating +point to integer by multiplication and casting: + ```C++ double my_float = ...; // Convert to int @@ -239,11 +246,11 @@ std::int64_t my_int = (std::int64_t)(my_float * 1000000.); // Convert back double resulting_float = ((double)my_int) / 1000000.; ``` -It is obvious that this approach leads to a loss of numerical precision. Moreover, the loss of precision depends -on the actual range of the double values and the constant. -The chosen constant should probably be as large as possible but small enough such that converted -values do not exceed the 64bit limit of `std::int64_t`. -Note that the PH-Tree provides several `ConverterMultiply` implementations for point/box and double/float. + +It is obvious that this approach leads to a loss of numerical precision. Moreover, the loss of precision depends on the +actual range of the double values and the constant. The chosen constant should probably be as large as possible but +small enough such that converted values do not exceed the 64bit limit of `std::int64_t`. Note that the PH-Tree provides +several `ConverterMultiply` implementations for point/box and double/float. 
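The trade-off between the constant and the value range can be sketched in plain C++, independent of the PH-Tree API. The helper names `to_fixed`, `from_fixed` and `scale_fits` are hypothetical and only serve this illustration; they are not part of the library. Note that the multiplication has to happen before the cast, since casting first would drop the fractional digits.

```c++
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of the PH-Tree API.

// 2^63 as a double: scaled values must stay strictly below this magnitude.
constexpr double kInt64Limit = 9223372036854775808.0;

// Returns true if every value in [-max_abs, max_abs] can be multiplied by
// `scale` without leaving the range of std::int64_t.
inline bool scale_fits(double max_abs, double scale) {
    return std::fabs(max_abs) * scale < kInt64Limit;
}

// Scale first, then cast. The cast truncates towards zero, so digits beyond
// the chosen scale are lost.
inline std::int64_t to_fixed(double value, double scale) {
    assert(scale > 0.0);
    return static_cast<std::int64_t>(value * scale);
}

// Inverse mapping; exact only up to the precision kept by `scale`.
inline double from_fixed(std::int64_t value, double scale) {
    assert(scale > 0.0);
    return static_cast<double>(value) / scale;
}
```

With a constant of 1'000'000, `to_fixed(1.5, 1000000.0)` yields `1500000`, and `scale_fits(1.0e13, 1000000.0)` is `false` because the scaled coordinates would no longer fit into 64 bits.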
```C++ template @@ -285,15 +292,16 @@ void test() { } ``` -It is also worth trying out constants that are 1 or 2 orders of magnitude smaller or larger than this maximum value. -Experience shows that this may affect query performance by up to 10%. This is due to a more compact structure - of the resulting index tree. +It is also worth trying out constants that are 1 or 2 orders of magnitude smaller or larger than this maximum value. +Experience shows that this may affect query performance by up to 10%. This is due to a more compact structure of the +resulting index tree. - + ##### Custom key types + With custom converters it is also possible to use your own custom classes as keys (instead of `PhPointD` or `PhBoxF`). -The following example defined custom `MyPoint` and `MyBox` types and a converter that allows using them with a `PhTree`: +The following example defined custom `MyPoint` and `MyBox` types and a converter that allows using them with a `PhTree`: ```c++ struct MyPoint { @@ -339,86 +347,86 @@ void test() { } ``` - - + #### Restrictions * **C++**: Supports value types of `T` and `T*`, but not `T&` -* **C++**: Return types of `find()`, `emplace()`, ... differ slightly from `std::map`, they have function `first()`, `second()` instead of fields of the same name. -* **General**: PH-Trees are **maps**, i.e. each coordinate can hold only *one* entry. In order to hold multiple values per coordinate - please use the `PhTreeMultiMap` implementations. -* **General**: PH-Trees order entries internally in z-order (Morton order). However, the order is based on the (unsigned) bit representation of keys, so negative coordinates are returned *after* positive coordinates. +* **C++**: Return types of `find()`, `emplace()`, ... differ slightly from `std::map`, they have function `first()` + , `second()` instead of fields of the same name. +* **General**: PH-Trees are **maps**, i.e. each coordinate can hold only *one* entry. 
In order to hold multiple values + per coordinate please use the `PhTreeMultiMap` implementations. +* **General**: PH-Trees order entries internally in z-order (Morton order). However, the order is based on the ( + unsigned) bit representation of keys, so negative coordinates are returned *after* positive coordinates. * **General**: The current implementation support between 2 and 63 dimensions. * **Differences to std::map**: There are several differences to `std::map`. Most notably for the iterators: - * `begin()`/`end()` are not comparable with `<` or `>`. Only `it == tree.end()` and `it != tree.end()` is supported. - * Value of `end()`: The tree has no linear memory layout, so there is no useful definition of a pointer pointing _after_ the last entry or any entry. This should be irrelevant for normal usage. - + * `begin()`/`end()` are not comparable with `<` or `>`. Only `it == tree.end()` and `it != tree.end()` is supported. + * Value of `end()`: The tree has no linear memory layout, so there is no useful definition of a pointer pointing _ + after_ the last entry or any entry. This should be irrelevant for normal usage. - + ### Troubleshooting / FAQ **Problem**: The PH-Tree appears to be losing updates/insertions. -**Solution**: Remember that the PH-Tree is a *map*, keys will not be inserted if an identical key already exists. -The easiest solution is to use one of the `PhTreeMultiMap` implementations. -Alternatively, this can be solved by turning the PH-Tree into a multi-map, for example by using something like `std::map` or `std::set` as member type: -`PhTree<3, std::set>`. The `set` instances can then be used to handle key conflicts by storing -multiple entries for the same key. The logic to handle conflicts must currently be implemented manually by the user. +**Solution**: Remember that the PH-Tree is a *map*, keys will not be inserted if an identical key already exists. The +easiest solution is to use one of the `PhTreeMultiMap` implementations. 
Alternatively, this can be solved by turning the +PH-Tree into a multi-map, for example by using something like `std::map` or `std::set` as member type: +`PhTree<3, std::set>`. The `set` instances can then be used to handle key conflicts by storing multiple +entries for the same key. The logic to handle conflicts must currently be implemented manually by the user. ---------------------------------- ## Performance - + ### When to use a PH-Tree -The PH-Tree is a multi-dimensional index or spatial index. This section gives a rough overview how the PH-Tree -compares to other spatial indexes, such as *k*D-trees, R-trees/BV-hierarchies or quadtrees. +The PH-Tree is a multi-dimensional index or spatial index. This section gives a rough overview how the PH-Tree compares +to other spatial indexes, such as *k*D-trees, R-trees/BV-hierarchies or quadtrees. -Disclaimer: This overview cannot be comprehensive (there are 100s of spatial indexes out there) and performance -depends heavily on the actual dataset, usage patterns, hardware, ... . +Disclaimer: This overview cannot be comprehensive (there are 100s of spatial indexes out there) and performance depends +heavily on the actual dataset, usage patterns, hardware, ... . **Generally, the PH-Tree tends to have the following advantages:** -* Fast insertion/removal times. While some indexes, such as *k*D-trees, can be built from scratch very fast, -they tend to be much slower when removing entries or when indexing large datasets. Also, most indexes require -rebalancing which may result in unpredictable latency (R-trees) or may result in index degradation if delayed -(*k*D-trees). +* Fast insertion/removal times. While some indexes, such as *k*D-trees, can be built from scratch very fast, they + tend to be much slower when removing entries or when indexing large datasets. 
Also, most indexes require + rebalancing which may result in unpredictable latency (R-trees) or may result in index degradation if delayed + (*k*D-trees). -* Competitive query performance. Query performance is generally comparable to other index structures. The PH-Tree -is fast at looking up coordinates but requires more traversal than other indexes. This means it is especially -efficient if the query results are 'small', e.g. up to 100 results per query. +* Competitive query performance. Query performance is generally comparable to other index structures. The PH-Tree is + fast at looking up coordinates but requires more traversal than other indexes. This means it is especially efficient + if the query results are 'small', e.g. up to 100 results per query. -* Scalability with large datasets. The PH-Tree's insert/remove/query performance tends to scale well to large -datasets with millions of entries. +* Scalability with large datasets. The PH-Tree's insert/remove/query performance tends to scale well to large datasets + with millions of entries. -* Scalability with the number of dimensions. The PH-Tree has been shown to deal "well" with high dimensional data (1000k+ -dimensions). What does "well" mean? - * It works very well for up to 30 (sometimes 50) dimensions. **Please note that the C++ implementation has not been - optimised nearly as much as the Java implementation.** - * For more dimensions (Java was tested with 1000+ dimensions) the PH-Tree still has excellent - insertion/deletion performance. However, the query performance cannot compete with specialised - high-dim indexes such as cover-trees or pyramid-trees (these tend to be *very slow* on insertion/deletion though). - -* Modification operations (insert/delete) in a PH-Tree are guaranteed to modify only one Node (potentially -creating/deleting a second one). This guarantee can have advantages for concurrent implementations or when -serializing the index. 
Please note that this advantage is somewhat theoretical because this guarantee is not exploited -by the current implementation (it doesn't support concurrency or serialization). +* Scalability with the number of dimensions. The PH-Tree has been shown to deal "well" with high dimensional data ( + 1000k+ dimensions). What does "well" mean? + * It works very well for up to 30 (sometimes 50) dimensions. **Please note that the C++ implementation has not been + optimised nearly as much as the Java implementation.** + * For more dimensions (Java was tested with 1000+ dimensions) the PH-Tree still has excellent insertion/deletion + performance. However, the query performance cannot compete with specialised high-dim indexes such as cover-trees + or pyramid-trees (these tend to be *very slow* on insertion/deletion though). +* Modification operations (insert/delete) in a PH-Tree are guaranteed to modify only one Node (potentially + creating/deleting a second one). This guarantee can have advantages for concurrent implementations or when serializing + the index. Please note that this advantage is somewhat theoretical because this guarantee is not exploited by the + current implementation (it doesn't support concurrency or serialization). **PH-Tree disadvantages:** * A PH-Tree is a *map*, not a *multi-map*. This project also provides `PhTreeMultiMap` implementations that store a -hash-set at each coordinate. -In practice, the overhead of storing sets appears to be usually small enough to not matter much. + hash-set at each coordinate. In practice, the overhead of storing sets appears to be usually small enough to not + matter much. -* PH-Trees are not very efficient in scenarios where queries tend to return large result sets in the order of 1000 or more. +* PH-Trees are not very efficient in scenarios where queries tend to return large result sets in the order of 1000 or + more. 
- - + ### Optimising Performance @@ -426,81 +434,78 @@ There are numerous ways to improve performance. The following list gives an over 1) **Use `for_each` instead of iterators**. This should improve performance of queries by 5%-10%. -2) **Use `emplace_hint` if possible**. When updating the position of an entry, the naive way is to use `erase()`/`emplace()`. - With `emplace_hint`, insertion can avoid navigation to the target node if the insertion coordinate is close to the - removal coordinate. +2) **Use `emplace_hint` if possible**. When updating the position of an entry, the naive way is to use `erase()` + /`emplace()`. With `emplace_hint`, insertion can avoid navigation to the target node if the insertion coordinate is + close to the removal coordinate. ```c++ auto iter = tree.find(old_position); tree.erase(iter); tree.emplace_hint(iter, new_position, value); ``` -3) **Store pointers instead of large data objects**. For example, use `PhTree<3, MyLargeClass*>` instead of -`PhTree<3, MyLargeClass>` if `MyLargeClass` is large. - * This prevents the PH-Tree from storing the values inside the tree. This should improve cache-locality - and thus performance when operating on the tree. - * Using pointers is also useful if construction/destruction of values is expensive. The reason is that - the tree has to construct and destruct objects internally. This may be avoidable but is currently still happening. - -4) **Use non-box query shapes**. Depending on the use case it may be more suitable to use a custom filter for queries. -For example: - - `tree.for_each(callback, FilterSphere(center, radius, tree.converter()));` - -5) **Use a different data converter**. The default converter of the PH-Tree results in a reasonably fast index. -Its biggest advantage is that it provides lossless conversion from floating point coordinates to PH-Tree coordinates -(integers) and back to floating point coordinates. 
- * The `ConverterMultiply` is a lossy converter but it tends to improve performance by 10% or more. This is not caused - by faster operation in the converter itself but by a more compact tree shape. The example shows how to use a converter - that multiplies coordinates by 100'000, thus preserving roughly 5 fractional digits: - - `PhTreeD>` - -6) **Use custom key types**. By default, the PH-Tree accepts only coordinates in the form of its own key types, such - as `PhPointD`, `PhBoxF` or similar. To avoid conversion from custom types to PH-Tree key types, custom classes - can often be adapted to be accepted directly by the PH-Tree without conversion. This requires implementing a - custom converter as described in the section about [Custom Key Types](#custom-key-types). - -7) Advanced: **Adapt internal Node representation**. Depending on the dimensionality `DIM`, the PH-Tree uses internally in -`Nodes` different container types to hold entries. By default, it uses an array for `DIM<=3`, a vector -for `DIM<=8` and an ordered map for `DIM>8`. Adapting these thresholds can have strong effects on performance as well as -memory usage. -One example: Changing the threshold to use vector for `DIM==3` reduced performance of the `update_d` benchmark by 40%-50% but -improved performance of `query_d` by 15%-20%. The threshold is currently hardcoded. -The effects are not always easy to predict but here are some guidelines: - * "array" is the fastest solution for insert/update/remove type operations. Query performance is "ok". Memory consumption is - **O(DIM^2)** for every node regardless of number of entries in the node. - * "vector" is the fastest for queries but has for large nodes **worst case O(DIM^2)** insert/update/remove performance. - * "map" scales well with `DIM` but is for low values of `DIM` generally slower than "array" or "vector". - +3) **Store pointers instead of large data objects**. 
For example, use `PhTree<3, MyLargeClass*>` instead of + `PhTree<3, MyLargeClass>` if `MyLargeClass` is large. + * This prevents the PH-Tree from storing the values inside the tree. This should improve cache-locality and thus + performance when operating on the tree. + * Using pointers is also useful if construction/destruction of values is expensive. The reason is that the tree has + to construct and destruct objects internally. This may be avoidable but is currently still happening. + +4) **Use non-box query shapes**. Depending on the use case it may be more suitable to use a custom filter for queries. + For example: + + `tree.for_each(callback, FilterSphere(center, radius, tree.converter()));` + +5) **Use a different data converter**. The default converter of the PH-Tree results in a reasonably fast index. Its + biggest advantage is that it provides lossless conversion from floating point coordinates to PH-Tree coordinates + (integers) and back to floating point coordinates. + * The `ConverterMultiply` is a lossy converter but it tends to improve performance by 10% or more. This is not + caused by faster operation in the converter itself but by a more compact tree shape. The example shows how to use + a converter that multiplies coordinates by 100'000, thus preserving roughly 5 fractional digits: + + `PhTreeD>` + +6) **Use custom key types**. By default, the PH-Tree accepts only coordinates in the form of its own key types, such + as `PhPointD`, `PhBoxF` or similar. To avoid conversion from custom types to PH-Tree key types, custom classes can + often be adapted to be accepted directly by the PH-Tree without conversion. This requires implementing a custom + converter as described in the section about [Custom Key Types](#custom-key-types). + +7) Advanced: **Adapt internal Node representation**. Depending on the dimensionality `DIM`, the PH-Tree uses internally + in + `Nodes` different container types to hold entries. 
By default, it uses an array for `DIM<=3`, a vector for `DIM<=8` + and an ordered map for `DIM>8`. Adapting these thresholds can have strong effects on performance as well as memory + usage. One example: Changing the threshold to use vector for `DIM==3` reduced performance of the `update_d` benchmark + by 40%-50% but improved performance of `query_d` by 15%-20%. The threshold is currently hardcoded. + The effects are not always easy to predict but here are some guidelines: + * "array" is the fastest solution for insert/update/remove type operations. Query performance is "ok". Memory + consumption is + **O(DIM^2)** for every node regardless of number of entries in the node. + * "vector" is the fastest for queries but has for large nodes **worst case O(DIM^2)** insert/update/remove + performance. + * "map" scales well with `DIM` but is for low values of `DIM` generally slower than "array" or "vector". ---------------------------------- - ## Compiling the PH-Tree -This section will guide you through the initial build system and IDE you need to go through in order to build and run custom versions of the PH-Tree on your machine. +This section will guide you through the initial build system and IDE you need to go through in order to build and run +custom versions of the PH-Tree on your machine. - + ### Build system & dependencies -PH-Tree can be built with *cmake 3.14* or [Bazel](https://bazel.build) as build system. All code is written in C++ targeting the C++17 standard. -The code has been verified to compile with Clang 9 on Linux and Visual Studio 2019 on Windows. +PH-Tree can be built with *cmake 3.14* or [Bazel](https://bazel.build) as build system. All code is written in C++ +targeting the C++17 standard. The code has been verified to compile on Linux with Clang 9, 10, 11, 12, and GCC 9, 10, +11, and on Windows with Visual Studio 2019. 
#### Ubuntu Linux -Installing clang & bazel: -``` -echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list -curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add - -curl https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add - -sudo apt-add-repository 'deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main' -sudo apt-get update -sudo apt-get install clang-9 bazel -``` +* Installing [clang](https://apt.llvm.org/) + +* Installing [bazel](https://docs.bazel.build/versions/main/install-ubuntu.html) + +* To install [cmake](https://launchpad.net/~hnakamur/+archive/ubuntu/cmake): -To install [*cmake*](https://launchpad.net/~hnakamur/+archive/ubuntu/cmake): ``` sudo add-apt-repository ppa:hnakamur/libarchive sudo add-apt-repository ppa:hnakamur/libzstd @@ -511,26 +516,30 @@ sudo apt install cmake #### Windows -To build on Windows, you'll need to have a version of Visual Studio 2019 installed (likely Professional), in addition to the latest version of -[Bazel](https://docs.bazel.build/versions/master/windows.html). - +To build on Windows, you'll need to have a version of Visual Studio 2019 installed (likely Professional), in addition to +[Bazel](https://docs.bazel.build/versions/master/windows.html) or +[cmake](https://cmake.org/download/). - + ### Bazel + Once you have set up your dependencies, you should be able to build the PH-Tree repository by running: + ``` bazel build ... ``` Similarly, you can run all unit tests with: + ``` bazel test ... ``` - + ### cmake + ``` mkdir build cd build @@ -539,14 +548,13 @@ cmake --build . ./example/Example ``` - ## Further Resources - + ### Theory -The PH-Tree is discussed in the following publications and reports: +The PH-Tree is discussed in the following publications and reports: - T. Zaeschke, C. Zimmerli, M.C. Norrie: "The PH-Tree -- A Space-Efficient Storage Structure and Multi-Dimensional Index", (SIGMOD 2014)