
Refactor grid to measure cache-awareness impact on grid tracing #251

Closed
wants to merge 28 commits

Conversation


@glpuga glpuga commented Jul 29, 2023

Proposed changes

  • Refactors the grid to avoid low-level access through indexes that assume a particular memory layout.
  • Extracts the grid storage into its own class, LinearGridStorage.
  • Creates multiple cache-friendly storage alternatives.
  • Adds a number of benchmarks to evaluate them.
  • Minor fixes and changes.
  • Also adds an alias to the Docker image to run the pre-commit hooks from any folder.

Note: for reviewing the changes, it's probably better to do a first walk-through commit by commit before looking at the changes as a whole.

Type of change

  • 🐛 Bugfix (change which fixes an issue)
  • 🚀 Feature (change which adds functionality)
  • 📚 Documentation (change which fixes or extends documentation)

Checklist

Put an x in the boxes that apply. This is simply a reminder of what we will require before merging your code.

  • Lint and unit tests (if any) pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • All commits have been signed for DCO

Additional comments

In summary, cache-friendly storage makes a large difference when ray tracing, but only for long rays (over 100 m). For shorter rays, the ratio between the linear storage and the best cache-aware storage inverts.

The measurements were taken by:

  • tracing from a central point,
  • using 182 different bearings equally dividing the 360 degrees around that point, visited sequentially in clockwise order (similar to the actual processing of scan data), and
  • placing obstacles at 20, 50, 100, or 200 meters.

Screenshot from 2023-07-29 14-29-05

@glpuga glpuga force-pushed the glpuga/cache_friendlyness_benchmark branch from 519c6a3 to 5bf6d77 Compare July 29, 2023 20:34
@glpuga glpuga changed the title Glpuga/cache friendlyness benchmark Grid refactor to measure cache-awareness impact on grid tracing Jul 29, 2023
@glpuga glpuga changed the title Grid refactor to measure cache-awareness impact on grid tracing Refactor grid to measure cache-awareness impact on grid tracing Jul 29, 2023
@glpuga glpuga force-pushed the glpuga/cache_friendlyness_benchmark branch from 5bf6d77 to 94e583e Compare July 29, 2023 22:37
@glpuga glpuga marked this pull request as ready for review July 29, 2023 23:04

hidmic commented Sep 5, 2023

How do you suggest we go about this @glpuga? Are cache friendly storage layouts pluggable like other features in the library?


glpuga commented Sep 7, 2023

How do you suggest we go about this @glpuga?

I think the changes are worth merging, if only because the relationship between the likelihood sensor model and the grid is now a bit more clear-cut, and low-level accesses to the grid were removed.

Performance-wise, there's no change. Benchmarks seem to indicate that you'd only see meaningful differences with maps much larger than the one we currently use to measure performance.

Are cache friendly storage layouts pluggable like other features in the library?

They are. To switch between them, it's possible to change the storage type parameter of the occupancy grid storage mixin.

https://github.com/Ekumen-OS/beluga/pull/251/files#diff-913a8969931817e56d8350462bd5ebba435ef2fde37ec83ee653ba483c05dc4bR51

@hidmic hidmic left a comment

Damn, that was a long PR 😅 Awesome work @glpuga!

One thought: we are focusing on the effects of data fetching on CPU caches. I wonder what is going on with code fetching. I would expect near zero vtable lookups helping us there.

template <typename... Args>
explicit ValueGrid2Mixin(std::vector<T> data, std::size_t width, double resolution, Args&&... args)
: Mixin(std::forward<Args>(args)...),
data_(std::move(data)),
@glpuga meta: shouldn't this grid take a storage as well?

beluga/test/beluga/algorithm/test_distance_map.cpp (outdated, resolved)
const auto xi = static_cast<int>(raster_index % width);
const auto yi = static_cast<int>(raster_index / width);
const auto cell = Eigen::Vector2i(xi, yi);
return std::make_tuple(cell, this->self().data_at(cell).value());
@hidmic hidmic Sep 7, 2023

@glpuga meta: this assumes the underlying grid is at least dense (by rasterizing w/o checking for data availability). FWIW one of the reasons why at the time I refrained from using mixins for the grid hierarchy are these dependencies. They aren't really independent (although I like that we split dense and regular, that coupling was unnecessary).

Comment on lines +66 to +72
ROSOccupancyGrid::MapStorage grid_storage(ros_msg_ptr->info.width, ros_msg_ptr->info.height);
for (std::size_t x = 0; x < ros_msg_ptr->info.width; ++x) {
for (std::size_t y = 0; y < ros_msg_ptr->info.height; ++y) {
const auto index = x + y * ros_msg_ptr->info.width;
grid_storage.cell(static_cast<int>(x), static_cast<int>(y)) = ros_msg_ptr->data[index];
}
}

@glpuga meta: one thing I don't love about this is that now we unconditionally copy the entire grid. That's OK if grids never update and performance improves. It's not that cool for large grids that change frequently or if there is no cache to leverage and memory is scarce (thinking of microcontrollers). Having a storage that maps an existing memory layout to our grids APIs would be nice. FWIW that's another reason why the grid hierarchy is mostly behavior with no state.

@glpuga glpuga Sep 10, 2023

I see the point, and nothing prevents us from creating a storage that just maps accesses to an existing buffer, like we did before, if needed. The only limitation is that the code above is not general enough to move initialization out of the grid storage's scope, which would isolate it from ROS types.

I'm not too concerned about that kind of system constraint (low memory plus lack of cache) in the beluga_amcl code above, though; any system that finds itself under those constraints is unlikely to run ROS in the first place.

@hidmic (Collaborator)

any system that would finds itself under those constraints is unlikely to run ROS in the first place.

Definitely. As long as we don't propagate the constraint to the core library, I'm good :)


glpuga commented Sep 8, 2023

Thanks for the review @hidmic I'll try to address your comments soon.

@glpuga glpuga force-pushed the glpuga/cache_friendlyness_benchmark branch from 94e583e to dcceed6 Compare September 9, 2023 21:38
@glpuga glpuga force-pushed the glpuga/cache_friendlyness_benchmark branch from 06cc1fd to b377baa Compare September 10, 2023 16:35
@hidmic hidmic left a comment

Looking great! CI seems somewhat unhappy though.

Comment on lines +49 to +52
// pre-initialize the grid with default values
for (std::size_t index = 0; index < buffer_size_; ++index) {
storage_[index] = T{};
}

@glpuga nit: this shouldn't be necessary; see the new expression for array types (in particular, the section on construction). If in doubt, we can always tuck a {} at the end, i.e. new T[buffer_size_]{}. Same elsewhere.

// G G G G H H H H I I ...
// J J J J K K K K L L ...

if constexpr (true || !is_a_square_power_of_two(LineLength / sizeof(T))) {

@glpuga hmm, that true || seems like a debugging leftover.

@glpuga glpuga closed this Feb 25, 2024

hidmic commented Feb 25, 2024

@glpuga why close this? It was a good addition.


glpuga commented Feb 26, 2024

@glpuga why close this? It was a good addition.

Maybe, but it was not adding new functional features, and it's been stalled for months. Almost every file in the PR is now in conflict with the current main due to files moving to new locations and other changes, so the amount of rework needed just to bring this back is considerable.

I don't have the time to revive this right now, and having an 8-month-old PR pinned in the repo is not great optics, so I'd rather close it.

@nahueespinosa nahueespinosa deleted the glpuga/cache_friendlyness_benchmark branch June 22, 2024 22:52