
Refactor grid to measure cache-awareness impact on grid tracing #251

Closed
wants to merge 28 commits

Conversation


@glpuga glpuga commented Jul 29, 2023

Proposed changes

  • Refactors the grid to avoid low-level access through indexes that assume a particular memory layout.
  • Extracts the grid storage into its own class, LinearGridStorage.
  • Creates multiple cache-friendly storage alternatives.
  • Adds a number of benchmarks to evaluate them.
  • Minor fixes and changes.
  • Also adds an alias to the Docker image to run the pre-commit hooks from any folder.

Note: for reviewing the changes, it's probably better to do a first walk-through commit by commit before looking at the changes as a whole.

Type of change

  • 🐛 Bugfix (change which fixes an issue)
  • 🚀 Feature (change which adds functionality)
  • 📚 Documentation (change which fixes or extends documentation)

Checklist

Put an x in the boxes that apply. This is simply a reminder of what we will require before merging your code.

  • Lint and unit tests (if any) pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • All commits have been signed for DCO

Additional comments

In summary, cache-friendly storage makes a large difference when ray tracing, but only for long rays (over 100 m). For shorter rays, the ratio between the linear storage and the best cache-aware storage inverts.

The measurements were taken by:

  • tracing from a central point,
  • using 182 different bearings equally dividing the 360 degrees around that point, visited sequentially in clockwise order (similar to the actual processing of scan data), and
  • placing obstacles at 20, 50, 100, or 200 meters.

Screenshot from 2023-07-29 14-29-05

@glpuga glpuga force-pushed the glpuga/cache_friendlyness_benchmark branch from 519c6a3 to 5bf6d77 Compare July 29, 2023 20:34
@glpuga glpuga changed the title Glpuga/cache friendlyness benchmark Grid refactor to measure cache-awareness impact on grid tracing Jul 29, 2023
@glpuga glpuga changed the title Grid refactor to measure cache-awareness impact on grid tracing Refactor grid to measure cache-awareness impact on grid tracing Jul 29, 2023
@glpuga glpuga force-pushed the glpuga/cache_friendlyness_benchmark branch from 5bf6d77 to 94e583e Compare July 29, 2023 22:37
@glpuga glpuga marked this pull request as ready for review July 29, 2023 23:04

hidmic commented Sep 5, 2023

How do you suggest we go about this @glpuga? Are cache friendly storage layouts pluggable like other features in the library?


glpuga commented Sep 7, 2023

How do you suggest we go about this @glpuga?

I think the changes are worth merging, if only because the relationship between the likelihood sensor model and the grid is now a bit more clear-cut, and low-level accesses to the grid were removed.

Performance-wise, there's no change. Benchmarks seem to indicate that you'd only see meaningful differences with maps much larger than the one we currently use to measure performance.

Are cache friendly storage layouts pluggable like other features in the library?

They are. To switch between them, it's possible to change the storage type parameter of the occupancy grid storage mixin.

https://github.com/Ekumen-OS/beluga/pull/251/files#diff-913a8969931817e56d8350462bd5ebba435ef2fde37ec83ee653ba483c05dc4bR51

@hidmic hidmic left a comment

Damn, that was a long PR 😅 Awesome work @glpuga!

One thought: we are focusing on the effects of data fetching on CPU caches. I wonder what is going on with code fetching. I would expect near zero vtable lookups helping us there.

template <typename... Args>
explicit ValueGrid2Mixin(std::vector<T> data, std::size_t width, double resolution, Args&&... args)
: Mixin(std::forward<Args>(args)...),
data_(std::move(data)),
@glpuga meta: shouldn't this grid take a storage as well?

beluga/test/beluga/algorithm/test_distance_map.cpp (outdated, resolved)
const auto xi = static_cast<int>(raster_index % width);
const auto yi = static_cast<int>(raster_index / width);
const auto cell = Eigen::Vector2i(xi, yi);
return std::make_tuple(cell, this->self().data_at(cell).value());
@hidmic hidmic Sep 7, 2023

@glpuga meta: this assumes the underlying grid is at least dense (by rasterizing w/o checking for data availability). FWIW one of the reasons why at the time I refrained from using mixins for the grid hierarchy are these dependencies. They aren't really independent (although I like that we split dense and regular, that coupling was unnecessary).

Comment on lines +66 to +72
ROSOccupancyGrid::MapStorage grid_storage(ros_msg_ptr->info.width, ros_msg_ptr->info.height);
for (std::size_t x = 0; x < ros_msg_ptr->info.width; ++x) {
for (std::size_t y = 0; y < ros_msg_ptr->info.height; ++y) {
const auto index = x + y * ros_msg_ptr->info.width;
grid_storage.cell(static_cast<int>(x), static_cast<int>(y)) = ros_msg_ptr->data[index];
}
}

@glpuga meta: one thing I don't love about this is that now we unconditionally copy the entire grid. That's OK if grids never update and performance improves. It's not that cool for large grids that change frequently or if there is no cache to leverage and memory is scarce (thinking of microcontrollers). Having a storage that maps an existing memory layout to our grids APIs would be nice. FWIW that's another reason why the grid hierarchy is mostly behavior with no state.

@glpuga glpuga Sep 10, 2023

I see the point, and nothing prevents us from creating a storage that just maps accesses to an existing buffer, like we did before, if needed. The only limitation is that the code above is not general enough to move initialization out of the grid storage's scope, which would isolate it from ROS types.

I'm not too concerned about that kind of system constraint (low memory plus lack of cache) in the beluga_amcl code above, though; any system that finds itself under those constraints is unlikely to run ROS in the first place.

@hidmic (Collaborator)

any system that would finds itself under those constraints is unlikely to run ROS in the first place.

Definitely. As long as we don't propagate the constraint to the core library, I'm good :)


glpuga commented Sep 8, 2023

Thanks for the review @hidmic I'll try to address your comments soon.

@glpuga glpuga force-pushed the glpuga/cache_friendlyness_benchmark branch from 94e583e to dcceed6 Compare September 9, 2023 21:38
@glpuga glpuga force-pushed the glpuga/cache_friendlyness_benchmark branch from 06cc1fd to b377baa Compare September 10, 2023 16:35
@hidmic hidmic left a comment

Looking great! CI seems somewhat unhappy though.

Comment on lines +49 to +52
// pre-initialize the grid with default values
for (std::size_t index = 0; index < buffer_size_; ++index) {
storage_[index] = T{};
}

@glpuga nit: this shouldn't be necessary; see the new expression for array types (in particular, the section on construction). If in doubt, we can always tuck a {} at the end, i.e. new T[buffer_size_]{}. Same elsewhere.

// G G G G H H H H I I ...
// J J J J K K K K L L ...

if constexpr (true || !is_a_square_power_of_two(LineLength / sizeof(T))) {

@glpuga hmm, that true || seems like a debugging leftover.

@glpuga glpuga closed this Feb 25, 2024

hidmic commented Feb 25, 2024

@glpuga why close this? It was a good addition.


glpuga commented Feb 26, 2024

@glpuga why close this? It was a good addition.

Maybe, but it was not adding new functional features, and it's been stalled for months. Almost every file in the PR is now in conflict with the current main due to files moving to new locations and other changes, so the amount of rework needed just to bring this back is considerable.

I don't have the time to revive this right now, and having an 8-month-old PR pinned in the repo is not great optics, so I'd rather close it.

@nahueespinosa nahueespinosa deleted the glpuga/cache_friendlyness_benchmark branch June 22, 2024 22:52