-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added language standard parallelism to dot product #251
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution! : - ) Please see comments; thanks!
@@ -13,7 +13,7 @@ | |||
// Make mdspan less verbose | |||
using std::experimental::mdspan; | |||
using std::experimental::extents; | |||
using std::experimental::dynamic_extent; | |||
using std::dynamic_extent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change suggests that CI doesn't actually build the examples. Should we consider fixing that?
This change might actually need to depend on the C++ version, as std::dynamic_extent
entered the Standard in C++20 with span
. We'll have to revisit the examples anyway, because of the recent change to use macros to specify the namespaces. (The macros let users control them, so that they can but don't need to go into std
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI builds the examples; I just wasn't building them on my PC.
I was unaware of the macros change. Can you give me an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MDSPAN_IMPL_STANDARD_NAMESPACE
and MDSPAN_IMPL_PROPOSED_NAMESPACE
are the two macros. They can be defined by users, but they also get default definitions, e.g., here in include/experimental/mdspan
. The library assumes that MDSPAN_IMPL_PROPOSED_NAMESPACE
is nested inside MDSPAN_IMPL_STANDARD_NAMESPACE
.
Using these macros might require updating the version of the reference mdspan implementation that github's CI pulls in.
using std::experimental::submdspan; | ||
using std::experimental::full_extent; | ||
using std::full_extent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see above comment on std::full_extent
; thanks!
@@ -43,6 +43,7 @@ | |||
#ifndef LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_ | |||
#define LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_ | |||
|
|||
#include <ranges> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<ranges>
is a C++20 include. Would you consider protecting both the include and the use of iota_view
below with the appropriate feature test macro, and providing a fall-back implementation? It's OK if the fall-back is not parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Can you point me to an example of using a feature test macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! This web page gives a good summary.
// Ensure that the feature test macro __cpp_lib_ranges is available;
// <version> also defines this macro, but that is a C++20 header.
#include <algorithm>
#if defined(__cpp_lib_ranges)
# include <ranges>
#endif
void some_function() {
#if defined(__cpp_lib_ranges_iota)
// ... code using views::iota ...
#else
// ... fall-back code ...
#endif
}
The point of using two different macros -- one for the header, and one for the specific feature iota::view
-- is that the feature came after the header, so some compiler versions may have the header but not the feature.
init += v1(k) * v2(k); | ||
} | ||
return init; | ||
using scalar_type = std::common_type_t<ElementType1, ElementType2, Scalar>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary the right type. For example, if operator*
returns a higher-precision type than its inputs (as might make sense for custom integer or fixed-point real types, for example), then what you want here is the common type of Scalar
and the result of operator*
. However, it turns out that you don't need scalar_type
here; please see the comment below.
@@ -12,11 +12,12 @@ inline namespace __p1673_version_0 { | |||
namespace linalg { | |||
namespace impl { | |||
// the execution policy used for default serial inline implementations | |||
struct inline_exec_t {}; | |||
using inline_exec_t = std::execution::sequenced_policy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see note above about the "inline" executor.
Any Standard Algorithm overload that takes an ExecutionPolicy
template parameter is a "parallel algorithm" (see [algorithms.parallel] 2). That changes what the algorithm assumes about the behavior of element access functions.
Therefore, it's really important that the "inline" executor not use any parallel algorithm, even if ExecutionPolicy
is sequenced_policy
.
|
||
// The execution policy used when no execution policy is provided | ||
// It must be remapped to some other execution policy, which the default mapper does | ||
struct default_exec_t {}; | ||
using default_exec_t = std::execution::parallel_policy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// It must be remapped to some other execution policy, which the default mapper does
-- it's important that we not circumvent the default mapper. Would you consider, thus, reverting this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, the execpolicy_mapper did not like this default. I can restore it and show you the error it gave me.
@@ -87,7 +87,7 @@ TEST_F(blas2_signed_float_fixture, kokkos_matrix_frob_norm_trivial_empty) | |||
{ | |||
std::vector<value_type> v; | |||
|
|||
constexpr auto de = std::experimental::dynamic_extent; | |||
constexpr auto de = std::dynamic_extent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CI not actually test this test? (Please see comment above about std::dynamic_extent
.) If so, then should we consider using the namespace macros instead of std
explicitly?
#ifdef LINALG_ENABLE_TBB | ||
const auto dotResultPar = dot (std::execution::par, x, y); | ||
#else | ||
const auto dotResultPar = dot (std::execution::seq, x, y); | ||
#endif // LINALG_ENABLE_TBB | ||
static_assert( std::is_same_v<std::remove_const_t<decltype(dotResultPar)>, scalar_t> ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#ifdef LINALG_ENABLE_TBB | |
const auto dotResultPar = dot (std::execution::par, x, y); | |
#else | |
const auto dotResultPar = dot (std::execution::seq, x, y); | |
#endif // LINALG_ENABLE_TBB | |
static_assert( std::is_same_v<std::remove_const_t<decltype(dotResultPar)>, scalar_t> ); | |
#ifdef LINALG_ENABLE_TBB | |
auto dotResultPar = dot (std::execution::par, x, y); | |
#else | |
auto dotResultPar = dot (std::execution::seq, x, y); | |
#endif // LINALG_ENABLE_TBB | |
static_assert( std::is_same_v<decltype(dotResultPar), scalar_t> ); |
@@ -130,6 +130,15 @@ if(LINALG_ENABLE_KOKKOS) | |||
find_package(KokkosKernels REQUIRED) | |||
endif() | |||
|
|||
find_package(TBB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More recent versions of GCC (for instance) shouldn't require TBB for std::execution::par
to work. Furthermore, nvc++ comes with its own, non-TBB implementation of std::execution::par
. Would you consider gating this on compiler version, instead of requiring a third-party library? That could even be done in the code -- the code would just need to test the appropriate compiler version to ensure that the C++ algorithms are available (see https://en.cppreference.com/w/cpp/compiler_support/17 and search for "Parallel algorithms and execution policies").
plus<void> deduces argument and return types, letting the two input types differ. Co-authored-by: Mark Hoemmen <[email protected]>
@@ -31,11 +31,11 @@ int main(int argc, char* argv[]) { | |||
mdspan<double, extents<std::size_t, dynamic_extent>> y(y_vec.data(),N); | |||
for(int i=0; i<A.extent(0); i++) | |||
for(int j=0; j<A.extent(1); j++) | |||
A(i,j) = 100.0*i+j; | |||
A[i,j] = 100.0*i+j; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only compile if the C++23 feature "multidimensional subscript operator" (P2128R6) is available. As a result, would you consider one of the following changes?
- Protect the example so it only compiles if the feature test macro
__cpp_multidimensional_subscript
is defined, OR - Add
#define MDSPAN_USE_PAREN_OPERATOR 1
to the example before including anymdspan
headers (see top of https://godbolt.org/z/Yrr8oe9sE for an example of the relevant macros), and use parentheses instead of brackets (e.g.,A(i,j)
)
@@ -132,7 +132,7 @@ void add_rank_2( | |||
using size_type = std::common_type_t<SizeType_x, SizeType_y, SizeType_z>; | |||
for (size_type j = 0; j < x.extent(1); ++j) { | |||
for (size_type i = 0; i < x.extent(0); ++i) { | |||
z(i,j) = x(i,j) + y(i,j); | |||
z[i,j] = x[i,j] + y[i,j]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual implementation is a C++17 back-port, so we unfortunately have to roll with whatever operator (parentheses or brackets) the user has available. If you really do want to change the implementation, then we might have to go with something like the following.
z[i,j] = x[i,j] + y[i,j]; | |
#if (MDSPAN_USE_PAREN_OPERATOR > 0) | |
z(i,j) = x(i,j) + y(i,j); | |
#else | |
z[i,j] = x[i,j] + y[i,j]; | |
#endif |
or at least
z[i,j] = x[i,j] + y[i,j]; | |
#if defined(__cpp_multidimensional_subscript) | |
z[i,j] = x[i,j] + y[i,j]; | |
#else | |
z(i,j) = x(i,j) + y(i,j); | |
#endif |
It might be best just to leave the implementation alone; we might want to come up with a better way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend, at least for now, not changing the implementation to use the multidimensional subscript operator. Doing this would hinder back-porting to C++17, which is an important requirement of the reference implementation.
@mhoemmen @crtrott Please review at your convenience and I'll do the same for the other functions.