Added language standard parallelism to dot product #251

amklinv-nnl · 2023-05-31T21:48:38Z

@mhoemmen @crtrott Please review at your convenience and I'll do the same for the other functions.

mhoemmen

Thanks for this contribution! : - ) Please see comments; thanks!

mhoemmen · 2023-05-31T22:07:43Z

examples/01_scale.cpp

@@ -13,7 +13,7 @@
 // Make mdspan less verbose
 using std::experimental::mdspan;
 using std::experimental::extents;
-using std::experimental::dynamic_extent;
+using std::dynamic_extent;


This change suggests that CI doesn't actually build the examples. Should we consider fixing that?

This change might actually need to depend on the C++ version, as std::dynamic_extent entered the Standard in C++20 with span. We'll have to revisit the examples anyway, because of the recent change to use macros to specify the namespaces. (The macros let users control them, so that they can but don't need to go into std.)

CI builds the examples; I just wasn't building them on my PC.

I was unaware of the macros change. Can you give me an example?

MDSPAN_IMPL_STANDARD_NAMESPACE and MDSPAN_IMPL_PROPOSED_NAMESPACE are the two macros. They can be defined by users, but they also get default definitions, e.g., here in include/experimental/mdspan. The library assumes that MDSPAN_IMPL_PROPOSED_NAMESPACE is nested inside MDSPAN_IMPL_STANDARD_NAMESPACE.

Using these macros might require updating the version of the reference mdspan implementation that github's CI pulls in.

mhoemmen · 2023-05-31T22:08:54Z

examples/03_matrix_vector_product_mixedprec.cpp

 using std::experimental::submdspan;
-using std::experimental::full_extent;
+using std::full_extent;


Please see above comment on std::full_extent; thanks!

mhoemmen · 2023-05-31T22:12:56Z

include/experimental/__p1673_bits/blas1_dot.hpp

@@ -43,6 +43,7 @@
 #ifndef LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_
 #define LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_

+#include <ranges>


<ranges> is a C++20 include. Would you consider protecting both the include and the use of iota_view below with the appropriate feature test macro, and providing a fall-back implementation? It's OK if the fall-back is not parallel.

Sure! Can you point me to an example of using a feature test macro?

Sure! This web page gives a good summary.

// Ensure that the feature test macro __cpp_lib_ranges is available; // <version> also defines this macro, but that is a C++20 header. #include <algorithm> #if defined(__cpp_lib_ranges) # include <ranges> #endif void some_function() { #if defined(__cpp_lib_ranges_iota) // ... code using views::iota ... #else // ... fall-back code ... #endif }

The point of using two different macros -- one for the header, and one for the specific feature iota::view -- is that the feature came after the header, so some compiler versions may have the header but not the feature.

include/experimental/__p1673_bits/blas1_dot.hpp

mhoemmen · 2023-05-31T22:21:24Z

include/experimental/__p1673_bits/blas1_dot.hpp

-    init += v1(k) * v2(k);
-  }
-  return init;
+  using scalar_type = std::common_type_t<ElementType1, ElementType2, Scalar>;


This is not necessary the right type. For example, if operator* returns a higher-precision type than its inputs (as might make sense for custom integer or fixed-point real types, for example), then what you want here is the common type of Scalar and the result of operator*. However, it turns out that you don't need scalar_type here; please see the comment below.

mhoemmen · 2023-05-31T22:36:34Z

include/experimental/__p1673_bits/linalg_execpolicy_mapper.hpp

@@ -12,11 +12,12 @@ inline namespace __p1673_version_0 {
 namespace linalg {
 namespace impl {
 // the execution policy used for default serial inline implementations
-struct inline_exec_t {};
+using inline_exec_t = std::execution::sequenced_policy;


Please see note above about the "inline" executor.

Any Standard Algorithm overload that takes an ExecutionPolicy template parameter is a "parallel algorithm" (see [algorithms.parallel] 2). That changes what the algorithm assumes about the behavior of element access functions.

Therefore, it's really important that the "inline" executor not use any parallel algorithm, even if ExecutionPolicy is sequenced_policy.

mhoemmen · 2023-05-31T22:37:36Z

include/experimental/__p1673_bits/linalg_execpolicy_mapper.hpp


 // The execution policy used when no execution policy is provided
 // It must be remapped to some other execution policy, which the default mapper does
-struct default_exec_t {};
+using default_exec_t = std::execution::parallel_policy;


// It must be remapped to some other execution policy, which the default mapper does -- it's important that we not circumvent the default mapper. Would you consider, thus, reverting this change?

If I remember correctly, the execpolicy_mapper did not like this default. I can restore it and show you the error it gave me.

mhoemmen · 2023-05-31T22:38:49Z

tests/kokkos-based/matrix_frob_norm_kokkos.cpp

@@ -87,7 +87,7 @@ TEST_F(blas2_signed_float_fixture, kokkos_matrix_frob_norm_trivial_empty)
 {
  std::vector<value_type> v;

-  constexpr auto de = std::experimental::dynamic_extent;
+  constexpr auto de = std::dynamic_extent;


Does CI not actually test this test? (Please see comment above about std::dynamic_extent.) If so, then should we consider using the namespace macros instead of std explicitly?

mhoemmen · 2023-05-31T22:40:27Z

tests/native/dot.cpp

+#ifdef LINALG_ENABLE_TBB
+    const auto dotResultPar = dot (std::execution::par, x, y);
+#else
+    const auto dotResultPar = dot (std::execution::seq, x, y);
+#endif // LINALG_ENABLE_TBB
+    static_assert( std::is_same_v<std::remove_const_t<decltype(dotResultPar)>, scalar_t> );


Suggested change

#ifdef LINALG_ENABLE_TBB

const auto dotResultPar = dot (std::execution::par, x, y);

#else

const auto dotResultPar = dot (std::execution::seq, x, y);

#endif // LINALG_ENABLE_TBB

static_assert( std::is_same_v<std::remove_const_t<decltype(dotResultPar)>, scalar_t> );

#ifdef LINALG_ENABLE_TBB

auto dotResultPar = dot (std::execution::par, x, y);

#else

auto dotResultPar = dot (std::execution::seq, x, y);

#endif // LINALG_ENABLE_TBB

static_assert( std::is_same_v<decltype(dotResultPar), scalar_t> );

mhoemmen · 2023-05-31T22:48:16Z

CMakeLists.txt

@@ -130,6 +130,15 @@ if(LINALG_ENABLE_KOKKOS)
  find_package(KokkosKernels REQUIRED)
 endif()

+find_package(TBB)


More recent versions of GCC (for instance) shouldn't require TBB for std::execution::par to work. Furthermore, nvc++ comes with its own, non-TBB implementation of std::execution::par. Would you consider gating this on compiler version, instead of requiring a third-party library? That could even be done in the code -- the code would just need to test the appropriate compiler version to ensure that the C++ algorithms are available (see https://en.cppreference.com/w/cpp/compiler_support/17 and search for "Parallel algorithms and execution policies").

plus<void> deduces argument and return types, letting the two input types differ. Co-authored-by: Mark Hoemmen <[email protected]>

mhoemmen · 2023-06-07T20:26:37Z

examples/02_matrix_vector_product_basic.cpp

@@ -31,11 +31,11 @@ int main(int argc, char* argv[]) {
    mdspan<double, extents<std::size_t, dynamic_extent>> y(y_vec.data(),N);
    for(int i=0; i<A.extent(0); i++)
      for(int j=0; j<A.extent(1); j++)
-        A(i,j) = 100.0*i+j;
+        A[i,j] = 100.0*i+j;


This will only compile if the C++23 feature "multidimensional subscript operator" (P2128R6) is available. As a result, would you consider one of the following changes?

Protect the example so it only compiles if the feature test macro __cpp_multidimensional_subscript is defined, OR

Add #define MDSPAN_USE_PAREN_OPERATOR 1 to the example before including any mdspan headers (see top of https://godbolt.org/z/Yrr8oe9sE for an example of the relevant macros), and use parentheses instead of brackets (e.g., A(i,j))

mhoemmen · 2023-06-07T20:34:20Z

include/experimental/__p1673_bits/blas1_linalg_add.hpp

@@ -132,7 +132,7 @@ void add_rank_2(
  using size_type = std::common_type_t<SizeType_x, SizeType_y, SizeType_z>;
  for (size_type j = 0; j < x.extent(1); ++j) {
    for (size_type i = 0; i < x.extent(0); ++i) {
-      z(i,j) = x(i,j) + y(i,j);
+      z[i,j] = x[i,j] + y[i,j];


The actual implementation is a C++17 back-port, so we unfortunately have to roll with whatever operator (parentheses or brackets) the user has available. If you really do want to change the implementation, then we might have to go with something like the following.

Suggested change

z[i,j] = x[i,j] + y[i,j];

#if (MDSPAN_USE_PAREN_OPERATOR > 0)

z(i,j) = x(i,j) + y(i,j);

#else

z[i,j] = x[i,j] + y[i,j];

#endif

or at least

Suggested change

z[i,j] = x[i,j] + y[i,j];

#if defined(__cpp_multidimensional_subscript)

z[i,j] = x[i,j] + y[i,j];

#else

z(i,j) = x(i,j) + y(i,j);

#endif

It might be best just to leave the implementation alone; we might want to come up with a better way to do this.

mhoemmen

I would recommend, at least for now, not changing the implementation to use the multidimensional subscript operator. Doing this would hinder back-porting to C++17, which is an important requirement of the reference implementation.

amklinv-nnl added 3 commits May 16, 2023 06:50

Bring code up to date with 23 standard

d2ae43f

Builds with [email protected]

Added stdpar functionality to dot product

dfaf869

Example uses std::full_extent without experimental

6574231

mhoemmen requested changes May 31, 2023

View reviewed changes

amklinv-nnl and others added 3 commits May 31, 2023 19:06

Replace plus<scalar> with plus<void>

b6cb626

plus<void> deduces argument and return types, letting the two input types differ. Co-authored-by: Mark Hoemmen <[email protected]>

Replaced (i,j) subscripting with [i,j] subscripts

c3f4957

Fix continuous integration

2b8f143

mhoemmen reviewed Jun 7, 2023

View reviewed changes

mhoemmen requested changes Jun 7, 2023

View reviewed changes

amklinv-nnl marked this pull request as draft June 12, 2023 21:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added language standard parallelism to dot product #251

Added language standard parallelism to dot product #251

amklinv-nnl commented May 31, 2023

mhoemmen left a comment

mhoemmen May 31, 2023

amklinv-nnl May 31, 2023

mhoemmen Jun 1, 2023

mhoemmen May 31, 2023

mhoemmen May 31, 2023

amklinv-nnl May 31, 2023

mhoemmen Jun 1, 2023

mhoemmen May 31, 2023

mhoemmen May 31, 2023

mhoemmen May 31, 2023

amklinv-nnl May 31, 2023

mhoemmen May 31, 2023

mhoemmen May 31, 2023

mhoemmen May 31, 2023

mhoemmen Jun 7, 2023

mhoemmen Jun 7, 2023 •

edited

Loading

mhoemmen left a comment

Added language standard parallelism to dot product #251

Are you sure you want to change the base?

Added language standard parallelism to dot product #251

Conversation

amklinv-nnl commented May 31, 2023

mhoemmen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhoemmen Jun 7, 2023 • edited Loading

Choose a reason for hiding this comment

mhoemmen left a comment

Choose a reason for hiding this comment

mhoemmen Jun 7, 2023 •

edited

Loading