Added language standard parallelism to dot product #251

Draft · wants to merge 6 commits into base: main
8 changes: 3 additions & 5 deletions .github/workflows/cmake.yml
@@ -13,7 +13,7 @@ jobs:
fail-fast: false
matrix:
include:
- compiler_driver: g++
- compiler_driver: g++-12
compiler_prefix: /usr/bin
steps:
- name: Create Build Environment
@@ -27,7 +27,7 @@ jobs:

- name: Configure CMake
working-directory: mdspan-build
run: cmake $GITHUB_WORKSPACE/mdspan-src -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DCMAKE_INSTALL_PREFIX=$GITHUB_WORKSPACE/mdspan-install
run: cmake $GITHUB_WORKSPACE/mdspan-src -DCMAKE_CXX_COMPILER=g++-12 -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DCMAKE_INSTALL_PREFIX=$GITHUB_WORKSPACE/mdspan-install

- name: Build
working-directory: mdspan-build
@@ -65,7 +65,7 @@ jobs:
- name: Configure CMake
shell: bash
working-directory: stdblas-build
run: cmake $GITHUB_WORKSPACE/stdblas-src -Dmdspan_DIR=$GITHUB_WORKSPACE/mdspan-install/lib/cmake/mdspan -DLINALG_ENABLE_TESTS=On -DLINALG_ENABLE_EXAMPLES=On -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DCMAKE_INSTALL_PREFIX=$GITHUB_WORKSPACE/stdblas-install
run: cmake $GITHUB_WORKSPACE/stdblas-src -DCMAKE_CXX_COMPILER=g++-12 -Dmdspan_DIR=$GITHUB_WORKSPACE/mdspan-install/lib/cmake/mdspan -DLINALG_ENABLE_TESTS=On -DLINALG_ENABLE_EXAMPLES=On -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DCMAKE_INSTALL_PREFIX=$GITHUB_WORKSPACE/stdblas-install

- name: Upload workspace
uses: actions/upload-artifact@v2
@@ -102,8 +102,6 @@ jobs:

test-stdBLAS:
runs-on: ubuntu-latest
container:
image: amklinv/mdspan-dependencies:latest
needs: build-stdblas

steps:
13 changes: 13 additions & 0 deletions CMakeLists.txt
@@ -130,6 +130,15 @@ if(LINALG_ENABLE_KOKKOS)
find_package(KokkosKernels REQUIRED)
endif()

find_package(TBB)
Contributor:
More recent versions of GCC (for instance) shouldn't require TBB for std::execution::par to work. Furthermore, nvc++ comes with its own, non-TBB implementation of std::execution::par. Would you consider gating this on compiler version, instead of requiring a third-party library? That could even be done in the code -- the code would just need to test the appropriate compiler version to ensure that the C++ algorithms are available (see https://en.cppreference.com/w/cpp/compiler_support/17 and search for "Parallel algorithms and execution policies").
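
For illustration, a minimal sketch of the in-code feature test this comment describes; the sum_elements helper is hypothetical, and the macros are the standard feature-test macros for parallel algorithms:

// Sketch: prefer a feature test over a hard TBB requirement. <execution> (or
// <version>) defines __cpp_lib_execution and __cpp_lib_parallel_algorithm when
// the parallel algorithms are usable; note that libstdc++ may still want TBB
// at link time even when these macros are defined.
#include <iterator>
#include <numeric>
#ifdef __has_include
#  if __has_include(<execution>)
#    include <execution>
#  endif
#endif

template <class It>
auto sum_elements(It first, It last) {
  using T = typename std::iterator_traits<It>::value_type;
#if defined(__cpp_lib_execution) && defined(__cpp_lib_parallel_algorithm)
  return std::reduce(std::execution::par, first, last, T{});  // parallel path
#else
  return std::accumulate(first, last, T{});                   // sequential fall-back
#endif
}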

option(LINALG_ENABLE_TBB
"Enable Threaded Building Blocks for tests. Default: autodetect TBB installation."
${TBB_FOUND}
)
if(LINALG_ENABLE_TBB)
find_package(TBB REQUIRED)
endif()

################################################################################

CONFIGURE_FILE(include/experimental/__p1673_bits/linalg_config.h.in
@@ -152,6 +161,10 @@ if(LINALG_ENABLE_KOKKOS)
)
endif()

if(LINALG_ENABLE_TBB)
target_link_libraries(linalg INTERFACE TBB::tbb)
endif()

target_include_directories(linalg INTERFACE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:include>
6 changes: 3 additions & 3 deletions examples/01_scale.cpp
@@ -13,7 +13,7 @@
// Make mdspan less verbose
using std::experimental::mdspan;
using std::experimental::extents;
using std::experimental::dynamic_extent;
using std::dynamic_extent;
Contributor:
This change suggests that CI doesn't actually build the examples. Should we consider fixing that?

This change might actually need to depend on the C++ version, as std::dynamic_extent entered the Standard in C++20 with span. We'll have to revisit the examples anyway, because of the recent change to use macros to specify the namespaces. (The macros let users control them, so that they can but don't need to go into std.)
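
For illustration, one way an example could select the right dynamic_extent under either standard; a sketch only, and the fall-back spelling assumes the pre-namespace-macro reference mdspan:

// Pick std::dynamic_extent when <span> provides it (C++20), otherwise fall
// back to the experimental spelling shipped with the reference mdspan.
#ifdef __has_include
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#if defined(__cpp_lib_span)
#  include <span>
using std::dynamic_extent;
#else
#  include <experimental/mdspan>
using std::experimental::dynamic_extent;
#endif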

Contributor Author:
CI builds the examples; I just wasn't building them on my PC.

I was unaware of the macros change. Can you give me an example?

Contributor:
MDSPAN_IMPL_STANDARD_NAMESPACE and MDSPAN_IMPL_PROPOSED_NAMESPACE are the two macros. They can be defined by users, but they also get default definitions, e.g., here in include/experimental/mdspan. The library assumes that MDSPAN_IMPL_PROPOSED_NAMESPACE is nested inside MDSPAN_IMPL_STANDARD_NAMESPACE.

Using these macros might require updating the version of the reference mdspan implementation that github's CI pulls in.
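
For illustration only, the general shape such an example change might take; which entities end up in which namespace depends on the macro definitions, so the using-declarations below are assumptions:

// Hypothetical: refer to mdspan through the configuration macro instead of
// hard-coding std::experimental. The namespace alias works whatever the macro
// expands to; whether mdspan and extents live directly in that namespace is
// an assumption here.
#include <experimental/mdspan>

namespace md = MDSPAN_IMPL_STANDARD_NAMESPACE;
using md::mdspan;
using md::extents;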


int main(int argc, char* argv[]) {
std::cout << "Scale" << std::endl;
@@ -26,7 +26,7 @@ int main(int argc, char* argv[]) {
// With CTAD working we could do, GCC 11.1 works but some others are buggy
// mdspan x(x_vec.data(), N);
mdspan<double, extents<std::size_t, dynamic_extent>> x(x_vec.data(),N);
for(int i=0; i<x.extent(0); i++) x(i) = i;
for(int i=0; i<x.extent(0); i++) x[i] = i;

// Call linalg::scale x = 2.0*x;
std::experimental::linalg::scale(2.0, x);
@@ -36,6 +36,6 @@ int main(int argc, char* argv[]) {
std::experimental::linalg::scale(2.0, x);
#endif

for(int i=0; i<x.extent(0); i+=5) std::cout << i << " " << x(i) << std::endl;
for(int i=0; i<x.extent(0); i+=5) std::cout << i << " " << x[i] << std::endl;
}
}
10 changes: 5 additions & 5 deletions examples/02_matrix_vector_product_basic.cpp
@@ -13,7 +13,7 @@
// Make mdspan less verbose
using std::experimental::mdspan;
using std::experimental::extents;
using std::experimental::dynamic_extent;
using std::dynamic_extent;

int main(int argc, char* argv[]) {
std::cout << "Matrix Vector Product Basic" << std::endl;
@@ -31,11 +31,11 @@ int main(int argc, char* argv[]) {
mdspan<double, extents<std::size_t, dynamic_extent>> y(y_vec.data(),N);
for(int i=0; i<A.extent(0); i++)
for(int j=0; j<A.extent(1); j++)
A(i,j) = 100.0*i+j;
A[i,j] = 100.0*i+j;
Contributor:
This will only compile if the C++23 feature "multidimensional subscript operator" (P2128R6) is available. As a result, would you consider one of the following changes?

  1. Protect the example so it only compiles if the feature test macro __cpp_multidimensional_subscript is defined, OR
  2. Add #define MDSPAN_USE_PAREN_OPERATOR 1 to the example before including any mdspan headers (see top of https://godbolt.org/z/Yrr8oe9sE for an example of the relevant macros), and use parentheses instead of brackets (e.g., A(i,j))
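
For illustration, a sketch of option 1 applied to the assignment above; it reuses the example's N, mdspan, extents, and dynamic_extent declarations:

// Gate the C++23 bracket form on its feature-test macro so the example
// still builds as C++17/20.
std::vector<double> A_vec(N * N);
mdspan<double, extents<std::size_t, dynamic_extent, dynamic_extent>> A(A_vec.data(), N, N);

for (int i = 0; i < A.extent(0); i++)
  for (int j = 0; j < A.extent(1); j++)
#if defined(__cpp_multidimensional_subscript)
    A[i, j] = 100.0 * i + j;
#else
    A(i, j) = 100.0 * i + j;
#endif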

for(int i=0; i<x.extent(0); i++)
x(i) = 1. * i;
x[i] = 1. * i;
for(int i=0; i<y.extent(0); i++)
y(i) = -1. * i;
y[i] = -1. * i;

// y = A * x
std::experimental::linalg::matrix_vector_product(A, x, y);
@@ -50,6 +50,6 @@ int main(int argc, char* argv[]) {
std::experimental::linalg::scaled(2.0, A), x,
std::experimental::linalg::scaled(0.5, y), y);
#endif
for(int i=0; i<y.extent(0); i+=5) std::cout << i << " " << y(i) << std::endl;
for(int i=0; i<y.extent(0); i+=5) std::cout << i << " " << y[i] << std::endl;
}
}
12 changes: 6 additions & 6 deletions examples/03_matrix_vector_product_mixedprec.cpp
@@ -5,9 +5,9 @@
// Make mdspan less verbose
using std::experimental::mdspan;
using std::experimental::extents;
using std::experimental::dynamic_extent;
using std::dynamic_extent;
using std::experimental::submdspan;
using std::experimental::full_extent;
using std::full_extent;
Contributor:
Please see above comment on std::full_extent; thanks!


int main(int argc, char* argv[]) {
std::cout << "Matrix Vector Product MixedPrec" << std::endl;
@@ -25,13 +25,13 @@ int main(int argc, char* argv[]) {
for(int m=0; m<A.extent(0); m++)
for(int i=0; i<A.extent(1); i++)
for(int j=0; j<A.extent(2); j++)
A(m,i,j) = 1000.0 * m + 100.0 * i + j;
A[m,i,j] = 1000.0 * m + 100.0 * i + j;
for(int i=0; i<x.extent(0); i++)
for(int m=0; m<x.extent(1); m++)
x(i,m) = 33. * i + 0.33 * m;
x[i,m] = 33. * i + 0.33 * m;
for(int m=0; m<y.extent(0); m++)
for(int i=0; i<y.extent(1); i++)
y(m,i) = 33. * m + 0.33 * i;
y[m,i] = 33. * m + 0.33 * i;

for(int m = 0; m < M; m++) {
auto A_m = submdspan(A, m, full_extent, full_extent);
@@ -41,7 +41,7 @@
std::experimental::linalg::matrix_vector_product(A_m, x_m, y_m);
}

for(int i=0; i<y.extent(0); i+=5) std::cout << i << " " << y(i,1) << std::endl;
for(int i=0; i<y.extent(0); i+=5) std::cout << i << " " << y[i,1] << std::endl;
}
}

2 changes: 1 addition & 1 deletion examples/kokkos-based/add_kokkos.cpp
@@ -30,7 +30,7 @@ int main(int argc, char* argv[])
value_type* y_ptr = y_view.data();
value_type* z_ptr = z_view.data();

using dyn_1d_ext_type = std::experimental::extents<std::experimental::dynamic_extent>;
using dyn_1d_ext_type = std::experimental::extents<std::dynamic_extent>;
using mdspan_type = std::experimental::mdspan<value_type, dyn_1d_ext_type>;
mdspan_type x(x_ptr,N);
mdspan_type y(y_ptr,N);
2 changes: 1 addition & 1 deletion examples/kokkos-based/dot_kokkos.cpp
@@ -16,7 +16,7 @@ int main(int argc, char* argv[])
value_type* a_ptr = a_view.data();
value_type* b_ptr = b_view.data();

using dyn_1d_ext_type = std::experimental::extents<std::experimental::dynamic_extent>;
using dyn_1d_ext_type = std::experimental::extents<std::dynamic_extent>;
using mdspan_type = std::experimental::mdspan<value_type, dyn_1d_ext_type>;
mdspan_type a(a_ptr,N);
mdspan_type b(b_ptr,N);
2 changes: 1 addition & 1 deletion examples/kokkos-based/dotc_kokkos.cpp
@@ -16,7 +16,7 @@ int main(int argc, char* argv[])
value_type* a_ptr = a_view.data();
value_type* b_ptr = b_view.data();

using dyn_1d_ext_type = std::experimental::extents<std::experimental::dynamic_extent>;
using dyn_1d_ext_type = std::experimental::extents<std::dynamic_extent>;
using mdspan_type = std::experimental::mdspan<value_type, dyn_1d_ext_type>;
mdspan_type a(a_ptr,N);
mdspan_type b(b_ptr,N);
2 changes: 1 addition & 1 deletion examples/kokkos-based/scale_kokkos.cpp
@@ -15,7 +15,7 @@ int main(int argc, char* argv[])

// Requires CTAD working, GCC 11.1 works but some others are buggy
// std::experimental::mdspan a(a_ptr,N);
std::experimental::mdspan<double,std::experimental::extents<std::experimental::dynamic_extent>> a(a_ptr,N);
std::experimental::mdspan<double,std::experimental::extents<std::dynamic_extent>> a(a_ptr,N);
for(std::size_t i=0; i<a.extent(0); i++) a(i) = i;

// This forwards to KokkosKernels (https://github.com/kokkos/kokkos-kernels
2 changes: 1 addition & 1 deletion examples/kokkos-based/vector_abs_sum_kokkos.cpp
@@ -13,7 +13,7 @@ int main(int argc, char* argv[])
Kokkos::View<value_type*> x_view("x",N);
value_type* x_ptr = x_view.data();

using dyn_1d_ext_type = std::experimental::extents<std::experimental::dynamic_extent>;
using dyn_1d_ext_type = std::experimental::extents<std::dynamic_extent>;
using mdspan_type = std::experimental::mdspan<value_type, dyn_1d_ext_type>;
mdspan_type x(x_ptr,N);
for(std::size_t i=0; i<x.extent(0); i++){
2 changes: 1 addition & 1 deletion examples/kokkos-based/vector_norm2_kokkos.cpp
@@ -13,7 +13,7 @@ int main(int argc, char* argv[])
Kokkos::View<value_type*> x_view("x",N);
value_type* x_ptr = x_view.data();

using dyn_1d_ext_type = std::experimental::extents<std::experimental::dynamic_extent>;
using dyn_1d_ext_type = std::experimental::extents<std::dynamic_extent>;
using mdspan_type = std::experimental::mdspan<value_type, dyn_1d_ext_type>;
mdspan_type x(x_ptr,N);
for(std::size_t i=0; i<x.extent(0); i++){
2 changes: 1 addition & 1 deletion examples/kokkos-based/vector_sum_of_squares_kokkos.cpp
@@ -13,7 +13,7 @@ int main(int argc, char* argv[])
Kokkos::View<value_type*> x_view("x",N);
value_type* x_ptr = x_view.data();

using dyn_1d_ext_type = std::experimental::extents<std::experimental::dynamic_extent>;
using dyn_1d_ext_type = std::experimental::extents<std::dynamic_extent>;
using mdspan_type = std::experimental::mdspan<value_type, dyn_1d_ext_type>;
mdspan_type x(x_ptr,N);
for(std::size_t i=0; i<x.extent(0); i++){
23 changes: 16 additions & 7 deletions include/experimental/__p1673_bits/blas1_dot.hpp
@@ -43,6 +43,7 @@
#ifndef LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_
#define LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_

#include <ranges>
Contributor:
<ranges> is a C++20 include. Would you consider protecting both the include and the use of iota_view below with the appropriate feature test macro, and providing a fall-back implementation? It's OK if the fall-back is not parallel.

Contributor Author:
Sure! Can you point me to an example of using a feature test macro?

Contributor:
Sure! This web page gives a good summary.

// Ensure that the feature test macro __cpp_lib_ranges is available;
// <version> also defines this macro, but that is a C++20 header.
#include <algorithm>

#if defined(__cpp_lib_ranges)
#  include <ranges>
#endif

void some_function() {
#if defined(__cpp_lib_ranges_iota)
  // ... code using views::iota ...
#else
  // ... fall-back code ...
#endif
}

The point of using two different macros -- one for the header, and one for the specific feature views::iota -- is that the feature came after the header, so some compiler versions may have the header but not the feature.

#include <type_traits>

namespace std {
@@ -90,7 +91,7 @@ template<class ElementType1,
class Accessor2,
class Scalar>
Scalar dot(
std::experimental::linalg::impl::inline_exec_t&& /* exec */,
std::experimental::linalg::impl::inline_exec_t&& exec,
std::experimental::mdspan<ElementType1, std::experimental::extents<SizeType1, ext1>, Layout1, Accessor1> v1,
std::experimental::mdspan<ElementType2, std::experimental::extents<SizeType2, ext2>, Layout2, Accessor2> v2,
Scalar init)
@@ -100,10 +101,18 @@ Scalar dot(
v1.static_extent(0) == v2.static_extent(0));

using size_type = std::common_type_t<SizeType1, SizeType2>;
for (size_type k = 0; k < v1.extent(0); ++k) {
init += v1(k) * v2(k);
}
return init;
using scalar_type = std::common_type_t<ElementType1, ElementType2, Scalar>;
Contributor:
This is not necessarily the right type. For example, if operator* returns a higher-precision type than its inputs (as might make sense for custom integer or fixed-point real types), then what you want here is the common type of Scalar and the result of operator*. However, it turns out that you don't need scalar_type here; please see the comment below.
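
For illustration, the type that comment describes could be spelled as below; a sketch that reuses the function's template parameters, and not needed once the result type is taken from the product expression itself:

// Common type of Scalar and whatever operator* actually returns
// (std::declval needs <utility>, std::common_type_t needs <type_traits>).
using product_type = decltype(std::declval<ElementType1>() * std::declval<ElementType2>());
using result_type  = std::common_type_t<Scalar, product_type>;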

using std::ranges::iota_view;
using std::ranges::begin;
using std::ranges::end;

iota_view range{size_type{}, v1.extent(0)};
Contributor:
The preferred way to use range factories is std::views::FOO (in this case, std::views::iota) instead of std::ranges::FOO_view.


Scalar sum = std::transform_reduce(exec, begin(range), end(range), init,
Contributor:
The "inline" executor should be executing inline. This means that, while it can use Standard Library algorithms, it shouldn't be passing along any execution policy (even if that execution policy is "sequential" -- please see my comments below). Would you consider changing this to remove the execution policy argument?

Suggested change
Scalar sum = std::transform_reduce(exec, begin(range), end(range), init,
Scalar sum = std::transform_reduce(begin(range), end(range), init,

Contributor Author:
Sure. What do we need to add to handle other execution policies?
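
For illustration, one possible shape of the execution-policy path that question asks about; a sketch only, and how the library's own policy types map onto standard policies is left open here, so the std::execution::par below is purely a placeholder (assumes <execution>, <ranges>, <numeric>, and <functional>):

// Sketch: a separate overload that accepts an execution policy and forwards
// the work to the parallel transform_reduce over an index view.
template<class ExecutionPolicy,
         class ElementType1, class SizeType1, std::size_t ext1, class Layout1, class Accessor1,
         class ElementType2, class SizeType2, std::size_t ext2, class Layout2, class Accessor2,
         class Scalar>
Scalar dot(ExecutionPolicy&& /* exec */,
           std::experimental::mdspan<ElementType1, std::experimental::extents<SizeType1, ext1>, Layout1, Accessor1> v1,
           std::experimental::mdspan<ElementType2, std::experimental::extents<SizeType2, ext2>, Layout2, Accessor2> v2,
           Scalar init)
{
  using size_type = std::common_type_t<SizeType1, SizeType2>;
  auto range = std::views::iota(size_type{}, size_type(v1.extent(0)));
  return std::transform_reduce(std::execution::par,   // placeholder: map exec to a standard policy here
                               std::ranges::begin(range), std::ranges::end(range),
                               init, std::plus<>{},
                               [=](size_type i) { return v1[i] * v2[i]; });
}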

std::plus<void>{},
[=](size_type i) { return v1[i] * v2[i]; });
Contributor:
This looks correct. Another approach would be to use std::views::transform on the iota view to turn i into v1[i] (or v2[i]). Then you could use std::plus<void> directly, instead of creating a lambda that captures v1 and v2.

Contributor Author:
Thanks for the input. Can you show me what that would look like?
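
For illustration, one reading of that suggestion; a sketch that reuses v1, v2, init, and size_type from the surrounding function:

// Turn the index view into two element views, then let the two-range
// transform_reduce supply std::plus and std::multiplies itself.
auto indices = std::views::iota(size_type{}, size_type(v1.extent(0)));
auto lhs = indices | std::views::transform([=](size_type i) { return v1[i]; });
auto rhs = indices | std::views::transform([=](size_type i) { return v2[i]; });
Scalar sum = std::transform_reduce(std::ranges::begin(lhs), std::ranges::end(lhs),
                                   std::ranges::begin(rhs), init);

The four-argument overload defaults to plus/multiplies, so no extra lambda appears in the algorithm call itself.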


return sum;
}

template<class ExecutionPolicy,
@@ -155,7 +164,7 @@ Scalar dot(std::experimental::mdspan<ElementType1, std::experimental::extents<Si
std::experimental::mdspan<ElementType2, std::experimental::extents<SizeType2, ext2>, Layout2, Accessor2> v2,
Scalar init)
{
return dot(std::experimental::linalg::impl::default_exec_t(), v1, v2, init);
return dot(std::experimental::linalg::impl::default_exec(), v1, v2, init);
}

template<class ElementType1,
@@ -217,7 +226,7 @@ namespace dot_detail {
auto dot_return_type_deducer(
std::experimental::mdspan<ElementType1, std::experimental::extents<SizeType1, ext1>, Layout1, Accessor1> x,
std::experimental::mdspan<ElementType2, std::experimental::extents<SizeType2, ext2>, Layout2, Accessor2> y)
-> decltype(x(0) * y(0));
-> decltype(x[0] * y[0]);
} // namespace dot_detail


12 changes: 6 additions & 6 deletions include/experimental/__p1673_bits/blas1_givens.hpp
@@ -399,9 +399,9 @@ void givens_rotation_apply(
using index_type = ::std::common_type_t<SizeType1, SizeType2>;
const auto x_extent_0 = static_cast<index_type>(x.extent(0));
for (index_type i = 0; i < x_extent_0; ++i) {
const auto dtemp = c * x(i) + s * y(i);
y(i) = c * y(i) - s * x(i);
x(i) = dtemp;
const auto dtemp = c * x[i] + s * y[i];
y[i] = c * y[i] - s * x[i];
x[i] = dtemp;
}
}

@@ -496,9 +496,9 @@ void givens_rotation_apply(
using index_type = ::std::common_type_t<SizeType1, SizeType2>;
const auto x_extent_0 = static_cast<index_type>(x.extent(0));
for (index_type i = 0; i < x_extent_0; ++i) {
const auto dtemp = c * x(i) + s * y(i);
y(i) = c * y(i) - conj(s) * x(i);
x(i) = dtemp;
const auto dtemp = c * x[i] + s * y[i];
y[i] = c * y[i] - conj(s) * x[i];
x[i] = dtemp;
}
}

4 changes: 2 additions & 2 deletions include/experimental/__p1673_bits/blas1_linalg_add.hpp
@@ -82,7 +82,7 @@ void add_rank_1(

using size_type = std::common_type_t<SizeType_x, SizeType_y, SizeType_z>;
for (size_type i = 0; i < z.extent(0); ++i) {
z(i) = x(i) + y(i);
z[i] = x[i] + y[i];
}
}

@@ -132,7 +132,7 @@ void add_rank_2(
using size_type = std::common_type_t<SizeType_x, SizeType_y, SizeType_z>;
for (size_type j = 0; j < x.extent(1); ++j) {
for (size_type i = 0; i < x.extent(0); ++i) {
z(i,j) = x(i,j) + y(i,j);
z[i,j] = x[i,j] + y[i,j];
Contributor (@mhoemmen, Jun 7, 2023):
The actual implementation is a C++17 back-port, so we unfortunately have to roll with whatever operator (parentheses or brackets) the user has available. If you really do want to change the implementation, then we might have to go with something like the following.

Suggested change
z[i,j] = x[i,j] + y[i,j];
#if (MDSPAN_USE_PAREN_OPERATOR > 0)
z(i,j) = x(i,j) + y(i,j);
#else
z[i,j] = x[i,j] + y[i,j];
#endif

or at least

Suggested change
z[i,j] = x[i,j] + y[i,j];
#if defined(__cpp_multidimensional_subscript)
z[i,j] = x[i,j] + y[i,j];
#else
z(i,j) = x(i,j) + y(i,j);
#endif

It might be best just to leave the implementation alone; we might want to come up with a better way to do this.

}
}
}
4 changes: 2 additions & 2 deletions include/experimental/__p1673_bits/blas1_linalg_copy.hpp
@@ -69,7 +69,7 @@ void copy_rank_1(
x.static_extent(0) == y.static_extent(0));
using size_type = std::common_type_t<SizeType_x, SizeType_y>;
for (size_type i = 0; i < y.extent(0); ++i) {
y(i) = x(i);
y[i] = x[i];
}
}

@@ -98,7 +98,7 @@ void copy_rank_2(
using size_type = std::common_type_t<SizeType_x, SizeType_y>;
for (size_type j = 0; j < y.extent(1); ++j) {
for (size_type i = 0; i < y.extent(0); ++i) {
y(i,j) = x(i,j);
y[i,j] = x[i,j];
}
}
}
4 changes: 2 additions & 2 deletions include/experimental/__p1673_bits/blas1_linalg_swap.hpp
@@ -74,7 +74,7 @@ void swap_rank_1(
using size_type = std::common_type_t<SizeType_x, SizeType_y>;

for (size_type i = 0; i < y.extent(0); ++i) {
swap(x(i), y(i));
swap(x[i], y[i]);
}
}

@@ -106,7 +106,7 @@ void swap_rank_2(

for (size_type j = 0; j < y.extent(1); ++j) {
for (size_type i = 0; i < y.extent(0); ++i) {
swap(x(i,j), y(i,j));
swap(x[i,j], y[i,j]);
}
}
}
6 changes: 3 additions & 3 deletions include/experimental/__p1673_bits/blas1_matrix_inf_norm.hpp
@@ -100,14 +100,14 @@ Scalar matrix_inf_norm(
return result;
}
else if(A.extent(0) == size_type(1) && A.extent(1) == size_type(1)) {
result += abs(A(0, 0));
result += abs(A[0, 0]);
return result;
}

for (size_type i = 0; i < A.extent(0); ++i) {
auto row_sum = init;
for (size_type j = 0; j < A.extent(1); ++j) {
row_sum += abs(A(i,j));
row_sum += abs(A[i,j]);
}
result = max(row_sum, result);
}
@@ -170,7 +170,7 @@ namespace matrix_inf_norm_detail {
class Layout,
class Accessor>
auto matrix_inf_norm_return_type_deducer(
std::experimental::mdspan<ElementType, std::experimental::extents<SizeType, numRows, numCols>, Layout, Accessor> A) -> decltype(abs(A(0,0)));
std::experimental::mdspan<ElementType, std::experimental::extents<SizeType, numRows, numCols>, Layout, Accessor> A) -> decltype(abs(A[0,0]));

} // namespace matrix_inf_norm_detail
