[libcxx-commits] [PATCH] D149686: [libc++][DISCUSSION] Exploring PSTL backend customization points

Tue May 2 14:01:43 PDT 2023

crtrott added a comment.

I generally feel that we are better served by an overload set based on internal execution policy types. One problem I see is that we will very quickly be asked how to choose, for example, between SIMD and OpenMP based parallelism on a per-invocation basis. The overload-set approach makes this easy to support. Note that we ARE allowed to ship implementation-defined execution policies: "The semantics of parallel algorithms invoked with an execution policy object of implementation-defined type are implementation-defined." This is not undefined behavior; we just need to say what it does. That means, for example, we could ship std::omp_par, std::omp_par_simd, std::gcd, etc. Or rather, the LLVM OpenMP project could ship std::omp_par together with a custom implementation of the algorithms, and AMD could add std::par_hip or something like that in their ROCm toolchain.
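To make the overload-set idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the tags `serial_exec`, `simd_exec` and `omp_exec`, the `sum` algorithm and the `backend` enum are hypothetical names, not anything libc++ ships, and all three overloads share one plain loop where real backends would differ.

```cpp
#include <cstddef>

namespace overload_demo {

// Hypothetical internal policy tags (stand-ins, not libc++ names).
struct serial_exec {};
struct simd_exec {};
struct omp_exec {};

enum class backend { serial, simd, omp };

struct sum_result {
  backend used;  // which overload was selected
  int value;     // the computed sum
};

// Shared loop so the sketch stays short; real backends would differ here.
constexpr int plain_sum(const int* first, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) total += first[i];
  return total;
}

// The overload set: one overload per internal policy type.
constexpr sum_result sum(serial_exec, const int* first, std::size_t n) {
  return {backend::serial, plain_sum(first, n)};
}
constexpr sum_result sum(simd_exec, const int* first, std::size_t n) {
  // A real backend might vectorize the loop here.
  return {backend::simd, plain_sum(first, n)};
}
constexpr sum_result sum(omp_exec, const int* first, std::size_t n) {
  // A real backend might run an OpenMP parallel reduction here.
  return {backend::omp, plain_sum(first, n)};
}

// Sample data for trying the overloads out.
constexpr int sample[] = {1, 2, 3, 4};

}  // namespace overload_demo
```

The caller picks the backend per invocation simply by passing a different tag, e.g. `sum(simd_exec{}, sample, 4)` versus `sum(omp_exec{}, sample, 4)`.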

Then the only configuration decision is essentially what to map the mandated execution policies to.

Here is some code from our std::linalg prototype which does something like this:

  c++
  template<class .........>
  void matrix_vector_product(
    ExecutionPolicy&& exec, mdspan<...> A, mdspan<...> x, mdspan<...> y)
  {
    constexpr bool use_custom = is_custom_mat_vec_product_avail<
      decltype(execpolicy_mapper(exec)), decltype(A), decltype(x), decltype(y)>::value;

    if constexpr(use_custom) {
      matrix_vector_product(execpolicy_mapper(exec), A, x, y);
    } else {
      matrix_vector_product(std::experimental::linalg::impl::inline_exec_t(), A, x, y);
    }
  }

Basically, there are no real implementations that take any of the standard `std::execution` policies. We implement overloads with internal execution policies. `execpolicy_mapper(exec)` returns `exec` by default, except for the official `std::execution` policies, which map to an internal one (right now, for us, they all map to `std::experimental::linalg::impl::inline_exec_t`). That actually has the advantage that you know what the internal impl does.
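A mapper along these lines could look roughly like the sketch below. It is not the prototype's actual code: the `stdexec` namespace holds local stand-ins for the standard policy types (so the example does not depend on `<execution>`), and `vendor::gpu_exec` is a hypothetical third-party policy.

```cpp
#include <type_traits>

namespace mapper_demo {

// Local stand-ins for the standard policy types (assumption: used instead of
// <execution> so the sketch compiles anywhere).
namespace stdexec {
struct sequenced_policy {};
struct parallel_policy {};
struct parallel_unsequenced_policy {};
}  // namespace stdexec

namespace impl {
struct inline_exec_t {};  // plays the role of the internal inline policy
}  // namespace impl

namespace vendor {
struct gpu_exec {};  // hypothetical implementation-defined policy
}  // namespace vendor

// Default: any non-standard policy passes through unchanged.
template <class Policy>
constexpr Policy execpolicy_mapper(Policy policy) {
  return policy;
}

// Standard policies map to an internal one. These non-template overloads win
// over the template above; a vendor would change the return types here.
constexpr impl::inline_exec_t execpolicy_mapper(stdexec::sequenced_policy) {
  return {};
}
constexpr impl::inline_exec_t execpolicy_mapper(stdexec::parallel_policy) {
  return {};
}
constexpr impl::inline_exec_t execpolicy_mapper(
    stdexec::parallel_unsequenced_policy) {
  return {};
}

}  // namespace mapper_demo
```

With this shape, retargeting a standard policy is a one-line change to the return type of the matching overload.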

`is_custom_mat_vec_product_avail` checks whether an overload is visible for the provided arguments. In linalg, that allows vendor plugins to provide implementations only for, say, the scalar types they support (like the Fortran BLAS). That latter point probably doesn't matter to PSTL.
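One way such a check can be written is the usual expression-SFINAE detection idiom, sketched below for a simpler hypothetical `dot` routine rather than the real `matrix_vector_product`. The names `is_custom_dot_avail`, `vendor_demo::gpu_exec` and `other_exec` are made up for the example; the vendor deliberately supports only `float`.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

namespace detect_demo {

// Primary template: assume no custom overload exists.
template <class Exec, class T, class = void>
struct is_custom_dot_avail : std::false_type {};

// Chosen only if the unqualified call dot(Exec, const T*, const T*, size_t)
// is well-formed. No dot is declared here, so candidates are found by
// argument-dependent lookup in the namespace of the policy type.
template <class Exec, class T>
struct is_custom_dot_avail<
    Exec, T,
    decltype(void(dot(std::declval<Exec>(), std::declval<const T*>(),
                      std::declval<const T*>(), std::size_t{})))>
    : std::true_type {};

struct other_exec {};  // a policy for which nobody provides a dot overload

}  // namespace detect_demo

// Hypothetical vendor plugin, declared after the library trait.
namespace vendor_demo {

struct gpu_exec {};

// The vendor only supports float.
inline float dot(gpu_exec, const float* x, const float* y, std::size_t n) {
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += x[i] * y[i];
  return total;
}

}  // namespace vendor_demo
```

Here the trait is true for `gpu_exec` with `float` and false for `gpu_exec` with `double`, so a dispatcher built on it would send the `double` case to the default implementation.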

So a vendor shipping LLVM would be able to modify the mapper to let, say, `std::execution::par_unseq` map to `hip::gpu_exec` or whatever. They can then still provide only the overloads their customers asked for, while everything else falls back to the default impl.

A non-LLVM shipper like Kokkos can also provide their own overloads, but those would only be called if you call `std::linalg::matrix_vector_product(Kokkos::exec_policy, ...)`.

The last little piece not shown above is that one would probably not want to just call `inline_exec_t()` in the else branch (if more than one internal implementation exists), but rather check which public policy the given exec policy is convertible to and then call the corresponding internal one. I.e., you get `Kokkos::exec_policy` but no `Kokkos::sort` overload exists; however, `Kokkos::exec_policy` is convertible to `std::execution::par_unseq`, so you could call the `impl::par_unseq` overload or so.
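That convertibility-based fallback could be sketched as a small type-level mapping. This is only one possible shape: `fallback_policy_t`, the `impl` tags and `thirdparty_demo::exec_policy` are hypothetical, and `stdexec` again holds local stand-ins for the standard policy types (the sketch checks convertibility to the policy type, since `par_unseq` itself is an object).

```cpp
#include <type_traits>

namespace fallback_demo {

// Local stand-ins for the standard policy types (assumption for the sketch).
namespace stdexec {
struct parallel_policy {};
struct parallel_unsequenced_policy {};
}  // namespace stdexec

// Hypothetical internal policies, one per internal implementation.
namespace impl {
struct inline_exec_t {};
struct par_t {};
struct par_unseq_t {};
}  // namespace impl

// Pick the internal policy for the first public policy that Policy converts
// to; fall back to the inline policy when it converts to none of them.
template <class Policy>
using fallback_policy_t = std::conditional_t<
    std::is_convertible<Policy, stdexec::parallel_unsequenced_policy>::value,
    impl::par_unseq_t,
    std::conditional_t<
        std::is_convertible<Policy, stdexec::parallel_policy>::value,
        impl::par_t, impl::inline_exec_t>>;

}  // namespace fallback_demo

namespace thirdparty_demo {

// A third-party policy that declares itself usable as par_unseq.
struct exec_policy {
  constexpr operator fallback_demo::stdexec::parallel_unsequenced_policy()
      const {
    return {};
  }
};

// A third-party policy with no conversion to any public policy.
struct plain_policy {};

}  // namespace thirdparty_demo
```

The else branch would then call the overload for `fallback_policy_t<Policy>{}` instead of always using `inline_exec_t()`.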

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D149686