[libcxx-commits] [libcxx] [libc++] Overhaul the PSTL dispatching mechanism (PR #88131)

Louis Dionne via libcxx-commits libcxx-commits at lists.llvm.org
Mon Jun 3 11:19:34 PDT 2024


https://github.com/ldionne updated https://github.com/llvm/llvm-project/pull/88131

From 1de13f45cd86e651357e78941a6b17b8dc1a4ee3 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Wed, 17 Apr 2024 11:34:14 -0400
Subject: [PATCH 1/8] [libc++] PSTL dispatching mechanism overhaul

The experimental PSTL's current dispatching mechanism was designed with
flexibility in mind. However, while reviewing the in-progress OpenMP
backend, I realized that the dispatching mechanism based on ADL and
default definitions in the frontend had several downsides. To name a
few:

1. The dispatching of an algorithm to the back-end and its default
  implementation are bundled together via `_LIBCPP_PSTL_CUSTOMIZATION_POINT`.
  This makes the dispatching really confusing and leads to annoyances
  such as variable shadowing and weird lambda captures in the front-end.
2. The distinction between back-end functions and front-end algorithms
  is not as clear as it could be, which led us to call one where we meant
  the other in a few cases. This is bad due to the exception requirements
  of the PSTL: calling a front-end algorithm inside the implementation of
  a back-end is incorrect for exception-safety.
3. There are two levels of back-end dispatching in the PSTL, which treat
  CPU backends as a special case. This is confusing and not as flexible
  as we'd like. For example, there is no straightforward way to dispatch
  all uses of `unseq` to a specific back-end from the OpenMP backend,
  or for CPU backends to fall back on each other.

This patch rewrites the backend dispatching mechanism to solve these
problems, but doesn't touch any of the actual implementation of
algorithms. Specifically, this rewrite has the following characteristics:

- All back-ends are full top-level backends defining all the basis operations
required by the PSTL. This is made realistic for CPU backends by providing
the CPU-based basis operations as simple helpers that can easily be reused
when defining the PSTL basis operations.

- The default definitions for algorithms are separated from their dispatching
logic and grouped in families instead, based on the basis operation they
require for their default implementation.

- The front-end is thus simplified a whole lot and made very consistent
for all algorithms, which makes it easier to audit the front-end for
things like exception-correctness, appropriate forwarding, etc.
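
To illustrate the shape of the new front-end, here is a minimal, self-contained
sketch. It is not the libc++ implementation: every name in it (`sketch::dispatch`,
`sketch::run_backend`, `sketch::configuration`, `serial_backend`, `default_backend`,
`seq_policy`, `par_any_of`) is hypothetical. It also assumes that dispatching means
"pick the first backend in the configuration that defines the algorithm", and that
back-end functions report failure through an empty `std::optional`, which the
front-end turns into `std::bad_alloc`.

```cpp
#include <cassert>
#include <new>
#include <optional>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

struct seq_policy {};      // hypothetical stand-in for an execution policy
struct serial_backend {};  // hypothetical backend: only defines find_if
struct default_backend {}; // hypothetical backend: builds any_of on top of find_if

// A configuration is an ordered list of backends (assumption of this sketch).
template <class... Backends>
struct configuration {};
using current_configuration = configuration<serial_backend, default_backend>;

// Algorithm tags: declared but left undefined unless a backend specializes them.
template <class Backend, class Policy> struct find_if;
template <class Backend, class Policy> struct any_of;

// Detects whether T is a complete type, i.e. whether a backend defined it.
template <class T, class = void>
struct is_complete : std::false_type {};
template <class T>
struct is_complete<T, std::void_t<decltype(sizeof(T))>> : std::true_type {};

template <class T> struct identity { using type = T; };

// Walks the configuration and stops at the first backend defining Algorithm.
template <template <class, class> class Algorithm, class Config, class Policy>
struct find_impl;
template <template <class, class> class Algorithm, class Policy, class B, class... Bs>
struct find_impl<Algorithm, configuration<B, Bs...>, Policy>
    : std::conditional_t<is_complete<Algorithm<B, Policy>>::value,
                         identity<Algorithm<B, Policy>>,
                         find_impl<Algorithm, configuration<Bs...>, Policy>> {};

template <template <class, class> class Algorithm, class Config, class Policy>
using dispatch = typename find_impl<Algorithm, Config, Policy>::type;

// Basis operation provided by the serial backend.
template <class Policy>
struct find_if<serial_backend, Policy> {
  template <class It, class Pred>
  std::optional<It> operator()(const Policy&, It first, It last, Pred pred) const noexcept {
    for (; first != last; ++first)
      if (pred(*first))
        break;
    return first;
  }
};

// Default definition, kept separate from dispatching: any_of in terms of find_if.
// It calls the dispatched back-end function, never the throwing front-end.
template <class Policy>
struct any_of<default_backend, Policy> {
  template <class It, class Pred>
  std::optional<bool> operator()(const Policy& policy, It first, It last, Pred pred) const noexcept {
    using FindIf = dispatch<find_if, current_configuration, Policy>;
    auto res = FindIf()(policy, first, last, std::move(pred));
    if (!res)
      return std::nullopt;
    return *res != last;
  }
};

// Single place where a back-end failure is converted into an exception.
template <class Implementation, class... Args>
auto run_backend(Args&&... args) {
  auto res = Implementation()(std::forward<Args>(args)...);
  if (!res)
    throw std::bad_alloc();
  return *std::move(res);
}

// Front-end: every algorithm has this same two-line body.
template <class Policy, class It, class Pred>
bool par_any_of(const Policy& policy, It first, It last, Pred pred) {
  using Implementation = dispatch<any_of, current_configuration, Policy>;
  return run_backend<Implementation>(policy, std::move(first), std::move(last), std::move(pred));
}

static_assert(std::is_same_v<dispatch<find_if, current_configuration, seq_policy>,
                             find_if<serial_backend, seq_policy>>);
static_assert(std::is_same_v<dispatch<any_of, current_configuration, seq_policy>,
                             any_of<default_backend, seq_policy>>);

inline void demo() {
  std::vector<int> v{1, 2, 3};
  assert(par_any_of(seq_policy{}, v.begin(), v.end(), [](int x) { return x == 2; }));
  assert(!par_any_of(seq_policy{}, v.begin(), v.end(), [](int x) { return x == 7; }));
}

} // namespace sketch
```

In this sketch, `any_of` resolves to the default backend because the serial
backend does not define it, while `find_if` resolves to the serial backend.
Because every front-end function has the same body, each one can be audited
in the same way.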

Fixes #70718
---
 libcxx/include/CMakeLists.txt                 |    5 +-
 libcxx/include/__algorithm/pstl.h             | 1141 ++++-------------
 .../__algorithm/pstl_frontend_dispatch.h      |   44 -
 libcxx/include/__numeric/pstl.h               |  197 +--
 libcxx/include/__pstl/README.md               |  171 +++
 libcxx/include/__pstl/backend_fwd.h           |  128 ++
 libcxx/include/__pstl/backends/default.h      |  508 ++++++++
 libcxx/include/__pstl/backends/libdispatch.h  |   61 +-
 libcxx/include/__pstl/backends/serial.h       |  182 ++-
 libcxx/include/__pstl/backends/std_thread.h   |   72 +-
 libcxx/include/__pstl/configuration.h         |    8 +
 libcxx/include/__pstl/configuration_fwd.h     |  233 +---
 libcxx/include/__pstl/cpu_algos/any_of.h      |   44 +-
 libcxx/include/__pstl/cpu_algos/fill.h        |   45 +-
 libcxx/include/__pstl/cpu_algos/find_if.h     |   58 +-
 libcxx/include/__pstl/cpu_algos/for_each.h    |   45 +-
 libcxx/include/__pstl/cpu_algos/merge.h       |   95 +-
 libcxx/include/__pstl/cpu_algos/stable_sort.h |   29 +-
 libcxx/include/__pstl/cpu_algos/transform.h   |  178 +--
 .../__pstl/cpu_algos/transform_reduce.h       |  194 +--
 libcxx/include/__pstl/dispatch.h              |   66 +
 libcxx/include/__pstl/run_backend.h           |   57 +
 libcxx/include/module.modulemap               |   24 +-
 .../pstl.iterator-requirements.verify.cpp     |    1 +
 ..._customization_points_not_working.pass.cpp |  405 ------
 .../test/libcxx/transitive_includes/cxx23.csv |    2 +-
 .../test/libcxx/transitive_includes/cxx26.csv |    2 +-
 27 files changed, 1874 insertions(+), 2121 deletions(-)
 delete mode 100644 libcxx/include/__algorithm/pstl_frontend_dispatch.h
 create mode 100644 libcxx/include/__pstl/README.md
 create mode 100644 libcxx/include/__pstl/backend_fwd.h
 create mode 100644 libcxx/include/__pstl/backends/default.h
 create mode 100644 libcxx/include/__pstl/dispatch.h
 create mode 100644 libcxx/include/__pstl/run_backend.h
 delete mode 100644 libcxx/test/libcxx/algorithms/pstl.robust_against_customization_points_not_working.pass.cpp

diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index 33ee5b26bd621..d49593674fd80 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -73,7 +73,6 @@ set(files
   __algorithm/pop_heap.h
   __algorithm/prev_permutation.h
   __algorithm/pstl.h
-  __algorithm/pstl_frontend_dispatch.h
   __algorithm/push_heap.h
   __algorithm/ranges_adjacent_find.h
   __algorithm/ranges_all_of.h
@@ -570,6 +569,8 @@ set(files
   __numeric/transform_reduce.h
   __ostream/basic_ostream.h
   __ostream/print.h
+  __pstl/backend_fwd.h
+  __pstl/backends/default.h
   __pstl/backends/libdispatch.h
   __pstl/backends/serial.h
   __pstl/backends/std_thread.h
@@ -584,6 +585,8 @@ set(files
   __pstl/cpu_algos/stable_sort.h
   __pstl/cpu_algos/transform.h
   __pstl/cpu_algos/transform_reduce.h
+  __pstl/dispatch.h
+  __pstl/run_backend.h
   __random/bernoulli_distribution.h
   __random/binomial_distribution.h
   __random/cauchy_distribution.h
diff --git a/libcxx/include/__algorithm/pstl.h b/libcxx/include/__algorithm/pstl.h
index 68b4e3e77ec6c..784ea18bd9a06 100644
--- a/libcxx/include/__algorithm/pstl.h
+++ b/libcxx/include/__algorithm/pstl.h
@@ -9,30 +9,19 @@
 #ifndef _LIBCPP___ALGORITHM_PSTL_H
 #define _LIBCPP___ALGORITHM_PSTL_H
 
-#include <__algorithm/copy_n.h>
-#include <__algorithm/count.h>
-#include <__algorithm/equal.h>
-#include <__algorithm/fill_n.h>
-#include <__algorithm/for_each.h>
-#include <__algorithm/for_each_n.h>
-#include <__algorithm/pstl_frontend_dispatch.h>
-#include <__atomic/atomic.h>
 #include <__config>
-#include <__functional/identity.h>
 #include <__functional/operations.h>
-#include <__iterator/concepts.h>
 #include <__iterator/cpp17_iterator_concepts.h>
 #include <__iterator/iterator_traits.h>
-#include <__numeric/pstl.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/configuration.h>
+#include <__pstl/dispatch.h>
+#include <__pstl/run_backend.h>
 #include <__type_traits/enable_if.h>
-#include <__type_traits/is_constant_evaluated.h>
 #include <__type_traits/is_execution_policy.h>
-#include <__type_traits/is_trivially_copyable.h>
 #include <__type_traits/remove_cvref.h>
-#include <__utility/empty.h>
+#include <__utility/forward.h>
 #include <__utility/move.h>
-#include <optional>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
@@ -45,54 +34,6 @@ _LIBCPP_PUSH_MACROS
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Predicate,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__remove_cvref_t<_ForwardIterator>>
-__find_if(_ExecutionPolicy&&, _ForwardIterator&& __first, _ForwardIterator&& __last, _Predicate&& __pred) noexcept {
-  using _Backend = typename __select_backend<_RawPolicy>::type;
-  return std::__pstl_find_if<_RawPolicy>(_Backend{}, std::move(__first), std::move(__last), std::move(__pred));
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Predicate,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI _ForwardIterator
-find_if(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "find_if requires ForwardIterators");
-  auto __res = std::__find_if(__policy, std::move(__first), std::move(__last), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class>
-void __pstl_any_of(); // declaration needed for the frontend dispatch below
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Predicate,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool> __any_of(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, _Predicate&& __pred) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_any_of, _RawPolicy),
-      [&](_ForwardIterator __g_first, _ForwardIterator __g_last, _Predicate __g_pred) -> optional<bool> {
-        auto __res = std::__find_if(__policy, __g_first, __g_last, __g_pred);
-        if (!__res)
-          return nullopt;
-        return *__res != __g_last;
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__pred));
-}
-
 template <class _ExecutionPolicy,
           class _ForwardIterator,
           class _Predicate,
@@ -101,35 +42,9 @@ template <class _ExecutionPolicy,
 [[nodiscard]] _LIBCPP_HIDE_FROM_ABI bool
 any_of(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "any_of requires a ForwardIterator");
-  auto __res = std::__any_of(__policy, std::move(__first), std::move(__last), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class>
-void __pstl_all_of(); // declaration needed for the frontend dispatch below
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Pred,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
-__all_of(_ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, _Pred&& __pred) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_all_of, _RawPolicy),
-      [&](_ForwardIterator __g_first, _ForwardIterator __g_last, _Pred __g_pred) -> optional<bool> {
-        auto __res = std::__any_of(__policy, __g_first, __g_last, [&](__iter_reference<_ForwardIterator> __value) {
-          return !__g_pred(__value);
-        });
-        if (!__res)
-          return nullopt;
-        return !*__res;
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__pred));
+  using _Implementation = __pstl::__dispatch<__pstl::__any_of, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
@@ -140,33 +55,9 @@ template <class _ExecutionPolicy,
 [[nodiscard]] _LIBCPP_HIDE_FROM_ABI bool
 all_of(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred __pred) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "all_of requires a ForwardIterator");
-  auto __res = std::__all_of(__policy, std::move(__first), std::move(__last), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class>
-void __pstl_none_of(); // declaration needed for the frontend dispatch below
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Pred,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
-__none_of(_ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, _Pred&& __pred) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_none_of, _RawPolicy),
-      [&](_ForwardIterator __g_first, _ForwardIterator __g_last, _Pred __g_pred) -> optional<bool> {
-        auto __res = std::__any_of(__policy, __g_first, __g_last, __g_pred);
-        if (!__res)
-          return nullopt;
-        return !*__res;
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__pred));
+  using _Implementation = __pstl::__dispatch<__pstl::__all_of, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
@@ -177,142 +68,9 @@ template <class _ExecutionPolicy,
 [[nodiscard]] _LIBCPP_HIDE_FROM_ABI bool
 none_of(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred __pred) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "none_of requires a ForwardIterator");
-  auto __res = std::__none_of(__policy, std::move(__first), std::move(__last), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _ForwardOutIterator,
-          class _UnaryOperation,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__remove_cvref_t<_ForwardOutIterator>>
-__transform(_ExecutionPolicy&&,
-            _ForwardIterator&& __first,
-            _ForwardIterator&& __last,
-            _ForwardOutIterator&& __result,
-            _UnaryOperation&& __op) noexcept {
-  using _Backend = typename __select_backend<_RawPolicy>::type;
-  return std::__pstl_transform<_RawPolicy>(
-      _Backend{}, std::move(__first), std::move(__last), std::move(__result), std::move(__op));
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _ForwardOutIterator,
-          class _UnaryOperation,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI _ForwardOutIterator transform(
-    _ExecutionPolicy&& __policy,
-    _ForwardIterator __first,
-    _ForwardIterator __last,
-    _ForwardOutIterator __result,
-    _UnaryOperation __op) {
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "transform requires ForwardIterators");
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardOutIterator, "transform requires an OutputIterator");
-  _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
-      _ForwardOutIterator, decltype(__op(*__first)), "transform requires an OutputIterator");
-  auto __res = std::__transform(__policy, std::move(__first), std::move(__last), std::move(__result), std::move(__op));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
-          class _ForwardOutIterator,
-          class _BinaryOperation,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI optional<__remove_cvref_t<_ForwardOutIterator>>
-__transform(_ExecutionPolicy&&,
-            _ForwardIterator1&& __first1,
-            _ForwardIterator1&& __last1,
-            _ForwardIterator2&& __first2,
-            _ForwardOutIterator&& __result,
-            _BinaryOperation&& __op) noexcept {
-  using _Backend = typename __select_backend<_RawPolicy>::type;
-  return std::__pstl_transform<_RawPolicy>(
-      _Backend{}, std::move(__first1), std::move(__last1), std::move(__first2), std::move(__result), std::move(__op));
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
-          class _ForwardOutIterator,
-          class _BinaryOperation,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI _ForwardOutIterator transform(
-    _ExecutionPolicy&& __policy,
-    _ForwardIterator1 __first1,
-    _ForwardIterator1 __last1,
-    _ForwardIterator2 __first2,
-    _ForwardOutIterator __result,
-    _BinaryOperation __op) {
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "transform requires ForwardIterators");
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "transform requires ForwardIterators");
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardOutIterator, "transform requires an OutputIterator");
-  _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
-      _ForwardOutIterator, decltype(__op(*__first1, *__first2)), "transform requires an OutputIterator");
-  auto __res = std::__transform(
-      __policy, std::move(__first1), std::move(__last1), std::move(__first2), std::move(__result), std::move(__op));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Function,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
-__for_each(_ExecutionPolicy&&, _ForwardIterator&& __first, _ForwardIterator&& __last, _Function&& __func) noexcept {
-  using _Backend = typename __select_backend<_RawPolicy>::type;
-  return std::__pstl_for_each<_RawPolicy>(_Backend{}, std::move(__first), std::move(__last), std::move(__func));
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Function,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI void
-for_each(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Function __func) {
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "for_each requires ForwardIterators");
-  if (!std::__for_each(__policy, std::move(__first), std::move(__last), std::move(__func)))
-    std::__throw_bad_alloc();
-}
-
-// TODO: Use the std::copy/move shenanigans to forward to std::memmove
-
-template <class>
-void __pstl_copy();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _ForwardOutIterator,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
-__copy(_ExecutionPolicy&& __policy,
-       _ForwardIterator&& __first,
-       _ForwardIterator&& __last,
-       _ForwardOutIterator&& __result) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_copy, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first, _ForwardIterator __g_last, _ForwardOutIterator __g_result) {
-        return std::__transform(__policy, __g_first, __g_last, __g_result, __identity());
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__result));
+  using _Implementation = __pstl::__dispatch<__pstl::__none_of, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
@@ -328,37 +86,9 @@ copy(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __l
       _ForwardOutIterator, "copy(first, last, result) requires result to be a ForwardIterator");
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
       _ForwardOutIterator, decltype(*__first), "copy(first, last, result) requires result to be an OutputIterator");
-  auto __res = std::__copy(__policy, std::move(__first), std::move(__last), std::move(__result));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class>
-void __pstl_copy_n();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _ForwardOutIterator,
-          class _Size,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator> __copy_n(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _Size&& __n, _ForwardOutIterator&& __result) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_copy_n, _RawPolicy),
-      [&__policy](
-          _ForwardIterator __g_first, _Size __g_n, _ForwardOutIterator __g_result) -> optional<_ForwardIterator> {
-        if constexpr (__has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-          return std::__copy(__policy, std::move(__g_first), std::move(__g_first + __g_n), std::move(__g_result));
-        } else {
-          (void)__policy;
-          return std::copy_n(__g_first, __g_n, __g_result);
-        }
-      },
-      std::move(__first),
-      std::move(__n),
-      std::move(__result));
+  using _Implementation = __pstl::__dispatch<__pstl::__copy, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__result));
 }
 
 template <class _ExecutionPolicy,
@@ -375,37 +105,9 @@ copy_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __n, _Forwar
       _ForwardOutIterator, "copy_n(first, n, result) requires result to be a ForwardIterator");
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
       _ForwardOutIterator, decltype(*__first), "copy_n(first, n, result) requires result to be an OutputIterator");
-  auto __res = std::__copy_n(__policy, std::move(__first), std::move(__n), std::move(__result));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class>
-void __pstl_count_if(); // declaration needed for the frontend dispatch below
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Predicate,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__iter_diff_t<_ForwardIterator>> __count_if(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, _Predicate&& __pred) noexcept {
-  using __diff_t = __iter_diff_t<_ForwardIterator>;
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_count_if, _RawPolicy),
-      [&](_ForwardIterator __g_first, _ForwardIterator __g_last, _Predicate __g_pred) -> optional<__diff_t> {
-        return std::__transform_reduce(
-            __policy,
-            std::move(__g_first),
-            std::move(__g_last),
-            __diff_t(),
-            std::plus{},
-            [&](__iter_reference<_ForwardIterator> __element) -> bool { return __g_pred(__element); });
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__pred));
+  using _Implementation = __pstl::__dispatch<__pstl::__copy_n, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__n), std::move(__result));
 }
 
 template <class _ExecutionPolicy,
@@ -417,33 +119,9 @@ _LIBCPP_HIDE_FROM_ABI __iter_diff_t<_ForwardIterator>
 count_if(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(
       _ForwardIterator, "count_if(first, last, pred) requires [first, last) to be ForwardIterators");
-  auto __res = std::__count_if(__policy, std::move(__first), std::move(__last), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class>
-void __pstl_count(); // declaration needed for the frontend dispatch below
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Tp,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__iter_diff_t<_ForwardIterator>> __count(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, const _Tp& __value) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_count, _RawPolicy),
-      [&](_ForwardIterator __g_first, _ForwardIterator __g_last, const _Tp& __g_value)
-          -> optional<__iter_diff_t<_ForwardIterator>> {
-        return std::count_if(__policy, __g_first, __g_last, [&](__iter_reference<_ForwardIterator> __v) {
-          return __v == __g_value;
-        });
-      },
-      std::forward<_ForwardIterator>(__first),
-      std::forward<_ForwardIterator>(__last),
-      __value);
+  using _Implementation = __pstl::__dispatch<__pstl::__count_if, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
@@ -455,44 +133,9 @@ _LIBCPP_HIDE_FROM_ABI __iter_diff_t<_ForwardIterator>
 count(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(
       _ForwardIterator, "count(first, last, val) requires [first, last) to be ForwardIterators");
-  auto __res = std::__count(__policy, std::move(__first), std::move(__last), __value);
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *__res;
-}
-
-template <class>
-void __pstl_equal();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
-          class _Pred,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
-__equal(_ExecutionPolicy&& __policy,
-        _ForwardIterator1&& __first1,
-        _ForwardIterator1&& __last1,
-        _ForwardIterator2&& __first2,
-        _Pred&& __pred) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_equal, _RawPolicy),
-      [&__policy](
-          _ForwardIterator1 __g_first1, _ForwardIterator1 __g_last1, _ForwardIterator2 __g_first2, _Pred __g_pred) {
-        return std::__transform_reduce(
-            __policy,
-            std::move(__g_first1),
-            std::move(__g_last1),
-            std::move(__g_first2),
-            true,
-            std::logical_and{},
-            std::move(__g_pred));
-      },
-      std::move(__first1),
-      std::move(__last1),
-      std::move(__first2),
-      std::move(__pred));
+  using _Implementation = __pstl::__dispatch<__pstl::__count, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), __value);
 }
 
 template <class _ExecutionPolicy,
@@ -509,67 +152,31 @@ equal(_ExecutionPolicy&& __policy,
       _Pred __pred) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "equal requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "equal requires ForwardIterators");
-  auto __res = std::__equal(__policy, std::move(__first1), std::move(__last1), std::move(__first2), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *__res;
+  using _Implementation = __pstl::__dispatch<__pstl::__equal_3leg, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
+      std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
           class _ForwardIterator1,
           class _ForwardIterator2,
-          enable_if_t<is_execution_policy_v<__remove_cvref_t<_ExecutionPolicy>>, int> = 0>
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
 _LIBCPP_HIDE_FROM_ABI bool
 equal(_ExecutionPolicy&& __policy, _ForwardIterator1 __first1, _ForwardIterator1 __last1, _ForwardIterator2 __first2) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "equal requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "equal requires ForwardIterators");
-  auto __res = std::__equal(__policy, std::move(__first1), std::move(__last1), std::move(__first2), std::equal_to{});
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *__res;
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
-          class _Pred,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
-__equal(_ExecutionPolicy&& __policy,
-        _ForwardIterator1&& __first1,
-        _ForwardIterator1&& __last1,
-        _ForwardIterator2&& __first2,
-        _ForwardIterator2&& __last2,
-        _Pred&& __pred) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_equal, _RawPolicy),
-      [&__policy](_ForwardIterator1 __g_first1,
-                  _ForwardIterator1 __g_last1,
-                  _ForwardIterator2 __g_first2,
-                  _ForwardIterator2 __g_last2,
-                  _Pred __g_pred) -> optional<bool> {
-        if constexpr (__has_random_access_iterator_category<_ForwardIterator1>::value &&
-                      __has_random_access_iterator_category<_ForwardIterator2>::value) {
-          if (__g_last1 - __g_first1 != __g_last2 - __g_first2)
-            return false;
-          return std::__equal(
-              __policy, std::move(__g_first1), std::move(__g_last1), std::move(__g_first2), std::move(__g_pred));
-        } else {
-          (void)__policy; // Avoid unused lambda capture warning
-          return std::equal(
-              std::move(__g_first1),
-              std::move(__g_last1),
-              std::move(__g_first2),
-              std::move(__g_last2),
-              std::move(__g_pred));
-        }
-      },
+  using _Implementation = __pstl::__dispatch<__pstl::__equal_3leg, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
       std::move(__first1),
       std::move(__last1),
       std::move(__first2),
-      std::move(__last2),
-      std::move(__pred));
+      equal_to{});
 }
 
 template <class _ExecutionPolicy,
@@ -587,17 +194,21 @@ equal(_ExecutionPolicy&& __policy,
       _Pred __pred) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "equal requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "equal requires ForwardIterators");
-  auto __res = std::__equal(
-      __policy, std::move(__first1), std::move(__last1), std::move(__first2), std::move(__last2), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *__res;
+  using _Implementation = __pstl::__dispatch<__pstl::__equal, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
+      std::move(__last2),
+      std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
           class _ForwardIterator1,
           class _ForwardIterator2,
-          enable_if_t<is_execution_policy_v<__remove_cvref_t<_ExecutionPolicy>>, int> = 0>
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
 _LIBCPP_HIDE_FROM_ABI bool
 equal(_ExecutionPolicy&& __policy,
       _ForwardIterator1 __first1,
@@ -606,33 +217,14 @@ equal(_ExecutionPolicy&& __policy,
       _ForwardIterator2 __last2) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "equal requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "equal requires ForwardIterators");
-  auto __res = std::__equal(
-      __policy, std::move(__first1), std::move(__last1), std::move(__first2), std::move(__last2), std::equal_to{});
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *__res;
-}
-
-template <class>
-void __pstl_fill(); // declaration needed for the frontend dispatch below
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Tp,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI optional<__empty> __fill(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, const _Tp& __value) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_fill, _RawPolicy),
-      [&](_ForwardIterator __g_first, _ForwardIterator __g_last, const _Tp& __g_value) {
-        return std::__for_each(__policy, __g_first, __g_last, [&](__iter_reference<_ForwardIterator> __element) {
-          __element = __g_value;
-        });
-      },
-      std::forward<_ForwardIterator>(__first),
-      std::forward<_ForwardIterator>(__last),
-      __value);
+  using _Implementation = __pstl::__dispatch<__pstl::__equal, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
+      std::move(__last2),
+      equal_to{});
 }
 
 template <class _ExecutionPolicy,
@@ -643,70 +235,36 @@ template <class _ExecutionPolicy,
 _LIBCPP_HIDE_FROM_ABI void
 fill(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "fill requires ForwardIterators");
-  if (!std::__fill(__policy, std::move(__first), std::move(__last), __value))
-    std::__throw_bad_alloc();
-}
-
-template <class>
-void __pstl_fill_n(); // declaration needed for the frontend dispatch below
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _SizeT,
-          class _Tp,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
-__fill_n(_ExecutionPolicy&& __policy, _ForwardIterator&& __first, _SizeT&& __n, const _Tp& __value) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_fill_n, _RawPolicy),
-      [&](_ForwardIterator __g_first, _SizeT __g_n, const _Tp& __g_value) {
-        if constexpr (__has_random_access_iterator_category_or_concept<_ForwardIterator>::value)
-          std::fill(__policy, __g_first, __g_first + __g_n, __g_value);
-        else
-          std::fill_n(__g_first, __g_n, __g_value);
-        return optional<__empty>{__empty{}};
-      },
-      std::move(__first),
-      std::move(__n),
-      __value);
+  using _Implementation = __pstl::__dispatch<__pstl::__fill, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), __value);
 }
 
 template <class _ExecutionPolicy,
           class _ForwardIterator,
-          class _SizeT,
+          class _Size,
           class _Tp,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
 _LIBCPP_HIDE_FROM_ABI void
-fill_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _SizeT __n, const _Tp& __value) {
+fill_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __n, const _Tp& __value) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "fill_n requires ForwardIterators");
-  if (!std::__fill_n(__policy, std::move(__first), std::move(__n), __value))
-    std::__throw_bad_alloc();
+  using _Implementation = __pstl::__dispatch<__pstl::__fill_n, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__n), __value);
 }
 
-template <class>
-void __pstl_find_if_not();
-
 template <class _ExecutionPolicy,
           class _ForwardIterator,
           class _Predicate,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__remove_cvref_t<_ForwardIterator>> __find_if_not(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, _Predicate&& __pred) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_find_if_not, _RawPolicy),
-      [&](_ForwardIterator&& __g_first, _ForwardIterator&& __g_last, _Predicate&& __g_pred)
-          -> optional<__remove_cvref_t<_ForwardIterator>> {
-        return std::__find_if(
-            __policy, __g_first, __g_last, [&](__iter_reference<__remove_cvref_t<_ForwardIterator>> __value) {
-              return !__g_pred(__value);
-            });
-      },
-      std::forward<_ForwardIterator>(__first),
-      std::forward<_ForwardIterator>(__last),
-      std::forward<_Predicate>(__pred));
+_LIBCPP_HIDE_FROM_ABI _ForwardIterator
+find_if(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "find_if requires ForwardIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__find_if, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
@@ -717,33 +275,9 @@ template <class _ExecutionPolicy,
 _LIBCPP_HIDE_FROM_ABI _ForwardIterator
 find_if_not(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "find_if_not requires ForwardIterators");
-  auto __res = std::__find_if_not(__policy, std::move(__first), std::move(__last), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class>
-void __pstl_find();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Tp,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__remove_cvref_t<_ForwardIterator>> __find(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, const _Tp& __value) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_find, _RawPolicy),
-      [&](_ForwardIterator __g_first, _ForwardIterator __g_last, const _Tp& __g_value) -> optional<_ForwardIterator> {
-        return std::find_if(
-            __policy, __g_first, __g_last, [&](__iter_reference<__remove_cvref_t<_ForwardIterator>> __element) {
-              return __element == __g_value;
-            });
-      },
-      std::forward<_ForwardIterator>(__first),
-      std::forward<_ForwardIterator>(__last),
-      __value);
+  using _Implementation = __pstl::__dispatch<__pstl::__find_if_not, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
@@ -754,37 +288,22 @@ template <class _ExecutionPolicy,
 _LIBCPP_HIDE_FROM_ABI _ForwardIterator
 find(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "find requires ForwardIterators");
-  auto __res = std::__find(__policy, std::move(__first), std::move(__last), __value);
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
+  using _Implementation = __pstl::__dispatch<__pstl::__find, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), __value);
 }
 
-template <class>
-void __pstl_for_each_n(); // declaration needed for the frontend dispatch below
-
 template <class _ExecutionPolicy,
           class _ForwardIterator,
-          class _Size,
           class _Function,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
-__for_each_n(_ExecutionPolicy&& __policy, _ForwardIterator&& __first, _Size&& __size, _Function&& __func) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_for_each_n, _RawPolicy),
-      [&](_ForwardIterator __g_first, _Size __g_size, _Function __g_func) -> optional<__empty> {
-        if constexpr (__has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-          std::for_each(__policy, std::move(__g_first), __g_first + __g_size, std::move(__g_func));
-          return __empty{};
-        } else {
-          std::for_each_n(std::move(__g_first), __g_size, std::move(__g_func));
-          return __empty{};
-        }
-      },
-      std::move(__first),
-      std::move(__size),
-      std::move(__func));
+_LIBCPP_HIDE_FROM_ABI void
+for_each(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Function __func) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "for_each requires ForwardIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__for_each, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__func));
 }
 
 template <class _ExecutionPolicy,
@@ -796,32 +315,9 @@ template <class _ExecutionPolicy,
 _LIBCPP_HIDE_FROM_ABI void
 for_each_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __size, _Function __func) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "for_each_n requires a ForwardIterator");
-  auto __res = std::__for_each_n(__policy, std::move(__first), std::move(__size), std::move(__func));
-  if (!__res)
-    std::__throw_bad_alloc();
-}
-
-template <class>
-void __pstl_generate();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Generator,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty> __generate(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, _Generator&& __gen) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_generate, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first, _ForwardIterator __g_last, _Generator __g_gen) {
-        return std::__for_each(
-            __policy, std::move(__g_first), std::move(__g_last), [&](__iter_reference<_ForwardIterator> __element) {
-              __element = __g_gen();
-            });
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__gen));
+  using _Implementation = __pstl::__dispatch<__pstl::__for_each_n, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__size), std::move(__func));
 }
 
 template <class _ExecutionPolicy,
@@ -832,32 +328,9 @@ template <class _ExecutionPolicy,
 _LIBCPP_HIDE_FROM_ABI void
 generate(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Generator __gen) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "generate requires ForwardIterators");
-  if (!std::__generate(__policy, std::move(__first), std::move(__last), std::move(__gen)))
-    std::__throw_bad_alloc();
-}
-
-template <class>
-void __pstl_generate_n();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Size,
-          class _Generator,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
-__generate_n(_ExecutionPolicy&& __policy, _ForwardIterator&& __first, _Size&& __n, _Generator&& __gen) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_generate_n, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first, _Size __g_n, _Generator __g_gen) {
-        return std::__for_each_n(
-            __policy, std::move(__g_first), std::move(__g_n), [&](__iter_reference<_ForwardIterator> __element) {
-              __element = __g_gen();
-            });
-      },
-      std::move(__first),
-      __n,
-      std::move(__gen));
+  using _Implementation = __pstl::__dispatch<__pstl::__generate, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__gen));
 }
 
 template <class _ExecutionPolicy,
@@ -869,32 +342,9 @@ template <class _ExecutionPolicy,
 _LIBCPP_HIDE_FROM_ABI void
 generate_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __n, _Generator __gen) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "generate_n requires a ForwardIterator");
-  if (!std::__generate_n(__policy, std::move(__first), std::move(__n), std::move(__gen)))
-    std::__throw_bad_alloc();
-}
-
-template <class>
-void __pstl_is_partitioned();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Predicate,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool> __is_partitioned(
-    _ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last, _Predicate&& __pred) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_is_partitioned, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first, _ForwardIterator __g_last, _Predicate __g_pred) {
-        __g_first = std::find_if_not(__policy, __g_first, __g_last, __g_pred);
-        if (__g_first == __g_last)
-          return true;
-        ++__g_first;
-        return std::none_of(__policy, __g_first, __g_last, __g_pred);
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__pred));
+  using _Implementation = __pstl::__dispatch<__pstl::__generate_n, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__n), std::move(__gen));
 }
 
 template <class _ExecutionPolicy,
@@ -905,10 +355,9 @@ template <class _ExecutionPolicy,
 _LIBCPP_NODISCARD _LIBCPP_HIDE_FROM_ABI bool
 is_partitioned(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "is_partitioned requires ForwardIterators");
-  auto __res = std::__is_partitioned(__policy, std::move(__first), std::move(__last), std::move(__pred));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
+  using _Implementation = __pstl::__dispatch<__pstl::__is_partitioned, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
 }
 
 template <class _ExecutionPolicy,
@@ -918,32 +367,6 @@ template <class _ExecutionPolicy,
           class _Comp,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
-__merge(_ExecutionPolicy&&,
-        _ForwardIterator1&& __first1,
-        _ForwardIterator1&& __last1,
-        _ForwardIterator2&& __first2,
-        _ForwardIterator2&& __last2,
-        _ForwardOutIterator&& __result,
-        _Comp&& __comp) noexcept {
-  using _Backend = typename __select_backend<_RawPolicy>::type;
-  return std::__pstl_merge<_RawPolicy>(
-      _Backend{},
-      std::forward<_ForwardIterator1>(__first1),
-      std::forward<_ForwardIterator1>(__last1),
-      std::forward<_ForwardIterator2>(__first2),
-      std::forward<_ForwardIterator2>(__last2),
-      std::forward<_ForwardOutIterator>(__result),
-      std::forward<_Comp>(__comp));
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
-          class _ForwardOutIterator,
-          class _Comp                                         = std::less<>,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
 _LIBCPP_HIDE_FROM_ABI _ForwardOutIterator
 merge(_ExecutionPolicy&& __policy,
       _ForwardIterator1 __first1,
@@ -951,50 +374,48 @@ merge(_ExecutionPolicy&& __policy,
       _ForwardIterator2 __first2,
       _ForwardIterator2 __last2,
       _ForwardOutIterator __result,
-      _Comp __comp = {}) {
+      _Comp __comp) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "merge requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "merge requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(_ForwardOutIterator, decltype(*__first1), "merge requires an OutputIterator");
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(_ForwardOutIterator, decltype(*__first2), "merge requires an OutputIterator");
-  auto __res = std::__merge(
-      __policy,
+  using _Implementation = __pstl::__dispatch<__pstl::__merge, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
       std::move(__first1),
       std::move(__last1),
       std::move(__first2),
       std::move(__last2),
       std::move(__result),
       std::move(__comp));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
 }
 
-// TODO: Use the std::copy/move shenanigans to forward to std::memmove
-//       Investigate whether we want to still forward to std::transform(policy)
-//       in that case for the execution::par part, or whether we actually want
-//       to run everything serially in that case.
-
-template <class>
-void __pstl_move();
-
 template <class _ExecutionPolicy,
-          class _ForwardIterator,
+          class _ForwardIterator1,
+          class _ForwardIterator2,
           class _ForwardOutIterator,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
-__move(_ExecutionPolicy&& __policy,
-       _ForwardIterator&& __first,
-       _ForwardIterator&& __last,
-       _ForwardOutIterator&& __result) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_move, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first, _ForwardIterator __g_last, _ForwardOutIterator __g_result) {
-        return std::__transform(__policy, __g_first, __g_last, __g_result, [](auto&& __v) { return std::move(__v); });
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__result));
+_LIBCPP_HIDE_FROM_ABI _ForwardOutIterator
+merge(_ExecutionPolicy&& __policy,
+      _ForwardIterator1 __first1,
+      _ForwardIterator1 __last1,
+      _ForwardIterator2 __first2,
+      _ForwardIterator2 __last2,
+      _ForwardOutIterator __result) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "merge requires ForwardIterators");
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "merge requires ForwardIterators");
+  _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(_ForwardOutIterator, decltype(*__first1), "merge requires an OutputIterator");
+  _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(_ForwardOutIterator, decltype(*__first2), "merge requires an OutputIterator");
+  using _Implementation = __pstl::__dispatch<__pstl::__merge, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
+      std::move(__last2),
+      std::move(__result),
+      less{});
 }
 
 template <class _ExecutionPolicy,
@@ -1008,41 +429,9 @@ move(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __l
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardOutIterator, "move requires an OutputIterator");
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
       _ForwardOutIterator, decltype(std::move(*__first)), "move requires an OutputIterator");
-  auto __res = std::__move(__policy, std::move(__first), std::move(__last), std::move(__result));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *__res;
-}
-
-template <class>
-void __pstl_replace_if();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Pred,
-          class _Tp,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
-__replace_if(_ExecutionPolicy&& __policy,
-             _ForwardIterator&& __first,
-             _ForwardIterator&& __last,
-             _Pred&& __pred,
-             const _Tp& __new_value) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_replace_if, _RawPolicy),
-      [&__policy](
-          _ForwardIterator&& __g_first, _ForwardIterator&& __g_last, _Pred&& __g_pred, const _Tp& __g_new_value) {
-        std::for_each(__policy, __g_first, __g_last, [&](__iter_reference<_ForwardIterator> __element) {
-          if (__g_pred(__element))
-            __element = __g_new_value;
-        });
-        return optional<__empty>{__empty{}};
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__pred),
-      __new_value);
+  using _Implementation = __pstl::__dispatch<__pstl::__move, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__result));
 }
 
 template <class _ExecutionPolicy,
@@ -1058,40 +447,9 @@ replace_if(_ExecutionPolicy&& __policy,
            _Pred __pred,
            const _Tp& __new_value) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "replace_if requires ForwardIterators");
-  auto __res = std::__replace_if(__policy, std::move(__first), std::move(__last), std::move(__pred), __new_value);
-  if (!__res)
-    std::__throw_bad_alloc();
-}
-
-template <class>
-void __pstl_replace();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Tp,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
-__replace(_ExecutionPolicy&& __policy,
-          _ForwardIterator&& __first,
-          _ForwardIterator&& __last,
-          const _Tp& __old_value,
-          const _Tp& __new_value) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_replace, _RawPolicy),
-      [&__policy](
-          _ForwardIterator __g_first, _ForwardIterator __g_last, const _Tp& __g_old_value, const _Tp& __g_new_value) {
-        return std::__replace_if(
-            __policy,
-            std::move(__g_first),
-            std::move(__g_last),
-            [&](__iter_reference<_ForwardIterator> __element) { return __element == __g_old_value; },
-            __g_new_value);
-      },
-      std::forward<_ForwardIterator>(__first),
-      std::forward<_ForwardIterator>(__last),
-      __old_value,
-      __new_value);
+  using _Implementation = __pstl::__dispatch<__pstl::__replace_if, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred), __new_value);
 }
 
 template <class _ExecutionPolicy,
@@ -1106,46 +464,9 @@ replace(_ExecutionPolicy&& __policy,
         const _Tp& __old_value,
         const _Tp& __new_value) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "replace requires ForwardIterators");
-  if (!std::__replace(__policy, std::move(__first), std::move(__last), __old_value, __new_value))
-    std::__throw_bad_alloc();
-}
-
-template <class>
-void __pstl_replace_copy_if();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _ForwardOutIterator,
-          class _Pred,
-          class _Tp,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty> __replace_copy_if(
-    _ExecutionPolicy&& __policy,
-    _ForwardIterator&& __first,
-    _ForwardIterator&& __last,
-    _ForwardOutIterator&& __result,
-    _Pred&& __pred,
-    const _Tp& __new_value) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_replace_copy_if, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first,
-                  _ForwardIterator __g_last,
-                  _ForwardOutIterator __g_result,
-                  _Pred __g_pred,
-                  const _Tp& __g_new_value) -> optional<__empty> {
-        if (!std::__transform(
-                __policy, __g_first, __g_last, __g_result, [&](__iter_reference<_ForwardIterator> __element) {
-                  return __g_pred(__element) ? __g_new_value : __element;
-                }))
-          return nullopt;
-        return __empty{};
-      },
-      std::move(__first),
-      std::move(__last),
-      std::move(__result),
-      std::move(__pred),
-      __new_value);
+  using _Implementation = __pstl::__dispatch<__pstl::__replace, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), __old_value, __new_value);
 }
 
 template <class _ExecutionPolicy,
@@ -1167,46 +488,13 @@ _LIBCPP_HIDE_FROM_ABI void replace_copy_if(
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
       _ForwardOutIterator, decltype(*__first), "replace_copy_if requires an OutputIterator");
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(_ForwardOutIterator, const _Tp&, "replace_copy requires an OutputIterator");
-  if (!std::__replace_copy_if(
-          __policy, std::move(__first), std::move(__last), std::move(__result), std::move(__pred), __new_value))
-    std::__throw_bad_alloc();
-}
-
-template <class>
-void __pstl_replace_copy();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _ForwardOutIterator,
-          class _Tp,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty> __replace_copy(
-    _ExecutionPolicy&& __policy,
-    _ForwardIterator&& __first,
-    _ForwardIterator&& __last,
-    _ForwardOutIterator&& __result,
-    const _Tp& __old_value,
-    const _Tp& __new_value) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_replace_copy, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first,
-                  _ForwardIterator __g_last,
-                  _ForwardOutIterator __g_result,
-                  const _Tp& __g_old_value,
-                  const _Tp& __g_new_value) {
-        return std::__replace_copy_if(
-            __policy,
-            std::move(__g_first),
-            std::move(__g_last),
-            std::move(__g_result),
-            [&](__iter_reference<_ForwardIterator> __element) { return __element == __g_old_value; },
-            __g_new_value);
-      },
+  using _Implementation = __pstl::__dispatch<__pstl::__replace_copy_if, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
       std::move(__first),
       std::move(__last),
       std::move(__result),
-      __old_value,
+      std::move(__pred),
       __new_value);
 }
 
@@ -1228,41 +516,14 @@ _LIBCPP_HIDE_FROM_ABI void replace_copy(
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
       _ForwardOutIterator, decltype(*__first), "replace_copy requires an OutputIterator");
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(_ForwardOutIterator, const _Tp&, "replace_copy requires an OutputIterator");
-  if (!std::__replace_copy(
-          __policy, std::move(__first), std::move(__last), std::move(__result), __old_value, __new_value))
-    std::__throw_bad_alloc();
-}
-
-template <class>
-void __pstl_rotate_copy();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _ForwardOutIterator,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
-__rotate_copy(_ExecutionPolicy&& __policy,
-              _ForwardIterator&& __first,
-              _ForwardIterator&& __middle,
-              _ForwardIterator&& __last,
-              _ForwardOutIterator&& __result) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_rotate_copy, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first,
-                  _ForwardIterator __g_middle,
-                  _ForwardIterator __g_last,
-                  _ForwardOutIterator __g_result) -> optional<_ForwardOutIterator> {
-        auto __result_mid =
-            std::__copy(__policy, _ForwardIterator(__g_middle), std::move(__g_last), std::move(__g_result));
-        if (!__result_mid)
-          return nullopt;
-        return std::__copy(__policy, std::move(__g_first), std::move(__g_middle), *std::move(__result_mid));
-      },
+  using _Implementation = __pstl::__dispatch<__pstl::__replace_copy, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
       std::move(__first),
-      std::move(__middle),
       std::move(__last),
-      std::move(__result));
+      std::move(__result),
+      __old_value,
+      __new_value);
 }
 
 template <class _ExecutionPolicy,
@@ -1280,81 +541,117 @@ _LIBCPP_HIDE_FROM_ABI _ForwardOutIterator rotate_copy(
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardOutIterator, "rotate_copy requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
       _ForwardOutIterator, decltype(*__first), "rotate_copy requires an OutputIterator");
-  auto __res =
-      std::__rotate_copy(__policy, std::move(__first), std::move(__middle), std::move(__last), std::move(__result));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *__res;
+  using _Implementation = __pstl::__dispatch<__pstl::__rotate_copy, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first),
+      std::move(__middle),
+      std::move(__last),
+      std::move(__result));
 }
 
 template <class _ExecutionPolicy,
           class _RandomAccessIterator,
-          class _Comp                                         = less<>,
+          class _Comp,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty> __stable_sort(
-    _ExecutionPolicy&&, _RandomAccessIterator&& __first, _RandomAccessIterator&& __last, _Comp&& __comp = {}) noexcept {
-  using _Backend = typename __select_backend<_RawPolicy>::type;
-  return std::__pstl_stable_sort<_RawPolicy>(_Backend{}, std::move(__first), std::move(__last), std::move(__comp));
+_LIBCPP_HIDE_FROM_ABI void
+sort(_ExecutionPolicy&& __policy, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp __comp) {
+  _LIBCPP_REQUIRE_CPP17_RANDOM_ACCESS_ITERATOR(_RandomAccessIterator, "sort requires RandomAccessIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__sort, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__comp));
 }
 
 template <class _ExecutionPolicy,
           class _RandomAccessIterator,
-          class _Comp                                         = less<>,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI void stable_sort(
-    _ExecutionPolicy&& __policy, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp __comp = {}) {
-  _LIBCPP_REQUIRE_CPP17_RANDOM_ACCESS_ITERATOR(_RandomAccessIterator, "stable_sort requires RandomAccessIterators");
-  if (!std::__stable_sort(__policy, std::move(__first), std::move(__last), std::move(__comp)))
-    std::__throw_bad_alloc();
+_LIBCPP_HIDE_FROM_ABI void
+sort(_ExecutionPolicy&& __policy, _RandomAccessIterator __first, _RandomAccessIterator __last) {
+  _LIBCPP_REQUIRE_CPP17_RANDOM_ACCESS_ITERATOR(_RandomAccessIterator, "sort requires RandomAccessIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__sort, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), less{});
 }
 
-template <class>
-void __pstl_sort();
-
 template <class _ExecutionPolicy,
           class _RandomAccessIterator,
           class _Comp,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
-__sort(_ExecutionPolicy&& __policy,
-       _RandomAccessIterator&& __first,
-       _RandomAccessIterator&& __last,
-       _Comp&& __comp) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_sort, _RawPolicy),
-      [&__policy](_RandomAccessIterator __g_first, _RandomAccessIterator __g_last, _Comp __g_comp) {
-        std::stable_sort(__policy, std::move(__g_first), std::move(__g_last), std::move(__g_comp));
-        return optional<__empty>{__empty{}};
-      },
-      std::forward<_RandomAccessIterator>(__first),
-      std::forward<_RandomAccessIterator>(__last),
-      std::forward<_Comp>(__comp));
+_LIBCPP_HIDE_FROM_ABI void
+stable_sort(_ExecutionPolicy&& __policy, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp __comp) {
+  _LIBCPP_REQUIRE_CPP17_RANDOM_ACCESS_ITERATOR(_RandomAccessIterator, "stable_sort requires RandomAccessIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__stable_sort, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__comp));
 }
 
 template <class _ExecutionPolicy,
           class _RandomAccessIterator,
-          class _Comp,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
 _LIBCPP_HIDE_FROM_ABI void
-sort(_ExecutionPolicy&& __policy, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp __comp) {
-  _LIBCPP_REQUIRE_CPP17_RANDOM_ACCESS_ITERATOR(_RandomAccessIterator, "sort requires RandomAccessIterators");
-  if (!std::__sort(__policy, std::move(__first), std::move(__last), std::move(__comp)))
-    std::__throw_bad_alloc();
+stable_sort(_ExecutionPolicy&& __policy, _RandomAccessIterator __first, _RandomAccessIterator __last) {
+  _LIBCPP_REQUIRE_CPP17_RANDOM_ACCESS_ITERATOR(_RandomAccessIterator, "stable_sort requires RandomAccessIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__stable_sort, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), less{});
 }
 
 template <class _ExecutionPolicy,
-          class _RandomAccessIterator,
+          class _ForwardIterator,
+          class _ForwardOutIterator,
+          class _UnaryOperation,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI void
-sort(_ExecutionPolicy&& __policy, _RandomAccessIterator __first, _RandomAccessIterator __last) {
-  _LIBCPP_REQUIRE_CPP17_RANDOM_ACCESS_ITERATOR(_RandomAccessIterator, "sort requires RandomAccessIterators");
-  if (!std::__sort(__policy, std::move(__first), std::move(__last), less{}))
-    std::__throw_bad_alloc();
+_LIBCPP_HIDE_FROM_ABI _ForwardOutIterator transform(
+    _ExecutionPolicy&& __policy,
+    _ForwardIterator __first,
+    _ForwardIterator __last,
+    _ForwardOutIterator __result,
+    _UnaryOperation __op) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "transform requires ForwardIterators");
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardOutIterator, "transform requires an OutputIterator");
+  _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
+      _ForwardOutIterator, decltype(__op(*__first)), "transform requires an OutputIterator");
+  using _Implementation = __pstl::__dispatch<__pstl::__transform, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first),
+      std::move(__last),
+      std::move(__result),
+      std::move(__op));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator1,
+          class _ForwardIterator2,
+          class _ForwardOutIterator,
+          class _BinaryOperation,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI _ForwardOutIterator transform(
+    _ExecutionPolicy&& __policy,
+    _ForwardIterator1 __first1,
+    _ForwardIterator1 __last1,
+    _ForwardIterator2 __first2,
+    _ForwardOutIterator __result,
+    _BinaryOperation __op) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "transform requires ForwardIterators");
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "transform requires ForwardIterators");
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardOutIterator, "transform requires an OutputIterator");
+  _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(
+      _ForwardOutIterator, decltype(__op(*__first1, *__first2)), "transform requires an OutputIterator");
+  using _Implementation = __pstl::__dispatch<__pstl::__transform_binary, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
+      std::move(__result),
+      std::move(__op));
 }
 
 _LIBCPP_END_NAMESPACE_STD
diff --git a/libcxx/include/__algorithm/pstl_frontend_dispatch.h b/libcxx/include/__algorithm/pstl_frontend_dispatch.h
deleted file mode 100644
index 6fa1107491154..0000000000000
--- a/libcxx/include/__algorithm/pstl_frontend_dispatch.h
+++ /dev/null
@@ -1,44 +0,0 @@
-//===----------------------------------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef _LIBCPP___ALGORITHM_PSTL_FRONTEND_DISPATCH
-#define _LIBCPP___ALGORITHM_PSTL_FRONTEND_DISPATCH
-
-#include <__config>
-#include <__type_traits/is_callable.h>
-#include <__utility/forward.h>
-
-#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
-#  pragma GCC system_header
-#endif
-
-#if _LIBCPP_STD_VER >= 17
-
-_LIBCPP_BEGIN_NAMESPACE_STD
-
-#  define _LIBCPP_PSTL_CUSTOMIZATION_POINT(name, policy)                                                               \
-    [](auto&&... __args) -> decltype(std::name<policy>(                                                                \
-                             typename __select_backend<policy>::type{}, std::forward<decltype(__args)>(__args)...)) {  \
-      return std::name<policy>(typename __select_backend<policy>::type{}, std::forward<decltype(__args)>(__args)...);  \
-    }
-
-template <class _SpecializedImpl, class _GenericImpl, class... _Args>
-_LIBCPP_HIDE_FROM_ABI decltype(auto)
-__pstl_frontend_dispatch(_SpecializedImpl __specialized_impl, _GenericImpl __generic_impl, _Args&&... __args) {
-  if constexpr (__is_callable<_SpecializedImpl, _Args...>::value) {
-    return __specialized_impl(std::forward<_Args>(__args)...);
-  } else {
-    return __generic_impl(std::forward<_Args>(__args)...);
-  }
-}
-
-_LIBCPP_END_NAMESPACE_STD
-
-#endif // _LIBCPP_STD_VER >= 17
-
-#endif // _LIBCPP___ALGORITHM_PSTL_FRONTEND_DISPATCH
diff --git a/libcxx/include/__numeric/pstl.h b/libcxx/include/__numeric/pstl.h
index 05559b4d3f3c8..835576280dfea 100644
--- a/libcxx/include/__numeric/pstl.h
+++ b/libcxx/include/__numeric/pstl.h
@@ -9,17 +9,20 @@
 #ifndef _LIBCPP___NUMERIC_PSTL_H
 #define _LIBCPP___NUMERIC_PSTL_H
 
-#include <__algorithm/pstl_frontend_dispatch.h>
 #include <__config>
 #include <__functional/identity.h>
 #include <__functional/operations.h>
 #include <__iterator/cpp17_iterator_concepts.h>
 #include <__iterator/iterator_traits.h>
-#include <__numeric/transform_reduce.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/configuration.h>
+#include <__pstl/dispatch.h>
+#include <__pstl/run_backend.h>
+#include <__type_traits/enable_if.h>
 #include <__type_traits/is_execution_policy.h>
+#include <__type_traits/remove_cvref.h>
+#include <__utility/forward.h>
 #include <__utility/move.h>
-#include <optional>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
@@ -33,30 +36,50 @@ _LIBCPP_PUSH_MACROS
 _LIBCPP_BEGIN_NAMESPACE_STD
 
 template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
+          class _ForwardIterator,
           class _Tp,
-          class _BinaryOperation1,
-          class _BinaryOperation2,
+          class _BinaryOperation,
           class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
           enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI optional<_Tp> __transform_reduce(
-    _ExecutionPolicy&&,
-    _ForwardIterator1&& __first1,
-    _ForwardIterator1&& __last1,
-    _ForwardIterator2&& __first2,
-    _Tp&& __init,
-    _BinaryOperation1&& __reduce,
-    _BinaryOperation2&& __transform) noexcept {
-  using _Backend = typename __select_backend<_RawPolicy>::type;
-  return std::__pstl_transform_reduce<_RawPolicy>(
-      _Backend{},
-      std::move(__first1),
-      std::move(__last1),
-      std::move(__first2),
+_LIBCPP_HIDE_FROM_ABI _Tp reduce(
+    _ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Tp __init, _BinaryOperation __op) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "reduce requires ForwardIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__reduce, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first),
+      std::move(__last),
       std::move(__init),
-      std::move(__reduce),
-      std::move(__transform));
+      std::move(__op));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Tp,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI _Tp
+reduce(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Tp __init) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "reduce requires ForwardIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__reduce, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__init), plus{});
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI __iter_value_type<_ForwardIterator>
+reduce(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "reduce requires ForwardIterators");
+  using _Implementation = __pstl::__dispatch<__pstl::__reduce, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first),
+      std::move(__last),
+      __iter_value_type<_ForwardIterator>(),
+      plus{});
 }
 
 template <class _ExecutionPolicy,
@@ -77,18 +100,16 @@ _LIBCPP_HIDE_FROM_ABI _Tp transform_reduce(
     _BinaryOperation2 __transform) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "transform_reduce requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "transform_reduce requires ForwardIterators");
-  auto __res = std::__transform_reduce(
-      __policy,
+  using _Implementation =
+      __pstl::__dispatch<__pstl::__transform_reduce_binary, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
       std::move(__first1),
       std::move(__last1),
       std::move(__first2),
       std::move(__init),
       std::move(__reduce),
       std::move(__transform));
-
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
 }
 
 // This overload doesn't get a customization point because it's trivial to detect (through e.g.
@@ -97,7 +118,8 @@ template <class _ExecutionPolicy,
           class _ForwardIterator1,
           class _ForwardIterator2,
           class _Tp,
-          enable_if_t<is_execution_policy_v<__remove_cvref_t<_ExecutionPolicy>>, int> = 0>
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
 _LIBCPP_HIDE_FROM_ABI _Tp transform_reduce(
     _ExecutionPolicy&& __policy,
     _ForwardIterator1 __first1,
@@ -106,31 +128,16 @@ _LIBCPP_HIDE_FROM_ABI _Tp transform_reduce(
     _Tp __init) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1, "transform_reduce requires ForwardIterators");
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2, "transform_reduce requires ForwardIterators");
-  return std::transform_reduce(__policy, __first1, __last1, __first2, __init, plus{}, multiplies{});
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Tp,
-          class _BinaryOperation,
-          class _UnaryOperation,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__remove_cvref_t<_Tp>> __transform_reduce(
-    _ExecutionPolicy&&,
-    _ForwardIterator&& __first,
-    _ForwardIterator&& __last,
-    _Tp&& __init,
-    _BinaryOperation&& __reduce,
-    _UnaryOperation&& __transform) noexcept {
-  using _Backend = typename __select_backend<_RawPolicy>::type;
-  return std::__pstl_transform_reduce<_RawPolicy>(
-      _Backend{},
-      std::move(__first),
-      std::move(__last),
+  using _Implementation =
+      __pstl::__dispatch<__pstl::__transform_reduce_binary, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
       std::move(__init),
-      std::move(__reduce),
-      std::move(__transform));
+      plus{},
+      multiplies{});
 }
 
 template <class _ExecutionPolicy,
@@ -148,86 +155,14 @@ _LIBCPP_HIDE_FROM_ABI _Tp transform_reduce(
     _BinaryOperation __reduce,
     _UnaryOperation __transform) {
   _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "transform_reduce requires ForwardIterators");
-  auto __res = std::__transform_reduce(
-      __policy, std::move(__first), std::move(__last), std::move(__init), std::move(__reduce), std::move(__transform));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class>
-void __pstl_reduce();
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Tp,
-          class _BinaryOperation                              = plus<>,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_Tp>
-__reduce(_ExecutionPolicy&& __policy,
-         _ForwardIterator&& __first,
-         _ForwardIterator&& __last,
-         _Tp&& __init,
-         _BinaryOperation&& __op = {}) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_reduce, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first, _ForwardIterator __g_last, _Tp __g_init, _BinaryOperation __g_op) {
-        return std::__transform_reduce(
-            __policy, std::move(__g_first), std::move(__g_last), std::move(__g_init), std::move(__g_op), __identity{});
-      },
+  using _Implementation = __pstl::__dispatch<__pstl::__transform_reduce, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
       std::move(__first),
       std::move(__last),
       std::move(__init),
-      std::move(__op));
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _Tp,
-          class _BinaryOperation                              = plus<>,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI _Tp
-reduce(_ExecutionPolicy&& __policy,
-       _ForwardIterator __first,
-       _ForwardIterator __last,
-       _Tp __init,
-       _BinaryOperation __op = {}) {
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "reduce requires ForwardIterators");
-  auto __res = std::__reduce(__policy, std::move(__first), std::move(__last), std::move(__init), std::move(__op));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__iter_value_type<_ForwardIterator>>
-__reduce(_ExecutionPolicy&& __policy, _ForwardIterator&& __first, _ForwardIterator&& __last) noexcept {
-  return std::__pstl_frontend_dispatch(
-      _LIBCPP_PSTL_CUSTOMIZATION_POINT(__pstl_reduce, _RawPolicy),
-      [&__policy](_ForwardIterator __g_first, _ForwardIterator __g_last) {
-        return std::__reduce(
-            __policy, std::move(__g_first), std::move(__g_last), __iter_value_type<_ForwardIterator>());
-      },
-      std::move(__first),
-      std::move(__last));
-}
-
-template <class _ExecutionPolicy,
-          class _ForwardIterator,
-          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
-          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI __iter_value_type<_ForwardIterator>
-reduce(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last) {
-  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator, "reduce requires ForwardIterators");
-  auto __res = std::__reduce(__policy, std::move(__first), std::move(__last));
-  if (!__res)
-    std::__throw_bad_alloc();
-  return *std::move(__res);
+      std::move(__reduce),
+      std::move(__transform));
 }
 
 _LIBCPP_END_NAMESPACE_STD
diff --git a/libcxx/include/__pstl/README.md b/libcxx/include/__pstl/README.md
new file mode 100644
index 0000000000000..9fa5223d03e10
--- /dev/null
+++ b/libcxx/include/__pstl/README.md
@@ -0,0 +1,171 @@
+TODO: Documentation of how backends work
+
+A PSTL parallel backend is a tag type to which the following functions are associated, at minimum:
+
+```c++
+template <class _ExecutionPolicy, class _Iterator, class _Func>
+optional<__empty> __pstl_for_each(_Backend, _ExecutionPolicy&&, _Iterator __first, _Iterator __last, _Func __f);
+
+template <class _ExecutionPolicy, class _Iterator, class _Predicate>
+optional<_Iterator> __pstl_find_if(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
+
+template <class _ExecutionPolicy, class _RandomAccessIterator, class _Comp>
+optional<__empty>
+__pstl_stable_sort(_Backend, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp __comp);
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator1,
+          class _ForwardIterator2,
+          class _ForwardOutIterator,
+          class _Comp>
+optional<_ForwardOutIterator> __pstl_merge(_Backend,
+                                            _ForwardIterator1 __first1,
+                                            _ForwardIterator1 __last1,
+                                            _ForwardIterator2 __first2,
+                                            _ForwardIterator2 __last2,
+                                            _ForwardOutIterator __result,
+                                            _Comp __comp);
+
+template <class _ExecutionPolicy, class _InIterator, class _OutIterator, class _UnaryOperation>
+optional<_OutIterator>
+__pstl_transform(_Backend, _InIterator __first, _InIterator __last, _OutIterator __result, _UnaryOperation __op);
+
+template <class _ExecutionPolicy, class _InIterator1, class _InIterator2, class _OutIterator, class _BinaryOperation>
+optional<_OutIterator> __pstl_transform(_InIterator1 __first1,
+                                        _InIterator2 __first2,
+                                        _InIterator1 __last1,
+                                        _OutIterator __result,
+                                        _BinaryOperation __op);
+
+template <class _ExecutionPolicy,
+          class _Iterator1,
+          class _Iterator2,
+          class _Tp,
+          class _BinaryOperation1,
+          class _BinaryOperation2>
+optional<_Tp> __pstl_transform_reduce(_Backend,
+                                      _Iterator1 __first1,
+                                      _Iterator1 __last1,
+                                      _Iterator2 __first2,
+                                      _Iterator2 __last2,
+                                      _Tp __init,
+                                      _BinaryOperation1 __reduce,
+                                      _BinaryOperation2 __transform);
+
+template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinaryOperation, class _UnaryOperation>
+optional<_Tp> __pstl_transform_reduce(_Backend,
+                                      _Iterator __first,
+                                      _Iterator __last,
+                                      _Tp __init,
+                                      _BinaryOperation __reduce,
+                                      _UnaryOperation __transform);
+
+// TODO: Complete this list
+```
+
+The following functions are optional but can be provided. If provided, they are used by the corresponding
+algorithms, otherwise they are implemented in terms of other algorithms. If none of the optional algorithms are
+implemented, all the algorithms will eventually forward to the basis algorithms listed above:
+
+```c++
+template <class _ExecutionPolicy, class _Iterator, class _Size, class _Func>
+optional<__empty> __pstl_for_each_n(_Backend, _Iterator __first, _Size __n, _Func __f);
+
+template <class _ExecutionPolicy, class _Iterator, class _Predicate>
+optional<bool> __pstl_any_of(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
+
+template <class _ExecutionPolicy, class _Iterator, class _Predicate>
+optional<bool> __pstl_all_of(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
+
+template <class _ExecutionPolicy, class _Iterator, class _Predicate>
+optional<bool> __pstl_none_of(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
+
+template <class _ExecutionPolicy, class _Iterator, class _Tp>
+optional<_Iterator> __pstl_find(_Backend, _Iterator __first, _Iterator __last, const _Tp& __value);
+
+template <class _ExecutionPolicy, class _Iterator, class _Predicate>
+optional<_Iterator> __pstl_find_if_not(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
+
+template <class _ExecutionPolicy, class _Iterator, class _Tp>
+optional<__empty> __pstl_fill(_Backend, _Iterator __first, _Iterator __last, const _Tp& __value);
+
+template <class _ExecutionPolicy, class _Iterator, class _SizeT, class _Tp>
+optional<__empty> __pstl_fill_n(_Backend, _Iterator __first, _SizeT __n, const _Tp& __value);
+
+template <class _ExecutionPolicy, class _Iterator, class _Generator>
+optional<__empty> __pstl_generate(_Backend, _Iterator __first, _Iterator __last, _Generator __gen);
+
+template <class _ExecutionPolicy, class _Iterator, class _Predicate>
+optional<bool> __pstl_is_partitioned(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
+
+template <class _ExecutionPolicy, class _Iterator, class _Size, class _Generator>
+optional<__empty> __pstl_generate_n(_Backend, _Iterator __first, _Size __n, _Generator __gen);
+
+template <class _ExecutionPolicy, class _Iterator1, class _Iterator2, class _OutIterator, class _Comp>
+optional<_OutIterator> __pstl_merge(_Backend,
+                                    _Iterator1 __first1,
+                                    _Iterator1 __last1,
+                                    _Iterator2 __first2,
+                                    _Iterator2 __last2,
+                                    _OutIterator __result,
+                                    _Comp __comp);
+
+template <class _ExecutionPolicy, class _Iterator, class _OutIterator>
+optional<_OutIterator> __pstl_move(_Backend, _Iterator __first, _Iterator __last, _OutIterator __result);
+
+template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinaryOperation>
+optional<_Tp> __pstl_reduce(_Backend, _Iterator __first, _Iterator __last, _Tp __init, _BinaryOperation __op);
+
+template <class _ExecutionPolicy, class _Iterator>
+optional<__iter_value_type<_Iterator>> __pstl_reduce(_Backend, _Iterator __first, _Iterator __last);
+
+template <class _ExecutionPolicy, class _Iterator, class _Tp>
+optional<__iter_diff_t<_Iterator>> __pstl_count(_Backend, _Iterator __first, _Iterator __last, const _Tp& __value);
+
+template <class _ExecutionPolicy, class _Iterator, class _Predicate>
+optional<__iter_diff_t<_Iterator>> __pstl_count_if(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
+
+template <class _ExecutionPolicy, class _Iterator, class _Tp>
+optional<__empty>
+__pstl_replace(_Backend, _Iterator __first, _Iterator __last, const _Tp& __old_value, const _Tp& __new_value);
+
+template <class _ExecutionPolicy, class _Iterator, class _Pred, class _Tp>
+optional<__empty>
+__pstl_replace_if(_Backend, _Iterator __first, _Iterator __last, _Pred __pred, const _Tp& __new_value);
+
+template <class _ExecutionPolicy, class _Iterator, class _OutIterator, class _Tp>
+optional<__empty> __pstl_replace_copy(_Backend,
+                                      _Iterator __first,
+                                      _Iterator __last,
+                                      _OutIterator __result,
+                                      const _Tp& __old_value,
+                                      const _Tp& __new_value);
+
+template <class _ExecutionPolicy, class _Iterator, class _OutIterator, class _Pred, class _Tp>
+optional<__empty> __pstl_replace_copy_if(_Backend,
+                                          _Iterator __first,
+                                          _Iterator __last,
+                                          _OutIterator __result,
+                                          _Pred __pred,
+                                          const _Tp& __new_value);
+
+template <class _ExecutionPolicy, class _Iterator, class _OutIterator>
+optional<_Iterator> __pstl_rotate_copy(
+    _Backend, _Iterator __first, _Iterator __middle, _Iterator __last, _OutIterator __result);
+
+template <class _ExecutionPolicy, class _Iterator, class _Comp>
+optional<__empty> __pstl_sort(_Backend, _Iterator __first, _Iterator __last, _Comp __comp);
+
+template <class _ExecutionPolicy, class _Iterator1, class _Iterator2, class _Comp>
+optional<bool> __pstl_equal(_Backend, _Iterator1 __first1, _Iterator1 __last1, _Iterator2 __first2, _Comp __comp);
+
+// TODO: Complete this list
+```
+
+Exception handling
+==================
+
+PSTL backends are expected to report errors (i.e. failure to allocate) by returning a disengaged `optional` from their
+implementation. Exceptions shouldn't be used to report an internal failure-to-allocate, since all exceptions are turned
+into a program termination at the front-end level. When a backend returns a disengaged `optional` to the frontend, the
+frontend will turn that into a call to `std::__throw_bad_alloc();` to report the internal failure to the user.
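
A minimal sketch of the error-reporting convention described above, written as standalone C++17 rather than libc++ internals. The names `backend_sum` and `frontend_sum`, and the `simulate_alloc_failure` flag, are hypothetical and only stand in for a backend operation and its frontend caller; the real frontend calls `std::__throw_bad_alloc()`, which this sketch approximates with a plain `throw std::bad_alloc()`.

```cpp
#include <new>
#include <optional>
#include <vector>

// Hypothetical backend operation: it is noexcept and reports an internal
// failure to allocate by returning a disengaged optional, never by throwing.
std::optional<long> backend_sum(const std::vector<int>& v, bool simulate_alloc_failure) noexcept {
  if (simulate_alloc_failure)
    return std::nullopt; // "could not get the memory I needed"
  long total = 0;
  for (int x : v)
    total += x;
  return total;
}

// Hypothetical frontend: it turns a disengaged optional into std::bad_alloc,
// which is the only failure the user ever observes as an exception.
long frontend_sum(const std::vector<int>& v, bool simulate_alloc_failure) {
  std::optional<long> res = backend_sum(v, simulate_alloc_failure);
  if (!res)
    throw std::bad_alloc();
  return *res;
}
```

Keeping the backend `noexcept` means any other exception escaping it terminates the program, which matches the behavior the paragraph above describes for the frontend level.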
diff --git a/libcxx/include/__pstl/backend_fwd.h b/libcxx/include/__pstl/backend_fwd.h
new file mode 100644
index 0000000000000..ba0bb95fcae5f
--- /dev/null
+++ b/libcxx/include/__pstl/backend_fwd.h
@@ -0,0 +1,128 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___PSTL_BACKEND_FWD_H
+#define _LIBCPP___PSTL_BACKEND_FWD_H
+
+#include <__config>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+namespace __pstl {
+
+template <class _Backend, class _ExecutionPolicy>
+struct __find_if;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __find;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __find_if_not;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __any_of;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __all_of;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __none_of;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __is_partitioned;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __for_each;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __for_each_n;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __fill;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __fill_n;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __replace;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __replace_if;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __generate;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __generate_n;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __merge;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __stable_sort;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __sort;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __transform;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __transform_binary;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __replace_copy_if;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __replace_copy;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __move;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __copy;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __copy_n;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __rotate_copy;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __transform_reduce;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __transform_reduce_binary;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __count_if;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __count;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __equal_3leg;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __equal;
+
+template <class _Backend, class _ExecutionPolicy>
+struct __reduce;
+
+} // namespace __pstl
+_LIBCPP_END_NAMESPACE_STD
+
+_LIBCPP_POP_MACROS
+
+#endif // _LIBCPP___PSTL_BACKEND_FWD_H
diff --git a/libcxx/include/__pstl/backends/default.h b/libcxx/include/__pstl/backends/default.h
new file mode 100644
index 0000000000000..5b10a35d8327e
--- /dev/null
+++ b/libcxx/include/__pstl/backends/default.h
@@ -0,0 +1,508 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___PSTL_BACKENDS_DEFAULT_H
+#define _LIBCPP___PSTL_BACKENDS_DEFAULT_H
+
+#include <__algorithm/copy_n.h>
+#include <__algorithm/equal.h>
+#include <__algorithm/fill_n.h>
+#include <__algorithm/for_each_n.h>
+#include <__config>
+#include <__functional/identity.h>
+#include <__functional/not_fn.h>
+#include <__functional/operations.h>
+#include <__iterator/concepts.h>
+#include <__iterator/iterator_traits.h>
+#include <__pstl/backend_fwd.h>
+#include <__pstl/configuration_fwd.h>
+#include <__pstl/dispatch.h>
+#include <__utility/empty.h>
+#include <__utility/forward.h>
+#include <__utility/move.h>
+#include <optional>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+namespace __pstl {
+
+//
+// This file provides an incomplete PSTL backend that implements all of the PSTL algorithms
+// based on a smaller set of basis operations.
+//
+// It is intended as a building block for other PSTL backends that implement some operations more
+// efficiently but may not want to define the full set of PSTL algorithms.
+//
+// This backend implements all the PSTL algorithms based on the following basis operations:
+//
+// find_if family
+// --------------
+// - find
+// - find_if_not
+// - any_of
+// - all_of
+// - none_of
+// - is_partitioned
+//
+// for_each family
+// ---------------
+// - for_each_n
+// - fill
+// - fill_n
+// - replace
+// - replace_if
+// - generate
+// - generate_n
+//
+// merge family
+// ------------
+// No other algorithms based on merge
+//
+// stable_sort family
+// ------------------
+// - sort
+//
+// transform_reduce and transform_reduce_binary family
+// ---------------------------------------------------
+// - count_if
+// - count
+// - equal(3 legs)
+// - equal
+// - reduce
+//
+// transform and transform_binary family
+// -------------------------------------
+// - replace_copy_if
+// - replace_copy
+// - move
+// - copy
+// - copy_n
+// - rotate_copy
+//
+
+//////////////////////////////////////////////////////////////
+// find_if family
+//////////////////////////////////////////////////////////////
+template <class _ExecutionPolicy>
+struct __find<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Tp>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardIterator>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) const noexcept {
+    using _FindIf = __dispatch<__find_if, __current_configuration, _ExecutionPolicy>;
+    return _FindIf()(
+        __policy, std::move(__first), std::move(__last), [&](__iter_reference<_ForwardIterator> __element) {
+          return __element == __value;
+        });
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __find_if_not<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Pred>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardIterator>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred&& __pred) const noexcept {
+    using _FindIf = __dispatch<__find_if, __current_configuration, _ExecutionPolicy>;
+    return _FindIf()(__policy, __first, __last, std::not_fn(__pred));
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __any_of<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Pred>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred&& __pred) const noexcept {
+    using _FindIf = __dispatch<__find_if, __current_configuration, _ExecutionPolicy>;
+    auto __res    = _FindIf()(__policy, __first, __last, __pred);
+    if (!__res)
+      return nullopt;
+    return *__res != __last;
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __all_of<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Pred>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred&& __pred) const noexcept {
+    using _AnyOf = __dispatch<__any_of, __current_configuration, _ExecutionPolicy>;
+    auto __res   = _AnyOf()(__policy, __first, __last, [&](__iter_reference<_ForwardIterator> __value) {
+      return !__pred(__value);
+    });
+    if (!__res)
+      return nullopt;
+    return !*__res;
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __none_of<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Pred>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred&& __pred) const noexcept {
+    using _AnyOf = __dispatch<__any_of, __current_configuration, _ExecutionPolicy>;
+    auto __res   = _AnyOf()(__policy, __first, __last, __pred);
+    if (!__res)
+      return nullopt;
+    return !*__res;
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __is_partitioned<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Pred>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred&& __pred) const noexcept {
+    using _FindIfNot   = __dispatch<__find_if_not, __current_configuration, _ExecutionPolicy>;
+    auto __maybe_first = _FindIfNot()(__policy, std::move(__first), std::move(__last), __pred);
+    if (__maybe_first == nullopt)
+      return nullopt;
+
+    __first = *__maybe_first;
+    if (__first == __last)
+      return true;
+    ++__first;
+    using _NoneOf = __dispatch<__none_of, __current_configuration, _ExecutionPolicy>;
+    return _NoneOf()(__policy, std::move(__first), std::move(__last), __pred);
+  }
+};
+
+//////////////////////////////////////////////////////////////
+// for_each family
+//////////////////////////////////////////////////////////////
+template <class _ExecutionPolicy>
+struct __for_each_n<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Size, class _Function>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _Size __size, _Function __func) const noexcept {
+    if constexpr (__has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      using _ForEach          = __dispatch<__for_each, __current_configuration, _ExecutionPolicy>;
+      _ForwardIterator __last = __first + __size;
+      return _ForEach()(__policy, std::move(__first), std::move(__last), std::move(__func));
+    } else {
+      // Otherwise, use the serial algorithm to avoid doing two passes over the input
+      std::for_each_n(std::move(__first), __size, std::move(__func));
+      return __empty{};
+    }
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __fill<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Tp>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Tp const& __value) const noexcept {
+    using _ForEach = __dispatch<__for_each, __current_configuration, _ExecutionPolicy>;
+    using _Ref     = __iter_reference<_ForwardIterator>;
+    return _ForEach()(__policy, std::move(__first), std::move(__last), [&](_Ref __element) { __element = __value; });
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __fill_n<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Size, class _Tp>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _Size __n, _Tp const& __value) const noexcept {
+    if constexpr (__has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      using _Fill             = __dispatch<__fill, __current_configuration, _ExecutionPolicy>;
+      _ForwardIterator __last = __first + __n;
+      return _Fill()(__policy, std::move(__first), std::move(__last), __value);
+    } else {
+      // Otherwise, use the serial algorithm to avoid doing two passes over the input
+      std::fill_n(std::move(__first), __n, __value);
+      return optional<__empty>{__empty{}};
+    }
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __replace<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Tp>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Tp const& __old, _Tp const& __new)
+      const noexcept {
+    using _ReplaceIf = __dispatch<__replace_if, __current_configuration, _ExecutionPolicy>;
+    using _Ref       = __iter_reference<_ForwardIterator>;
+    return _ReplaceIf()(
+        __policy, std::move(__first), std::move(__last), [&](_Ref __element) { return __element == __old; }, __new);
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __replace_if<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Pred, class _Tp>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty> operator()(
+      _Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred&& __pred, _Tp const& __new_value)
+      const noexcept {
+    using _ForEach = __dispatch<__for_each, __current_configuration, _ExecutionPolicy>;
+    using _Ref     = __iter_reference<_ForwardIterator>;
+    return _ForEach()(__policy, std::move(__first), std::move(__last), [&](_Ref __element) {
+      if (__pred(__element))
+        __element = __new_value;
+    });
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __generate<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Generator>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Generator&& __gen) const noexcept {
+    using _ForEach = __dispatch<__for_each, __current_configuration, _ExecutionPolicy>;
+    using _Ref     = __iter_reference<_ForwardIterator>;
+    return _ForEach()(__policy, std::move(__first), std::move(__last), [&](_Ref __element) { __element = __gen(); });
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __generate_n<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Size, class _Generator>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _Size __n, _Generator&& __gen) const noexcept {
+    using _ForEachN = __dispatch<__for_each_n, __current_configuration, _ExecutionPolicy>;
+    using _Ref      = __iter_reference<_ForwardIterator>;
+    return _ForEachN()(__policy, std::move(__first), __n, [&](_Ref __element) { __element = __gen(); });
+  }
+};
+
+//////////////////////////////////////////////////////////////
+// stable_sort family
+//////////////////////////////////////////////////////////////
+template <class _ExecutionPolicy>
+struct __sort<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _RandomAccessIterator, class _Comp>
+  _LIBCPP_HIDE_FROM_ABI optional<__empty> operator()(
+      _Policy&& __policy, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp&& __comp) const noexcept {
+    using _StableSort = __dispatch<__stable_sort, __current_configuration, _ExecutionPolicy>;
+    return _StableSort()(__policy, std::move(__first), std::move(__last), std::forward<_Comp>(__comp));
+  }
+};
+
+//////////////////////////////////////////////////////////////
+// transform_reduce family
+//////////////////////////////////////////////////////////////
+template <class _ExecutionPolicy>
+struct __count_if<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Predicate>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__iter_diff_t<_ForwardIterator>> operator()(
+      _Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate&& __pred) const noexcept {
+    using _TransformReduce = __dispatch<__transform_reduce, __current_configuration, _ExecutionPolicy>;
+    using _DiffT           = __iter_diff_t<_ForwardIterator>;
+    using _Ref             = __iter_reference<_ForwardIterator>;
+    return _TransformReduce()(
+        __policy, std::move(__first), std::move(__last), _DiffT{}, std::plus{}, [&](_Ref __element) -> _DiffT {
+          return __pred(__element) ? _DiffT(1) : _DiffT(0);
+        });
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __count<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Tp>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__iter_diff_t<_ForwardIterator>>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Tp const& __value) const noexcept {
+    using _CountIf = __dispatch<__count_if, __current_configuration, _ExecutionPolicy>;
+    using _Ref     = __iter_reference<_ForwardIterator>;
+    return _CountIf()(__policy, std::move(__first), std::move(__last), [&](_Ref __element) -> bool {
+      return __element == __value;
+    });
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __equal_3leg<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator1, class _ForwardIterator2, class _Predicate>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
+  operator()(_Policy&& __policy,
+             _ForwardIterator1 __first1,
+             _ForwardIterator1 __last1,
+             _ForwardIterator2 __first2,
+             _Predicate&& __pred) const noexcept {
+    using _TransformReduce = __dispatch<__transform_reduce_binary, __current_configuration, _ExecutionPolicy>;
+    return _TransformReduce()(
+        __policy,
+        std::move(__first1),
+        std::move(__last1),
+        std::move(__first2),
+        true,
+        std::logical_and{},
+        std::forward<_Predicate>(__pred));
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __equal<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator1, class _ForwardIterator2, class _Predicate>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<bool>
+  operator()(_Policy&& __policy,
+             _ForwardIterator1 __first1,
+             _ForwardIterator1 __last1,
+             _ForwardIterator2 __first2,
+             _ForwardIterator2 __last2,
+             _Predicate&& __pred) const noexcept {
+    if constexpr (__has_random_access_iterator_category<_ForwardIterator1>::value &&
+                  __has_random_access_iterator_category<_ForwardIterator2>::value) {
+      if (__last1 - __first1 != __last2 - __first2)
+        return false;
+      // Fall back to the 3 legged algorithm
+      using _Equal3Leg = __dispatch<__equal_3leg, __current_configuration, _ExecutionPolicy>;
+      return _Equal3Leg()(
+          __policy, std::move(__first1), std::move(__last1), std::move(__first2), std::forward<_Predicate>(__pred));
+    } else {
+      // If we don't have random access, fall back to the serial algorithm cause we can't do much
+      return std::equal(
+          std::move(__first1),
+          std::move(__last1),
+          std::move(__first2),
+          std::move(__last2),
+          std::forward<_Predicate>(__pred));
+    }
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __reduce<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Tp, class _BinaryOperation>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_Tp>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Tp __init, _BinaryOperation&& __op)
+      const noexcept {
+    using _TransformReduce = __dispatch<__transform_reduce, __current_configuration, _ExecutionPolicy>;
+    return _TransformReduce()(
+        __policy,
+        std::move(__first),
+        std::move(__last),
+        std::move(__init),
+        std::forward<_BinaryOperation>(__op),
+        __identity{});
+  }
+};
+
+//////////////////////////////////////////////////////////////
+// transform family
+//////////////////////////////////////////////////////////////
+template <class _ExecutionPolicy>
+struct __replace_copy_if<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _ForwardOutIterator, class _Pred, class _Tp>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy,
+             _ForwardIterator __first,
+             _ForwardIterator __last,
+             _ForwardOutIterator __out_it,
+             _Pred&& __pred,
+             _Tp const& __new_value) const noexcept {
+    using _Transform = __dispatch<__transform, __current_configuration, _ExecutionPolicy>;
+    using _Ref       = __iter_reference<_ForwardIterator>;
+    auto __res =
+        _Transform()(__policy, std::move(__first), std::move(__last), std::move(__out_it), [&](_Ref __element) {
+          return __pred(__element) ? __new_value : __element;
+        });
+    if (__res == nullopt)
+      return nullopt;
+    return __empty{};
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __replace_copy<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _ForwardOutIterator, class _Tp>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy,
+             _ForwardIterator __first,
+             _ForwardIterator __last,
+             _ForwardOutIterator __out_it,
+             _Tp const& __old_value,
+             _Tp const& __new_value) const noexcept {
+    using _ReplaceCopyIf = __dispatch<__replace_copy_if, __current_configuration, _ExecutionPolicy>;
+    using _Ref           = __iter_reference<_ForwardIterator>;
+    return _ReplaceCopyIf()(
+        __policy,
+        std::move(__first),
+        std::move(__last),
+        std::move(__out_it),
+        [&](_Ref __element) { return __element == __old_value; },
+        __new_value);
+  }
+};
+
+// TODO: Use the std::copy/move shenanigans to forward to std::memmove
+//       Investigate whether we want to still forward to std::transform(policy)
+//       in that case for the execution::par part, or whether we actually want
+//       to run everything serially in that case.
+template <class _ExecutionPolicy>
+struct __move<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _ForwardOutIterator>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _ForwardOutIterator __out_it)
+      const noexcept {
+    using _Transform = __dispatch<__transform, __current_configuration, _ExecutionPolicy>;
+    return _Transform()(__policy, std::move(__first), std::move(__last), std::move(__out_it), [&](auto&& __element) {
+      return std::move(__element);
+    });
+  }
+};
+
+// TODO: Use the std::copy/move shenanigans to forward to std::memmove
+template <class _ExecutionPolicy>
+struct __copy<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _ForwardOutIterator>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _ForwardOutIterator __out_it)
+      const noexcept {
+    using _Transform = __dispatch<__transform, __current_configuration, _ExecutionPolicy>;
+    return _Transform()(__policy, std::move(__first), std::move(__last), std::move(__out_it), __identity());
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __copy_n<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Size, class _ForwardOutIterator>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _Size __n, _ForwardOutIterator __out_it) const noexcept {
+    if constexpr (__has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      using _Copy             = __dispatch<__copy, __current_configuration, _ExecutionPolicy>;
+      _ForwardIterator __last = __first + __n;
+      return _Copy()(__policy, std::move(__first), std::move(__last), std::move(__out_it));
+    } else {
+      // Otherwise, use the serial algorithm to avoid doing two passes over the input
+      return std::copy_n(std::move(__first), __n, std::move(__out_it));
+    }
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __rotate_copy<__default_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _ForwardOutIterator>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
+  operator()(_Policy&& __policy,
+             _ForwardIterator __first,
+             _ForwardIterator __middle,
+             _ForwardIterator __last,
+             _ForwardOutIterator __out_it) const noexcept {
+    using _Copy       = __dispatch<__copy, __current_configuration, _ExecutionPolicy>;
+    auto __result_mid = _Copy()(__policy, __middle, std::move(__last), std::move(__out_it));
+    if (__result_mid == nullopt)
+      return nullopt;
+    return _Copy()(__policy, std::move(__first), std::move(__middle), *std::move(__result_mid));
+  }
+};
+
+} // namespace __pstl
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_POP_MACROS
+
+#endif // _LIBCPP___PSTL_BACKENDS_DEFAULT_H
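
The pattern used by the default backend above (declare each operation as a struct template, let a backend specialize the operations it implements, and derive the rest from a basis operation) can be sketched in standalone C++17 as follows. Everything here is an assumption made for illustration: `serial_tag`, `default_tag`, `find_if_op`, `any_of_op`, `is_implemented`, `first_implemented` and `dispatch` are hypothetical names, and the completeness check is just one way to pick a backend. It is not the patch's `__dispatch` implementation.

```cpp
#include <algorithm>
#include <optional>
#include <type_traits>
#include <vector>

struct serial_tag {};  // hypothetical backend that implements the basis operation
struct default_tag {}; // hypothetical backend that derives algorithms from the basis

// Declared but not defined: a backend implements an operation by specializing it.
template <class Backend> struct find_if_op;
template <class Backend> struct any_of_op;

// True when T is a complete type, i.e. the operation has been specialized.
template <class T, class = void>
struct is_implemented : std::false_type {};
template <class T>
struct is_implemented<T, std::void_t<decltype(sizeof(T))>> : std::true_type {};

// Select the first backend in the list that implements Op (void if none does).
template <template <class> class Op, class... Backends>
struct first_implemented;
template <template <class> class Op>
struct first_implemented<Op> { using type = void; };
template <template <class> class Op, class Backend, class... Rest>
struct first_implemented<Op, Backend, Rest...> {
  using type = std::conditional_t<is_implemented<Op<Backend>>::value,
                                  Op<Backend>,
                                  typename first_implemented<Op, Rest...>::type>;
};

// The "configuration": try the serial backend first, then the default backend.
template <template <class> class Op>
using dispatch = typename first_implemented<Op, serial_tag, default_tag>::type;

// Basis operation, implemented by the serial backend.
template <>
struct find_if_op<serial_tag> {
  template <class It, class Pred>
  std::optional<It> operator()(It first, It last, Pred pred) const noexcept {
    return std::find_if(first, last, pred);
  }
};

// Derived operation: any_of expressed through whichever find_if is configured.
template <>
struct any_of_op<default_tag> {
  template <class It, class Pred>
  std::optional<bool> operator()(It first, It last, Pred pred) const noexcept {
    auto res = dispatch<find_if_op>()(first, last, pred);
    if (!res)
      return std::nullopt; // propagate the backend's failure
    return *res != last;
  }
};

// Usage: any_of resolves to the default backend, which calls the serial find_if.
bool has_negative(const std::vector<int>& v) {
  auto res = dispatch<any_of_op>()(v.begin(), v.end(), [](int x) { return x < 0; });
  return res && *res;
}
```

In this sketch a backend that wanted a faster `any_of` would only need to specialize `any_of_op<serial_tag>`; the lookup would then pick it ahead of the default definition, which is the same idea as the "Not mandatory, but better optimized" specializations in the libdispatch backend.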
diff --git a/libcxx/include/__pstl/backends/libdispatch.h b/libcxx/include/__pstl/backends/libdispatch.h
index af1da80dc133e..88942fadef8c0 100644
--- a/libcxx/include/__pstl/backends/libdispatch.h
+++ b/libcxx/include/__pstl/backends/libdispatch.h
@@ -23,8 +23,17 @@
 #include <__memory/construct_at.h>
 #include <__memory/unique_ptr.h>
 #include <__numeric/reduce.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/configuration_fwd.h>
+#include <__pstl/cpu_algos/any_of.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
+#include <__pstl/cpu_algos/fill.h>
+#include <__pstl/cpu_algos/find_if.h>
+#include <__pstl/cpu_algos/for_each.h>
+#include <__pstl/cpu_algos/merge.h>
+#include <__pstl/cpu_algos/stable_sort.h>
+#include <__pstl/cpu_algos/transform.h>
+#include <__pstl/cpu_algos/transform_reduce.h>
 #include <__utility/empty.h>
 #include <__utility/exception_guard.h>
 #include <__utility/move.h>
@@ -341,6 +350,48 @@ struct __cpu_traits<__libdispatch_backend_tag> {
   static constexpr size_t __lane_size = 64;
 };
 
+// Mandatory implementations of the computational basis
+template <class _ExecutionPolicy>
+struct __find_if<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_find_if<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __for_each<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_for_each<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __merge<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_merge<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __stable_sort<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_stable_sort<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __transform<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_transform<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __transform_binary<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_transform_binary<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __transform_reduce<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_transform_reduce<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __transform_reduce_binary<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_transform_reduce_binary<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+// Not mandatory, but better optimized
+template <class _ExecutionPolicy>
+struct __any_of<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_any_of<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __fill<__libdispatch_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_fill<__libdispatch_backend_tag, _ExecutionPolicy> {};
+
 } // namespace __pstl
 _LIBCPP_END_NAMESPACE_STD
 
@@ -348,14 +399,4 @@ _LIBCPP_END_NAMESPACE_STD
 
 _LIBCPP_POP_MACROS
 
-// Implement PSTL algorithms based on the __cpu_traits specialized above
-#include <__pstl/cpu_algos/any_of.h>
-#include <__pstl/cpu_algos/fill.h>
-#include <__pstl/cpu_algos/find_if.h>
-#include <__pstl/cpu_algos/for_each.h>
-#include <__pstl/cpu_algos/merge.h>
-#include <__pstl/cpu_algos/stable_sort.h>
-#include <__pstl/cpu_algos/transform.h>
-#include <__pstl/cpu_algos/transform_reduce.h>
-
 #endif // _LIBCPP___PSTL_BACKENDS_LIBDISPATCH_H
diff --git a/libcxx/include/__pstl/backends/serial.h b/libcxx/include/__pstl/backends/serial.h
index 6e343313bea36..8f9fd76555b49 100644
--- a/libcxx/include/__pstl/backends/serial.h
+++ b/libcxx/include/__pstl/backends/serial.h
@@ -10,12 +10,17 @@
 #ifndef _LIBCPP___PSTL_BACKENDS_SERIAL_H
 #define _LIBCPP___PSTL_BACKENDS_SERIAL_H
 
+#include <__algorithm/find_if.h>
+#include <__algorithm/for_each.h>
+#include <__algorithm/merge.h>
+#include <__algorithm/stable_sort.h>
+#include <__algorithm/transform.h>
 #include <__config>
-#include <__pstl/configuration_fwd.h>
-#include <__pstl/cpu_algos/cpu_traits.h>
+#include <__numeric/transform_reduce.h>
+#include <__pstl/backend_fwd.h>
 #include <__utility/empty.h>
+#include <__utility/forward.h>
 #include <__utility/move.h>
-#include <cstddef>
 #include <optional>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@@ -30,48 +35,143 @@ _LIBCPP_PUSH_MACROS
 _LIBCPP_BEGIN_NAMESPACE_STD
 namespace __pstl {
 
-template <>
-struct __cpu_traits<__serial_backend_tag> {
-  template <class _RandomAccessIterator, class _Fp>
-  _LIBCPP_HIDE_FROM_ABI static optional<__empty>
-  __for_each(_RandomAccessIterator __first, _RandomAccessIterator __last, _Fp __f) {
-    __f(__first, __last);
+//
+// This partial PSTL backend runs everything serially.
+//
+// TODO: Right now, the serial backend must be used with another backend
+//       like the "default backend" because it doesn't implement all the
+//       necessary PSTL operations. It would be better to dispatch all
+//       algorithms to their serial counterpart directly, since this can
+//       often be more efficient than the "default backend"'s implementation
+//       if we end up running serially anyway.
+//
+
+template <class _ExecutionPolicy>
+struct __find_if<__serial_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Pred>
+  _LIBCPP_HIDE_FROM_ABI optional<_ForwardIterator>
+  operator()(_Policy&&, _ForwardIterator __first, _ForwardIterator __last, _Pred&& __pred) const noexcept {
+    return std::find_if(std::move(__first), std::move(__last), std::forward<_Pred>(__pred));
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __for_each<__serial_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Function>
+  _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&&, _ForwardIterator __first, _ForwardIterator __last, _Function&& __func) const noexcept {
+    std::for_each(std::move(__first), std::move(__last), std::forward<_Function>(__func));
     return __empty{};
   }
+};
 
-  template <class _Index, class _UnaryOp, class _Tp, class _BinaryOp, class _Reduce>
-  _LIBCPP_HIDE_FROM_ABI static optional<_Tp>
-  __transform_reduce(_Index __first, _Index __last, _UnaryOp, _Tp __init, _BinaryOp, _Reduce __reduce) {
-    return __reduce(std::move(__first), std::move(__last), std::move(__init));
+template <class _ExecutionPolicy>
+struct __merge<__serial_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator1, class _ForwardIterator2, class _ForwardOutIterator, class _Comp>
+  _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator> operator()(
+      _Policy&&,
+      _ForwardIterator1 __first1,
+      _ForwardIterator1 __last1,
+      _ForwardIterator2 __first2,
+      _ForwardIterator2 __last2,
+      _ForwardOutIterator __out,
+      _Comp&& __comp) const noexcept {
+    return std::merge(
+        std::move(__first1),
+        std::move(__last1),
+        std::move(__first2),
+        std::move(__last2),
+        std::move(__out),
+        std::forward<_Comp>(__comp));
   }
+};
 
-  template <class _RandomAccessIterator, class _Compare, class _LeafSort>
-  _LIBCPP_HIDE_FROM_ABI static optional<__empty>
-  __stable_sort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp, _LeafSort __leaf_sort) {
-    __leaf_sort(__first, __last, __comp);
-    return __empty{};
+template <class _ExecutionPolicy>
+struct __stable_sort<__serial_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _RandomAccessIterator, class _Comp>
+  _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&&, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp&& __comp) const noexcept {
+    std::stable_sort(std::move(__first), std::move(__last), std::forward<_Comp>(__comp));
+    return __empty{};
   }
+};
 
-  _LIBCPP_HIDE_FROM_ABI static void __cancel_execution() {}
-
-  template <class _RandomAccessIterator1,
-            class _RandomAccessIterator2,
-            class _RandomAccessIterator3,
-            class _Compare,
-            class _LeafMerge>
-  _LIBCPP_HIDE_FROM_ABI static optional<__empty>
-  __merge(_RandomAccessIterator1 __first1,
-          _RandomAccessIterator1 __last1,
-          _RandomAccessIterator2 __first2,
-          _RandomAccessIterator2 __last2,
-          _RandomAccessIterator3 __outit,
-          _Compare __comp,
-          _LeafMerge __leaf_merge) {
-    __leaf_merge(__first1, __last1, __first2, __last2, __outit, __comp);
-    return __empty{};
+template <class _ExecutionPolicy>
+struct __transform<__serial_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _ForwardOutIterator, class _UnaryOperation>
+  _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator> operator()(
+      _Policy&&, _ForwardIterator __first, _ForwardIterator __last, _ForwardOutIterator __out, _UnaryOperation&& __op)
+      const noexcept {
+    return std::transform(std::move(__first), std::move(__last), std::move(__out), std::forward<_UnaryOperation>(__op));
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __transform_binary<__serial_backend_tag, _ExecutionPolicy> {
+  template <class _Policy,
+            class _ForwardIterator1,
+            class _ForwardIterator2,
+            class _ForwardOutIterator,
+            class _BinaryOperation>
+  _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
+  operator()(_Policy&&,
+             _ForwardIterator1 __first1,
+             _ForwardIterator1 __last1,
+             _ForwardIterator2 __first2,
+             _ForwardOutIterator __out,
+             _BinaryOperation&& __op) const noexcept {
+    return std::transform(
+        std::move(__first1),
+        std::move(__last1),
+        std::move(__first2),
+        std::move(__out),
+        std::forward<_BinaryOperation>(__op));
+  }
+};
+
+template <class _ExecutionPolicy>
+struct __transform_reduce<__serial_backend_tag, _ExecutionPolicy> {
+  template <class _Policy, class _ForwardIterator, class _Tp, class _BinaryOperation, class _UnaryOperation>
+  _LIBCPP_HIDE_FROM_ABI optional<_Tp>
+  operator()(_Policy&&,
+             _ForwardIterator __first,
+             _ForwardIterator __last,
+             _Tp const& __init,
+             _BinaryOperation&& __reduce,
+             _UnaryOperation&& __transform) const noexcept {
+    return std::transform_reduce(
+        std::move(__first),
+        std::move(__last),
+        __init,
+        std::forward<_BinaryOperation>(__reduce),
+        std::forward<_UnaryOperation>(__transform));
   }
+};
 
-  static constexpr size_t __lane_size = 64;
+template <class _ExecutionPolicy>
+struct __transform_reduce_binary<__serial_backend_tag, _ExecutionPolicy> {
+  template <class _Policy,
+            class _ForwardIterator1,
+            class _ForwardIterator2,
+            class _Tp,
+            class _BinaryOperation1,
+            class _BinaryOperation2>
+  _LIBCPP_HIDE_FROM_ABI optional<_Tp> operator()(
+      _Policy&&,
+      _ForwardIterator1 __first1,
+      _ForwardIterator1 __last1,
+      _ForwardIterator2 __first2,
+      _Tp const& __init,
+      _BinaryOperation1&& __reduce,
+      _BinaryOperation2&& __transform) const noexcept {
+    return std::transform_reduce(
+        std::move(__first1),
+        std::move(__last1),
+        std::move(__first2),
+        __init,
+        std::forward<_BinaryOperation1>(__reduce),
+        std::forward<_BinaryOperation2>(__transform));
+  }
 };
 
 } // namespace __pstl
@@ -81,14 +181,4 @@ _LIBCPP_POP_MACROS
 
 #endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
 
-// Implement PSTL algorithms based on the __cpu_traits specialized above
-#include <__pstl/cpu_algos/any_of.h>
-#include <__pstl/cpu_algos/fill.h>
-#include <__pstl/cpu_algos/find_if.h>
-#include <__pstl/cpu_algos/for_each.h>
-#include <__pstl/cpu_algos/merge.h>
-#include <__pstl/cpu_algos/stable_sort.h>
-#include <__pstl/cpu_algos/transform.h>
-#include <__pstl/cpu_algos/transform_reduce.h>
-
 #endif // _LIBCPP___PSTL_BACKENDS_SERIAL_H
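
Reviewer note on the serial.h hunks above: each basis operation is now a class template that the backend specializes on its tag. Its call operator returns an `optional` that is engaged on success. The following is a minimal standalone sketch of that shape, not the libc++ code itself. Every name in `namespace sketch` (`empty`, `seq_policy`, `sorted_copy`, and the unprefixed template names) is a hypothetical stand-in chosen so the snippet compiles outside the library.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

namespace sketch {

struct empty {};              // stand-in for libc++'s internal __empty
struct serial_backend_tag {}; // stand-in for __serial_backend_tag
struct seq_policy {};         // hypothetical stand-in for an execution policy type

// Primary templates: one class template per basis operation, left undefined
// so that only backends that specialize them provide an implementation.
template <class Backend, class ExecutionPolicy>
struct find_if;

template <class Backend, class ExecutionPolicy>
struct stable_sort;

// The serial backend forwards straight to the non-parallel algorithm.
template <class ExecutionPolicy>
struct find_if<serial_backend_tag, ExecutionPolicy> {
  template <class Policy, class ForwardIterator, class Pred>
  std::optional<ForwardIterator>
  operator()(Policy&&, ForwardIterator first, ForwardIterator last, Pred&& pred) const noexcept {
    return std::find_if(std::move(first), std::move(last), std::forward<Pred>(pred));
  }
};

template <class ExecutionPolicy>
struct stable_sort<serial_backend_tag, ExecutionPolicy> {
  template <class Policy, class RandomAccessIterator, class Comp>
  std::optional<empty>
  operator()(Policy&&, RandomAccessIterator first, RandomAccessIterator last, Comp&& comp) const noexcept {
    std::stable_sort(std::move(first), std::move(last), std::forward<Comp>(comp));
    return empty{}; // an engaged optional signals success to the caller
  }
};

// Small helper that exercises the stable_sort functor on a copy of the input.
inline std::vector<int> sorted_copy(std::vector<int> v) {
  auto result = stable_sort<serial_backend_tag, seq_policy>()(seq_policy{}, v.begin(), v.end(), std::less<>{});
  assert(result.has_value());
  return v;
}

} // namespace sketch
```

Note that a functor declared to return `optional<empty>` has to end with an explicit `return empty{};`, since falling off the end of a non-void function is undefined behavior.
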
diff --git a/libcxx/include/__pstl/backends/std_thread.h b/libcxx/include/__pstl/backends/std_thread.h
index e58f4859e6c9e..bb3db713334da 100644
--- a/libcxx/include/__pstl/backends/std_thread.h
+++ b/libcxx/include/__pstl/backends/std_thread.h
@@ -9,10 +9,18 @@
 #ifndef _LIBCPP___PSTL_BACKENDS_STD_THREAD_H
 #define _LIBCPP___PSTL_BACKENDS_STD_THREAD_H
 
-#include <__assert>
 #include <__config>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/configuration_fwd.h>
+#include <__pstl/cpu_algos/any_of.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
+#include <__pstl/cpu_algos/fill.h>
+#include <__pstl/cpu_algos/find_if.h>
+#include <__pstl/cpu_algos/for_each.h>
+#include <__pstl/cpu_algos/merge.h>
+#include <__pstl/cpu_algos/stable_sort.h>
+#include <__pstl/cpu_algos/transform.h>
+#include <__pstl/cpu_algos/transform_reduce.h>
 #include <__utility/empty.h>
 #include <__utility/move.h>
 #include <cstddef>
@@ -27,12 +35,16 @@ _LIBCPP_PUSH_MACROS
 
 #if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
 
-// This backend implementation is for testing purposes only and not meant for production use. This will be replaced
-// by a proper implementation once the PSTL implementation is somewhat stable.
-
 _LIBCPP_BEGIN_NAMESPACE_STD
 namespace __pstl {
 
+//
+// This partial backend implementation is for testing purposes only and not meant for production use. This will be
+// replaced by a proper implementation once the PSTL implementation is somewhat stable.
+//
+// This is intended to be used on top of the "default backend".
+//
+
 template <>
 struct __cpu_traits<__std_thread_backend_tag> {
   template <class _RandomAccessIterator, class _Fp>
@@ -77,6 +89,48 @@ struct __cpu_traits<__std_thread_backend_tag> {
   static constexpr size_t __lane_size = 64;
 };
 
+// Mandatory implementations of the computational basis
+template <class _ExecutionPolicy>
+struct __find_if<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_find_if<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __for_each<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_for_each<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __merge<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_merge<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __stable_sort<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_stable_sort<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __transform<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_transform<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __transform_binary<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_transform_binary<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __transform_reduce<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_transform_reduce<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __transform_reduce_binary<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_transform_reduce_binary<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+// Not mandatory, but better optimized
+template <class _ExecutionPolicy>
+struct __any_of<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_any_of<__std_thread_backend_tag, _ExecutionPolicy> {};
+
+template <class _ExecutionPolicy>
+struct __fill<__std_thread_backend_tag, _ExecutionPolicy>
+    : __cpu_parallel_fill<__std_thread_backend_tag, _ExecutionPolicy> {};
+
 } // namespace __pstl
 _LIBCPP_END_NAMESPACE_STD
 
@@ -84,14 +138,4 @@ _LIBCPP_END_NAMESPACE_STD
 
 _LIBCPP_POP_MACROS
 
-// Implement PSTL algorithms based on the __cpu_traits specialized above
-#include <__pstl/cpu_algos/any_of.h>
-#include <__pstl/cpu_algos/fill.h>
-#include <__pstl/cpu_algos/find_if.h>
-#include <__pstl/cpu_algos/for_each.h>
-#include <__pstl/cpu_algos/merge.h>
-#include <__pstl/cpu_algos/stable_sort.h>
-#include <__pstl/cpu_algos/transform.h>
-#include <__pstl/cpu_algos/transform_reduce.h>
-
 #endif // _LIBCPP___PSTL_BACKENDS_STD_THREAD_H
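
Reviewer note on the std_thread.h hunks above: a CPU backend only has to specialize the low-level traits class, and then opts into the shared implementations by inheriting from the corresponding `__cpu_parallel_*` template. A rough sketch of that layering follows. It assumes a made-up `toy_backend_tag` whose "scheduler" runs two chunks one after the other, and all names in `namespace sketch` are illustrative rather than libc++ identifiers.

```cpp
#include <cassert>
#include <optional>
#include <vector>

namespace sketch {

struct empty {};
struct toy_backend_tag {}; // hypothetical backend, not part of libc++
struct par_policy {};      // hypothetical stand-in for a parallel policy type

// Low-level traits: the only thing a CPU backend describes is how to run a
// "brick" function over sub-ranges of a random-access range.
template <class Backend>
struct cpu_traits;

template <>
struct cpu_traits<toy_backend_tag> {
  template <class RandomAccessIterator, class Fp>
  static std::optional<empty> for_each(RandomAccessIterator first, RandomAccessIterator last, Fp f) {
    // Toy scheduling: split into two chunks and run them one after the other.
    // A real backend would hand the chunks to threads or a dispatch queue.
    RandomAccessIterator mid = first + (last - first) / 2;
    f(first, mid);
    f(mid, last);
    return empty{};
  }
};

// Primary template of the basis operation, specialized per backend.
template <class Backend, class ExecutionPolicy>
struct for_each;

// Backend-agnostic implementation written only against cpu_traits<Backend>.
template <class Backend, class RawExecutionPolicy>
struct cpu_parallel_for_each {
  template <class Policy, class RandomAccessIterator, class Function>
  std::optional<empty>
  operator()(Policy&&, RandomAccessIterator first, RandomAccessIterator last, Function func) const noexcept {
    return cpu_traits<Backend>::for_each(
        first, last, [&func](RandomAccessIterator brick_first, RandomAccessIterator brick_last) {
          for (; brick_first != brick_last; ++brick_first)
            func(*brick_first);
        });
  }
};

// Opting a backend into the shared implementation is a one-line inheritance.
template <class ExecutionPolicy>
struct for_each<toy_backend_tag, ExecutionPolicy> : cpu_parallel_for_each<toy_backend_tag, ExecutionPolicy> {};

// Helper that doubles every element through the backend functor.
inline std::vector<int> doubled(std::vector<int> v) {
  auto res = for_each<toy_backend_tag, par_policy>()(par_policy{}, v.begin(), v.end(), [](int& x) { x *= 2; });
  assert(res.has_value());
  return v;
}

} // namespace sketch
```

The shared implementation takes the backend as a template parameter instead of naming a global `__cpu_backend_tag`, so several CPU backends can coexist in the same build.
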
diff --git a/libcxx/include/__pstl/configuration.h b/libcxx/include/__pstl/configuration.h
index d32bd21df1f9e..507491af8d0df 100644
--- a/libcxx/include/__pstl/configuration.h
+++ b/libcxx/include/__pstl/configuration.h
@@ -16,12 +16,20 @@
 #  pragma GCC system_header
 #endif
 
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
 #if defined(_LIBCPP_PSTL_BACKEND_SERIAL)
+#  include <__pstl/backends/default.h>
 #  include <__pstl/backends/serial.h>
 #elif defined(_LIBCPP_PSTL_BACKEND_STD_THREAD)
+#  include <__pstl/backends/default.h>
 #  include <__pstl/backends/std_thread.h>
 #elif defined(_LIBCPP_PSTL_BACKEND_LIBDISPATCH)
+#  include <__pstl/backends/default.h>
 #  include <__pstl/backends/libdispatch.h>
 #endif
 
+_LIBCPP_POP_MACROS
+
 #endif // _LIBCPP___PSTL_CONFIGURATION_H
diff --git a/libcxx/include/__pstl/configuration_fwd.h b/libcxx/include/__pstl/configuration_fwd.h
index 995fcfce847cb..3fb8de05eec20 100644
--- a/libcxx/include/__pstl/configuration_fwd.h
+++ b/libcxx/include/__pstl/configuration_fwd.h
@@ -10,236 +10,41 @@
 #define _LIBCPP___PSTL_CONFIGURATION_FWD_H
 
 #include <__config>
-#include <execution>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
 #endif
 
-#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
 
 _LIBCPP_BEGIN_NAMESPACE_STD
-
-/*
-TODO: Documentation of how backends work
-
-A PSTL parallel backend is a tag type to which the following functions are associated, at minimum:
-
-  template <class _ExecutionPolicy, class _Iterator, class _Func>
-  optional<__empty> __pstl_for_each(_Backend, _ExecutionPolicy&&, _Iterator __first, _Iterator __last, _Func __f);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Predicate>
-  optional<_Iterator> __pstl_find_if(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
-
-  template <class _ExecutionPolicy, class _RandomAccessIterator, class _Comp>
-  optional<__empty>
-  __pstl_stable_sort(_Backend, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp __comp);
-
-  template <class _ExecutionPolicy,
-            class _ForwardIterator1,
-            class _ForwardIterator2,
-            class _ForwardOutIterator,
-            class _Comp>
-  optional<_ForwardOutIterator> __pstl_merge(_Backend,
-                                             _ForwardIterator1 __first1,
-                                             _ForwardIterator1 __last1,
-                                             _ForwardIterator2 __first2,
-                                             _ForwardIterator2 __last2,
-                                             _ForwardOutIterator __result,
-                                             _Comp __comp);
-
-  template <class _ExecutionPolicy, class _InIterator, class _OutIterator, class _UnaryOperation>
-  optional<_OutIterator>
-  __pstl_transform(_Backend, _InIterator __first, _InIterator __last, _OutIterator __result, _UnaryOperation __op);
-
-  template <class _ExecutionPolicy, class _InIterator1, class _InIterator2, class _OutIterator, class _BinaryOperation>
-  optional<_OutIterator> __pstl_transform(_InIterator1 __first1,
-                                          _InIterator2 __first2,
-                                          _InIterator1 __last1,
-                                          _OutIterator __result,
-                                          _BinaryOperation __op);
-
-  template <class _ExecutionPolicy,
-            class _Iterator1,
-            class _Iterator2,
-            class _Tp,
-            class _BinaryOperation1,
-            class _BinaryOperation2>
-  optional<_Tp> __pstl_transform_reduce(_Backend,
-                                        _Iterator1 __first1,
-                                        _Iterator1 __last1,
-                                        _Iterator2 __first2,
-                                        _Iterator2 __last2,
-                                        _Tp __init,
-                                        _BinaryOperation1 __reduce,
-                                        _BinaryOperation2 __transform);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinaryOperation, class _UnaryOperation>
-  optional<_Tp> __pstl_transform_reduce(_Backend,
-                                        _Iterator __first,
-                                        _Iterator __last,
-                                        _Tp __init,
-                                        _BinaryOperation __reduce,
-                                        _UnaryOperation __transform);
-
-// TODO: Complete this list
-
-The following functions are optional but can be provided. If provided, they are used by the corresponding
-algorithms, otherwise they are implemented in terms of other algorithms. If none of the optional algorithms are
-implemented, all the algorithms will eventually forward to the basis algorithms listed above:
-
-  template <class _ExecutionPolicy, class _Iterator, class _Size, class _Func>
-  optional<__empty> __pstl_for_each_n(_Backend, _Iterator __first, _Size __n, _Func __f);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Predicate>
-  optional<bool> __pstl_any_of(_Backend, _Iterator __first, _iterator __last, _Predicate __pred);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Predicate>
-  optional<bool> __pstl_all_of(_Backend, _Iterator __first, _iterator __last, _Predicate __pred);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Predicate>
-  optional<bool> __pstl_none_of(_Backend, _Iterator __first, _iterator __last, _Predicate __pred);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Tp>
-  optional<_Iterator> __pstl_find(_Backend, _Iterator __first, _Iterator __last, const _Tp& __value);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Predicate>
-  optional<_Iterator> __pstl_find_if_not(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Tp>
-  optional<__empty> __pstl_fill(_Backend, _Iterator __first, _Iterator __last, const _Tp& __value);
-
-  template <class _ExecutionPolicy, class _Iterator, class _SizeT, class _Tp>
-  optional<__empty> __pstl_fill_n(_Backend, _Iterator __first, _SizeT __n, const _Tp& __value);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Generator>
-  optional<__empty> __pstl_generate(_Backend, _Iterator __first, _Iterator __last, _Generator __gen);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Predicate>
-  optional<__empty> __pstl_is_partitioned(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Size, class _Generator>
-  optional<__empty> __pstl_generator_n(_Backend, _Iterator __first, _Size __n, _Generator __gen);
-
-  template <class _ExecutionPolicy, class _terator1, class _Iterator2, class _OutIterator, class _Comp>
-  optional<_OutIterator> __pstl_merge(_Backend,
-                                      _Iterator1 __first1,
-                                      _Iterator1 __last1,
-                                      _Iterator2 __first2,
-                                      _Iterator2 __last2,
-                                      _OutIterator __result,
-                                      _Comp __comp);
-
-  template <class _ExecutionPolicy, class _Iterator, class _OutIterator>
-  optional<_OutIterator> __pstl_move(_Backend, _Iterator __first, _Iterator __last, _OutIterator __result);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinaryOperation>
-  optional<_Tp> __pstl_reduce(_Backend, _Iterator __first, _Iterator __last, _Tp __init, _BinaryOperation __op);
-
-  temlate <class _ExecutionPolicy, class _Iterator>
-  optional<__iter_value_type<_Iterator>> __pstl_reduce(_Backend, _Iterator __first, _Iterator __last);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Tp>
-  optional<__iter_diff_t<_Iterator>> __pstl_count(_Backend, _Iterator __first, _Iterator __last, const _Tp& __value);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Predicate>
-  optional<__iter_diff_t<_Iterator>> __pstl_count_if(_Backend, _Iterator __first, _Iterator __last, _Predicate __pred);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Tp>
-  optional<__empty>
-  __pstl_replace(_Backend, _Iterator __first, _Iterator __last, const _Tp& __old_value, const _Tp& __new_value);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Pred, class _Tp>
-  optional<__empty>
-  __pstl_replace_if(_Backend, _Iterator __first, _Iterator __last, _Pred __pred, const _Tp& __new_value);
-
-  template <class _ExecutionPolicy, class _Iterator, class _OutIterator, class _Tp>
-  optional<__empty> __pstl_replace_copy(_Backend,
-                                        _Iterator __first,
-                                        _Iterator __last,
-                                        _OutIterator __result,
-                                        const _Tp& __old_value,
-                                        const _Tp& __new_value);
-
-  template <class _ExecutionPolicy, class _Iterator, class _OutIterator, class _Pred, class _Tp>
-  optional<__empty> __pstl_replace_copy_if(_Backend,
-                                           _Iterator __first,
-                                           _Iterator __last,
-                                           _OutIterator __result,
-                                           _Pred __pred,
-                                           const _Tp& __new_value);
-
-  template <class _ExecutionPolicy, class _Iterator, class _OutIterator>
-  optional<_Iterator> __pstl_rotate_copy(
-      _Backend, _Iterator __first, _Iterator __middle, _Iterator __last, _OutIterator __result);
-
-  template <class _ExecutionPolicy, class _Iterator, class _Comp>
-  optional<__empty> __pstl_sort(_Backend, _Iterator __first, _Iterator __last, _Comp __comp);
-
-  template <class _ExecutionPolicy, class _Iterator1, class _Iterator2, class _Comp>
-  optional<bool> __pstl_equal(_Backend, _Iterator1 first1, _Iterator1 last1, _Iterator2 first2, _Comp __comp);
-
-// TODO: Complete this list
-
-Exception handling
-==================
-
-PSTL backends are expected to report errors (i.e. failure to allocate) by returning a disengaged `optional` from their
-implementation. Exceptions shouldn't be used to report an internal failure-to-allocate, since all exceptions are turned
-into a program termination at the front-end level. When a backend returns a disengaged `optional` to the frontend, the
-frontend will turn that into a call to `std::__throw_bad_alloc();` to report the internal failure to the user.
-*/
-
 namespace __pstl {
-struct __libdispatch_backend_tag {};
-struct __serial_backend_tag {};
-struct __std_thread_backend_tag {};
-} // namespace __pstl
-
-#  if defined(_LIBCPP_PSTL_BACKEND_SERIAL)
-using __cpu_backend_tag = __pstl::__serial_backend_tag;
-#  elif defined(_LIBCPP_PSTL_BACKEND_STD_THREAD)
-using __cpu_backend_tag = __pstl::__std_thread_backend_tag;
-#  elif defined(_LIBCPP_PSTL_BACKEND_LIBDISPATCH)
-using __cpu_backend_tag = __pstl::__libdispatch_backend_tag;
-#  endif
 
-template <class _ExecutionPolicy>
-struct __select_backend;
+template <class... _Backends>
+struct __backend_configuration;
 
-template <>
-struct __select_backend<std::execution::sequenced_policy> {
-  using type = __cpu_backend_tag;
-};
+struct __default_backend_tag;
+struct __libdispatch_backend_tag;
+struct __serial_backend_tag;
+struct __std_thread_backend_tag;
 
-#  if _LIBCPP_STD_VER >= 20
-template <>
-struct __select_backend<std::execution::unsequenced_policy> {
-  using type = __cpu_backend_tag;
-};
-#  endif
-
-#  if defined(_LIBCPP_PSTL_BACKEND_SERIAL) || defined(_LIBCPP_PSTL_BACKEND_STD_THREAD) ||                              \
-      defined(_LIBCPP_PSTL_BACKEND_LIBDISPATCH)
-template <>
-struct __select_backend<std::execution::parallel_policy> {
-  using type = __cpu_backend_tag;
-};
-
-template <>
-struct __select_backend<std::execution::parallel_unsequenced_policy> {
-  using type = __cpu_backend_tag;
-};
-
-#  else
+#if defined(_LIBCPP_PSTL_BACKEND_SERIAL)
+using __current_configuration = __backend_configuration<__serial_backend_tag, __default_backend_tag>;
+#elif defined(_LIBCPP_PSTL_BACKEND_STD_THREAD)
+using __current_configuration = __backend_configuration<__std_thread_backend_tag, __default_backend_tag>;
+#elif defined(_LIBCPP_PSTL_BACKEND_LIBDISPATCH)
+using __current_configuration = __backend_configuration<__libdispatch_backend_tag, __default_backend_tag>;
+#else
 
 // ...New vendors can add parallel backends here...
 
-#    error "Invalid choice of a PSTL parallel backend"
-#  endif
+#  error "Invalid PSTL backend configuration"
+#endif
 
+} // namespace __pstl
 _LIBCPP_END_NAMESPACE_STD
 
-#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+_LIBCPP_POP_MACROS
 
 #endif // _LIBCPP___PSTL_CONFIGURATION_FWD_H
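
Reviewer note on the configuration_fwd.h hunk above: `__current_configuration` is now an ordered list of backend tags, with the default backend last. The dispatching code is not part of this hunk, so the following is only a guess at how such a list can be consumed: walk the list and pick the first backend whose specialization of the algorithm is a complete type. `is_implemented`, `find_first_implemented` and `dispatch_t` are hypothetical helper names for this sketch.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

struct serial_backend_tag {};
struct default_backend_tag {};
struct seq_policy {}; // hypothetical stand-in for an execution policy type

template <class... Backends>
struct backend_configuration {};

using current_configuration = backend_configuration<serial_backend_tag, default_backend_tag>;

// Primary templates are declared but never defined: a backend "implements"
// an algorithm only if a specialization makes the type complete.
template <class Backend, class ExecutionPolicy>
struct find_if;
template <class Backend, class ExecutionPolicy>
struct fill;

template <class ExecutionPolicy>
struct find_if<serial_backend_tag, ExecutionPolicy> {};
template <class ExecutionPolicy>
struct find_if<default_backend_tag, ExecutionPolicy> {};
template <class ExecutionPolicy>
struct fill<default_backend_tag, ExecutionPolicy> {}; // only the default backend has fill

// Completeness check: sizeof(T) is ill-formed for incomplete types, which
// removes the partial specialization via SFINAE.
template <class T, class = void>
struct is_implemented : std::false_type {};
template <class T>
struct is_implemented<T, decltype(void(sizeof(T)))> : std::true_type {};

template <class T>
struct type_identity {
  using type = T;
};

template <template <class, class> class Algorithm, class Configuration, class Policy>
struct find_first_implemented;

// Ran out of backends: fail loudly, but only if this case is actually instantiated.
template <template <class, class> class Algorithm, class Policy>
struct find_first_implemented<Algorithm, backend_configuration<>, Policy> {
  static_assert(sizeof(Policy) == 0, "no backend in the configuration implements this algorithm");
};

// Take the first backend if it implements the algorithm, else recurse on the rest.
template <template <class, class> class Algorithm, class Policy, class First, class... Rest>
struct find_first_implemented<Algorithm, backend_configuration<First, Rest...>, Policy>
    : std::conditional_t<is_implemented<Algorithm<First, Policy>>::value,
                         type_identity<Algorithm<First, Policy>>,
                         find_first_implemented<Algorithm, backend_configuration<Rest...>, Policy>> {};

template <template <class, class> class Algorithm, class Policy>
using dispatch_t = typename find_first_implemented<Algorithm, current_configuration, Policy>::type;

} // namespace sketch
```

With this arrangement a partial backend such as the serial one only needs to define what it wants to override, and anything it leaves out falls through to the next tag in the list.
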
diff --git a/libcxx/include/__pstl/cpu_algos/any_of.h b/libcxx/include/__pstl/cpu_algos/any_of.h
index 01b9d214310a3..3173eade7585b 100644
--- a/libcxx/include/__pstl/cpu_algos/any_of.h
+++ b/libcxx/include/__pstl/cpu_algos/any_of.h
@@ -10,13 +10,12 @@
 #define _LIBCPP___PSTL_CPU_ALGOS_ANY_OF_H
 
 #include <__algorithm/any_of.h>
-#include <__algorithm/find_if.h>
+#include <__assert>
 #include <__atomic/atomic.h>
 #include <__atomic/memory_order.h>
 #include <__config>
-#include <__functional/operations.h>
 #include <__iterator/concepts.h>
-#include <__pstl/configuration_fwd.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
 #include <__type_traits/is_execution_policy.h>
 #include <__utility/move.h>
@@ -70,25 +69,28 @@ _LIBCPP_HIDE_FROM_ABI bool __simd_or(_Index __first, _DifferenceType __n, _Pred
   return false;
 }
 
-template <class _ExecutionPolicy, class _ForwardIterator, class _Predicate>
-_LIBCPP_HIDE_FROM_ABI optional<bool>
-__pstl_any_of(__cpu_backend_tag, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    return std::__parallel_or<__cpu_backend_tag>(
-        __first, __last, [&__pred](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
-          auto __res = std::__pstl_any_of<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{}, __brick_first, __brick_last, __pred);
-          _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
-          return *std::move(__res);
-        });
-  } else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    return std::__simd_or(__first, __last - __first, __pred);
-  } else {
-    return std::any_of(__first, __last, __pred);
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_any_of {
+  template <class _Policy, class _ForwardIterator, class _Predicate>
+  _LIBCPP_HIDE_FROM_ABI optional<bool>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      return std::__parallel_or<_Backend>(
+          __first, __last, [&__policy, &__pred](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+            using _AnyOfUnseq = __pstl::__any_of<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            auto __res = _AnyOfUnseq()(std::__remove_parallel_policy(__policy), __brick_first, __brick_last, __pred);
+            _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
+            return *std::move(__res);
+          });
+    } else if constexpr (__is_unsequenced_execution_policy_v<_RawExecutionPolicy> &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      return std::__simd_or(__first, __last - __first, __pred);
+    } else {
+      return std::any_of(__first, __last, __pred);
+    }
   }
-}
+};
 
 _LIBCPP_END_NAMESPACE_STD
 
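
Reviewer note on the any_of.h hunk above: the parallel branch splits the range into bricks and, for each brick, re-enters the same backend's `__any_of` with the parallel part of the policy removed, so the inner call lands in the `unseq` or sequential branch. A simplified sketch of that recursion is shown below. It assumes hypothetical policy tags and a toy `parallel_or` that runs two bricks one after the other, and it omits the separate SIMD branch and the random-access iterator check for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

namespace sketch {

// Hypothetical stand-ins for the four standard execution policies.
struct seq_policy {};
struct unseq_policy {};
struct par_policy {};
struct par_unseq_policy {};

template <class P>
inline constexpr bool is_parallel_v = false;
template <>
inline constexpr bool is_parallel_v<par_policy> = true;
template <>
inline constexpr bool is_parallel_v<par_unseq_policy> = true;

// Stripping the parallel part maps par -> seq and par_unseq -> unseq.
template <class P>
struct remove_parallel { using type = P; };
template <>
struct remove_parallel<par_policy> { using type = seq_policy; };
template <>
struct remove_parallel<par_unseq_policy> { using type = unseq_policy; };
template <class P>
using remove_parallel_t = typename remove_parallel<P>::type;

struct toy_backend_tag {}; // hypothetical backend, not part of libc++

template <class Backend, class ExecutionPolicy>
struct any_of;

// Toy replacement for the backend's parallel "or": two bricks, run in order.
template <class Backend, class It, class Brick>
std::optional<bool> parallel_or(It first, It last, Brick brick) {
  It mid = first + (last - first) / 2;
  return brick(first, mid) || brick(mid, last);
}

template <class Backend, class RawExecutionPolicy>
struct cpu_parallel_any_of {
  template <class Policy, class It, class Pred>
  std::optional<bool> operator()(Policy&&, It first, It last, Pred pred) const noexcept {
    if constexpr (is_parallel_v<RawExecutionPolicy>) {
      return parallel_or<Backend>(first, last, [&pred](It brick_first, It brick_last) {
        // Re-enter the backend functor (not a front-end algorithm) with the
        // parallel part of the policy stripped.
        using Inner      = remove_parallel_t<RawExecutionPolicy>;
        using AnyOfInner = any_of<Backend, Inner>;
        auto res         = AnyOfInner()(Inner{}, brick_first, brick_last, pred);
        assert(res && "unseq/seq should never try to allocate!");
        return *res;
      });
    } else {
      return std::any_of(first, last, pred);
    }
  }
};

template <class ExecutionPolicy>
struct any_of<toy_backend_tag, ExecutionPolicy> : cpu_parallel_any_of<toy_backend_tag, ExecutionPolicy> {};

} // namespace sketch
```

Because the brick calls the backend-level functor directly, the inner call never goes through the front-end's exception handling, which is the distinction the commit message draws between back-end functions and front-end algorithms.
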
diff --git a/libcxx/include/__pstl/cpu_algos/fill.h b/libcxx/include/__pstl/cpu_algos/fill.h
index 66fb751eb7a2e..b99a9d3c660d8 100644
--- a/libcxx/include/__pstl/cpu_algos/fill.h
+++ b/libcxx/include/__pstl/cpu_algos/fill.h
@@ -10,9 +10,10 @@
 #define _LIBCPP___PSTL_CPU_ALGOS_FILL_H
 
 #include <__algorithm/fill.h>
+#include <__assert>
 #include <__config>
 #include <__iterator/concepts.h>
-#include <__pstl/configuration_fwd.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
 #include <__type_traits/is_execution_policy.h>
 #include <__utility/empty.h>
@@ -35,26 +36,30 @@ _LIBCPP_HIDE_FROM_ABI _Index __simd_fill_n(_Index __first, _DifferenceType __n,
   return __first + __n;
 }
 
-template <class _ExecutionPolicy, class _ForwardIterator, class _Tp>
-_LIBCPP_HIDE_FROM_ABI optional<__empty>
-__pstl_fill(__cpu_backend_tag, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    return __pstl::__cpu_traits<__cpu_backend_tag>::__for_each(
-        __first, __last, [&__value](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
-          [[maybe_unused]] auto __res = std::__pstl_fill<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{}, __brick_first, __brick_last, __value);
-          _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
-        });
-  } else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    std::__simd_fill_n(__first, __last - __first, __value);
-    return __empty{};
-  } else {
-    std::fill(__first, __last, __value);
-    return __empty{};
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_fill {
+  template <class _Policy, class _ForwardIterator, class _Tp>
+  _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      return __pstl::__cpu_traits<_Backend>::__for_each(
+          __first, __last, [&__policy, &__value](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+            using _FillUnseq = __pstl::__fill<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            [[maybe_unused]] auto __res =
+                _FillUnseq()(std::__remove_parallel_policy(__policy), __brick_first, __brick_last, __value);
+            _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
+          });
+    } else if constexpr (__is_unsequenced_execution_policy_v<_RawExecutionPolicy> &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      std::__simd_fill_n(__first, __last - __first, __value);
+      return __empty{};
+    } else {
+      std::fill(__first, __last, __value);
+      return __empty{};
+    }
   }
-}
+};
 
 _LIBCPP_END_NAMESPACE_STD
 
diff --git a/libcxx/include/__pstl/cpu_algos/find_if.h b/libcxx/include/__pstl/cpu_algos/find_if.h
index c99ec01bff48f..3ddbee44890f6 100644
--- a/libcxx/include/__pstl/cpu_algos/find_if.h
+++ b/libcxx/include/__pstl/cpu_algos/find_if.h
@@ -10,12 +10,13 @@
 #define _LIBCPP___PSTL_CPU_ALGOS_FIND_IF_H
 
 #include <__algorithm/find_if.h>
+#include <__assert>
 #include <__atomic/atomic.h>
 #include <__config>
 #include <__functional/operations.h>
 #include <__iterator/concepts.h>
 #include <__iterator/iterator_traits.h>
-#include <__pstl/configuration_fwd.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
 #include <__type_traits/is_execution_policy.h>
 #include <__utility/move.h>
@@ -98,33 +99,36 @@ __simd_first(_Index __first, _DifferenceType __begin, _DifferenceType __end, _Co
   return __first + __end;
 }
 
-template <class _ExecutionPolicy, class _ForwardIterator, class _Predicate>
-_LIBCPP_HIDE_FROM_ABI optional<_ForwardIterator>
-__pstl_find_if(__cpu_backend_tag, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    return std::__parallel_find<__cpu_backend_tag>(
-        __first,
-        __last,
-        [&__pred](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
-          auto __res = std::__pstl_find_if<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{}, __brick_first, __brick_last, __pred);
-          _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
-          return *std::move(__res);
-        },
-        less<>{},
-        true);
-  } else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    using __diff_t = __iter_diff_t<_ForwardIterator>;
-    return std::__simd_first<__cpu_backend_tag>(
-        __first, __diff_t(0), __last - __first, [&__pred](_ForwardIterator __iter, __diff_t __i) {
-          return __pred(__iter[__i]);
-        });
-  } else {
-    return std::find_if(__first, __last, __pred);
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_find_if {
+  template <class _Policy, class _ForwardIterator, class _Predicate>
+  _LIBCPP_HIDE_FROM_ABI optional<_ForwardIterator>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      return std::__parallel_find<_Backend>(
+          __first,
+          __last,
+          [&__policy, &__pred](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+            using _FindIfUnseq = __pstl::__find_if<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            auto __res = _FindIfUnseq()(std::__remove_parallel_policy(__policy), __brick_first, __brick_last, __pred);
+            _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
+            return *std::move(__res);
+          },
+          less<>{},
+          true);
+    } else if constexpr (__is_unsequenced_execution_policy_v<_RawExecutionPolicy> &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      using __diff_t = __iter_diff_t<_ForwardIterator>;
+      return std::__simd_first<_Backend>(
+          __first, __diff_t(0), __last - __first, [&__pred](_ForwardIterator __iter, __diff_t __i) {
+            return __pred(__iter[__i]);
+          });
+    } else {
+      return std::find_if(__first, __last, __pred);
+    }
   }
-}
+};
 
 _LIBCPP_END_NAMESPACE_STD
 
diff --git a/libcxx/include/__pstl/cpu_algos/for_each.h b/libcxx/include/__pstl/cpu_algos/for_each.h
index cd7ce022469bd..db71b1c35ccda 100644
--- a/libcxx/include/__pstl/cpu_algos/for_each.h
+++ b/libcxx/include/__pstl/cpu_algos/for_each.h
@@ -10,9 +10,10 @@
 #define _LIBCPP___PSTL_CPU_ALGOS_FOR_EACH_H
 
 #include <__algorithm/for_each.h>
+#include <__assert>
 #include <__config>
 #include <__iterator/concepts.h>
-#include <__pstl/configuration_fwd.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
 #include <__type_traits/is_execution_policy.h>
 #include <__utility/empty.h>
@@ -35,26 +36,30 @@ _LIBCPP_HIDE_FROM_ABI _Iterator __simd_walk(_Iterator __first, _DifferenceType _
   return __first + __n;
 }
 
-template <class _ExecutionPolicy, class _ForwardIterator, class _Functor>
-_LIBCPP_HIDE_FROM_ABI optional<__empty>
-__pstl_for_each(__cpu_backend_tag, _ForwardIterator __first, _ForwardIterator __last, _Functor __func) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    return __pstl::__cpu_traits<__cpu_backend_tag>::__for_each(
-        __first, __last, [__func](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
-          [[maybe_unused]] auto __res = std::__pstl_for_each<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{}, __brick_first, __brick_last, __func);
-          _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
-        });
-  } else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    std::__simd_walk(__first, __last - __first, __func);
-    return __empty{};
-  } else {
-    std::for_each(__first, __last, __func);
-    return __empty{};
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_for_each {
+  template <class _Policy, class _ForwardIterator, class _Functor>
+  _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Functor __func) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      return __pstl::__cpu_traits<_Backend>::__for_each(
+          __first, __last, [&__policy, __func](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+            using _ForEachUnseq = __pstl::__for_each<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            [[maybe_unused]] auto __res =
+                _ForEachUnseq()(std::__remove_parallel_policy(__policy), __brick_first, __brick_last, __func);
+            _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
+          });
+    } else if constexpr (__is_unsequenced_execution_policy_v<_RawExecutionPolicy> &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      std::__simd_walk(__first, __last - __first, __func);
+      return __empty{};
+    } else {
+      std::for_each(__first, __last, __func);
+      return __empty{};
+    }
   }
-}
+};
 
 _LIBCPP_END_NAMESPACE_STD
 
diff --git a/libcxx/include/__pstl/cpu_algos/merge.h b/libcxx/include/__pstl/cpu_algos/merge.h
index b857fc1fb7a56..4f4192cccb3e8 100644
--- a/libcxx/include/__pstl/cpu_algos/merge.h
+++ b/libcxx/include/__pstl/cpu_algos/merge.h
@@ -10,9 +10,10 @@
 #define _LIBCPP___PSTL_CPU_ALGOS_MERGE_H
 
 #include <__algorithm/merge.h>
+#include <__assert>
 #include <__config>
 #include <__iterator/concepts.h>
-#include <__pstl/configuration_fwd.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
 #include <__type_traits/is_execution_policy.h>
 #include <__utility/move.h>
@@ -29,53 +30,53 @@ _LIBCPP_PUSH_MACROS
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
-          class _ForwardOutIterator,
-          class _Comp>
-_LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator> __pstl_merge(
-    __cpu_backend_tag,
-    _ForwardIterator1 __first1,
-    _ForwardIterator1 __last1,
-    _ForwardIterator2 __first2,
-    _ForwardIterator2 __last2,
-    _ForwardOutIterator __result,
-    _Comp __comp) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value &&
-                __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
-    auto __res = __pstl::__cpu_traits<__cpu_backend_tag>::__merge(
-        __first1,
-        __last1,
-        __first2,
-        __last2,
-        __result,
-        __comp,
-        [](_ForwardIterator1 __g_first1,
-           _ForwardIterator1 __g_last1,
-           _ForwardIterator2 __g_first2,
-           _ForwardIterator2 __g_last2,
-           _ForwardOutIterator __g_result,
-           _Comp __g_comp) {
-          [[maybe_unused]] auto __g_res = std::__pstl_merge<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{},
-              std::move(__g_first1),
-              std::move(__g_last1),
-              std::move(__g_first2),
-              std::move(__g_last2),
-              std::move(__g_result),
-              std::move(__g_comp));
-          _LIBCPP_ASSERT_INTERNAL(__g_res, "unsed/sed should never try to allocate!");
-        });
-    if (!__res)
-      return nullopt;
-    return __result + (__last1 - __first1) + (__last2 - __first2);
-  } else {
-    return std::merge(__first1, __last1, __first2, __last2, __result, __comp);
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_merge {
+  template <class _Policy, class _ForwardIterator1, class _ForwardIterator2, class _ForwardOutIterator, class _Comp>
+  _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator> operator()(
+      _Policy&& __policy,
+      _ForwardIterator1 __first1,
+      _ForwardIterator1 __last1,
+      _ForwardIterator2 __first2,
+      _ForwardIterator2 __last2,
+      _ForwardOutIterator __result,
+      _Comp __comp) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value &&
+                  __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
+      auto __res = __pstl::__cpu_traits<_Backend>::__merge(
+          __first1,
+          __last1,
+          __first2,
+          __last2,
+          __result,
+          __comp,
+          [&__policy](_ForwardIterator1 __g_first1,
+                      _ForwardIterator1 __g_last1,
+                      _ForwardIterator2 __g_first2,
+                      _ForwardIterator2 __g_last2,
+                      _ForwardOutIterator __g_result,
+                      _Comp __g_comp) {
+            using _MergeUnseq             = __pstl::__merge<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            [[maybe_unused]] auto __g_res = _MergeUnseq()(
+                std::__remove_parallel_policy(__policy),
+                std::move(__g_first1),
+                std::move(__g_last1),
+                std::move(__g_first2),
+                std::move(__g_last2),
+                std::move(__g_result),
+                std::move(__g_comp));
+            _LIBCPP_ASSERT_INTERNAL(__g_res, "unseq/seq should never try to allocate!");
+          });
+      if (!__res)
+        return nullopt;
+      return __result + (__last1 - __first1) + (__last2 - __first2);
+    } else {
+      return std::merge(__first1, __last1, __first2, __last2, __result, __comp);
+    }
   }
-}
+};
 
 _LIBCPP_END_NAMESPACE_STD
 
diff --git a/libcxx/include/__pstl/cpu_algos/stable_sort.h b/libcxx/include/__pstl/cpu_algos/stable_sort.h
index 18effb2108a2f..8ea5e8a01d2ce 100644
--- a/libcxx/include/__pstl/cpu_algos/stable_sort.h
+++ b/libcxx/include/__pstl/cpu_algos/stable_sort.h
@@ -11,7 +11,7 @@
 
 #include <__algorithm/stable_sort.h>
 #include <__config>
-#include <__pstl/configuration_fwd.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
 #include <__type_traits/is_execution_policy.h>
 #include <__utility/empty.h>
@@ -25,19 +25,22 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template <class _ExecutionPolicy, class _RandomAccessIterator, class _Comp>
-_LIBCPP_HIDE_FROM_ABI optional<__empty>
-__pstl_stable_sort(__cpu_backend_tag, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp __comp) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy>) {
-    return __pstl::__cpu_traits<__cpu_backend_tag>::__stable_sort(
-        __first, __last, __comp, [](_RandomAccessIterator __g_first, _RandomAccessIterator __g_last, _Comp __g_comp) {
-          std::stable_sort(__g_first, __g_last, __g_comp);
-        });
-  } else {
-    std::stable_sort(__first, __last, __comp);
-    return __empty{};
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_stable_sort {
+  template <class _Policy, class _RandomAccessIterator, class _Comp>
+  _LIBCPP_HIDE_FROM_ABI optional<__empty>
+  operator()(_Policy&&, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp __comp) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy>) {
+      return __pstl::__cpu_traits<_Backend>::__stable_sort(
+          __first, __last, __comp, [](_RandomAccessIterator __g_first, _RandomAccessIterator __g_last, _Comp __g_comp) {
+            std::stable_sort(__g_first, __g_last, __g_comp);
+          });
+    } else {
+      std::stable_sort(__first, __last, __comp);
+      return __empty{};
+    }
   }
-}
+};
 
 _LIBCPP_END_NAMESPACE_STD
 
diff --git a/libcxx/include/__pstl/cpu_algos/transform.h b/libcxx/include/__pstl/cpu_algos/transform.h
index 70853dc9af24e..a4541fb22e8f6 100644
--- a/libcxx/include/__pstl/cpu_algos/transform.h
+++ b/libcxx/include/__pstl/cpu_algos/transform.h
@@ -10,14 +10,14 @@
 #define _LIBCPP___PSTL_CPU_ALGOS_TRANSFORM_H
 
 #include <__algorithm/transform.h>
+#include <__assert>
 #include <__config>
 #include <__iterator/concepts.h>
 #include <__iterator/iterator_traits.h>
-#include <__pstl/configuration_fwd.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
-#include <__type_traits/enable_if.h>
 #include <__type_traits/is_execution_policy.h>
-#include <__type_traits/remove_cvref.h>
+#include <__utility/move.h>
 #include <optional>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@@ -40,38 +40,48 @@ __simd_walk(_Iterator1 __first1, _DifferenceType __n, _Iterator2 __first2, _Func
   return __first2 + __n;
 }
 
-template <class _ExecutionPolicy, class _ForwardIterator, class _ForwardOutIterator, class _UnaryOperation>
-_LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator> __pstl_transform(
-    __cpu_backend_tag,
-    _ForwardIterator __first,
-    _ForwardIterator __last,
-    _ForwardOutIterator __result,
-    _UnaryOperation __op) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator>::value &&
-                __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
-    __pstl::__cpu_traits<__cpu_backend_tag>::__for_each(
-        __first, __last, [__op, __first, __result](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
-          auto __res = std::__pstl_transform<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{}, __brick_first, __brick_last, __result + (__brick_first - __first), __op);
-          _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
-          return *std::move(__res);
-        });
-    return __result + (__last - __first);
-  } else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator>::value &&
-                       __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
-    return std::__simd_walk(
-        __first,
-        __last - __first,
-        __result,
-        [&](__iter_reference<_ForwardIterator> __in_value, __iter_reference<_ForwardOutIterator> __out_value) {
-          __out_value = __op(__in_value);
-        });
-  } else {
-    return std::transform(__first, __last, __result, __op);
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_transform {
+  template <class _Policy, class _ForwardIterator, class _ForwardOutIterator, class _UnaryOperation>
+  _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
+  operator()(_Policy&& __policy,
+             _ForwardIterator __first,
+             _ForwardIterator __last,
+             _ForwardOutIterator __result,
+             _UnaryOperation __op) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator>::value &&
+                  __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
+      __pstl::__cpu_traits<_Backend>::__for_each(
+          __first,
+          __last,
+          [&__policy, __op, __first, __result](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+            using _TransformUnseq = __pstl::__transform<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            auto __res            = _TransformUnseq()(
+                std::__remove_parallel_policy(__policy),
+                __brick_first,
+                __brick_last,
+                __result + (__brick_first - __first),
+                __op);
+            _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
+            return *std::move(__res);
+          });
+      return __result + (__last - __first);
+    } else if constexpr (__is_unsequenced_execution_policy_v<_RawExecutionPolicy> &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator>::value &&
+                         __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
+      return std::__simd_walk(
+          __first,
+          __last - __first,
+          __result,
+          [&](__iter_reference<_ForwardIterator> __in_value, __iter_reference<_ForwardOutIterator> __out_value) {
+            __out_value = __op(__in_value);
+          });
+    } else {
+      return std::transform(__first, __last, __result, __op);
+    }
   }
-}
+};
 
 template <class _Iterator1, class _DifferenceType, class _Iterator2, class _Iterator3, class _Function>
 _LIBCPP_HIDE_FROM_ABI _Iterator3 __simd_walk(
@@ -81,54 +91,60 @@ _LIBCPP_HIDE_FROM_ABI _Iterator3 __simd_walk(
     __f(__first1[__i], __first2[__i], __first3[__i]);
   return __first3 + __n;
 }
-template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
-          class _ForwardOutIterator,
-          class _BinaryOperation,
-          enable_if_t<is_execution_policy_v<__remove_cvref_t<_ExecutionPolicy>>, int> = 0>
-_LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator> __pstl_transform(
-    __cpu_backend_tag,
-    _ForwardIterator1 __first1,
-    _ForwardIterator1 __last1,
-    _ForwardIterator2 __first2,
-    _ForwardOutIterator __result,
-    _BinaryOperation __op) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value &&
-                __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
-    auto __res = __pstl::__cpu_traits<__cpu_backend_tag>::__for_each(
-        __first1,
-        __last1,
-        [__op, __first1, __first2, __result](_ForwardIterator1 __brick_first, _ForwardIterator1 __brick_last) {
-          return std::__pstl_transform<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{},
-              __brick_first,
-              __brick_last,
-              __first2 + (__brick_first - __first1),
-              __result + (__brick_first - __first1),
-              __op);
-        });
-    if (!__res)
-      return nullopt;
-    return __result + (__last1 - __first1);
-  } else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value &&
-                       __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
-    return std::__simd_walk(
-        __first1,
-        __last1 - __first1,
-        __first2,
-        __result,
-        [&](__iter_reference<_ForwardIterator1> __in1,
-            __iter_reference<_ForwardIterator2> __in2,
-            __iter_reference<_ForwardOutIterator> __out_value) { __out_value = __op(__in1, __in2); });
-  } else {
-    return std::transform(__first1, __last1, __first2, __result, __op);
+
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_transform_binary {
+  template <class _Policy,
+            class _ForwardIterator1,
+            class _ForwardIterator2,
+            class _ForwardOutIterator,
+            class _BinaryOperation>
+  _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator>
+  operator()(_Policy&& __policy,
+             _ForwardIterator1 __first1,
+             _ForwardIterator1 __last1,
+             _ForwardIterator2 __first2,
+             _ForwardOutIterator __result,
+             _BinaryOperation __op) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value &&
+                  __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
+      auto __res = __pstl::__cpu_traits<_Backend>::__for_each(
+          __first1,
+          __last1,
+          [&__policy, __op, __first1, __first2, __result](
+              _ForwardIterator1 __brick_first, _ForwardIterator1 __brick_last) {
+            using _TransformBinaryUnseq =
+                __pstl::__transform_binary<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            return _TransformBinaryUnseq()(
+                std::__remove_parallel_policy(__policy),
+                __brick_first,
+                __brick_last,
+                __first2 + (__brick_first - __first1),
+                __result + (__brick_first - __first1),
+                __op);
+          });
+      if (!__res)
+        return nullopt;
+      return __result + (__last1 - __first1);
+    } else if constexpr (__is_unsequenced_execution_policy_v<_RawExecutionPolicy> &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value &&
+                         __has_random_access_iterator_category_or_concept<_ForwardOutIterator>::value) {
+      return std::__simd_walk(
+          __first1,
+          __last1 - __first1,
+          __first2,
+          __result,
+          [&](__iter_reference<_ForwardIterator1> __in1,
+              __iter_reference<_ForwardIterator2> __in2,
+              __iter_reference<_ForwardOutIterator> __out_value) { __out_value = __op(__in1, __in2); });
+    } else {
+      return std::transform(__first1, __last1, __first2, __result, __op);
+    }
   }
-}
+};
 
 _LIBCPP_END_NAMESPACE_STD
 
diff --git a/libcxx/include/__pstl/cpu_algos/transform_reduce.h b/libcxx/include/__pstl/cpu_algos/transform_reduce.h
index a85ee9fb773af..914c46dcd6dcf 100644
--- a/libcxx/include/__pstl/cpu_algos/transform_reduce.h
+++ b/libcxx/include/__pstl/cpu_algos/transform_reduce.h
@@ -9,16 +9,18 @@
 #ifndef _LIBCPP___PSTL_CPU_ALGOS_TRANSFORM_REDUCE_H
 #define _LIBCPP___PSTL_CPU_ALGOS_TRANSFORM_REDUCE_H
 
+#include <__assert>
 #include <__config>
 #include <__iterator/concepts.h>
 #include <__iterator/iterator_traits.h>
 #include <__numeric/transform_reduce.h>
-#include <__pstl/configuration_fwd.h>
+#include <__pstl/backend_fwd.h>
 #include <__pstl/cpu_algos/cpu_traits.h>
 #include <__type_traits/desugars_to.h>
 #include <__type_traits/is_arithmetic.h>
 #include <__type_traits/is_execution_policy.h>
 #include <__utility/move.h>
+#include <cstddef>
 #include <new>
 #include <optional>
 
@@ -103,99 +105,109 @@ __simd_transform_reduce(_Size __n, _Tp __init, _BinaryOperation __binary_op, _Un
   return __init;
 }
 
-template <class _ExecutionPolicy,
-          class _ForwardIterator1,
-          class _ForwardIterator2,
-          class _Tp,
-          class _BinaryOperation1,
-          class _BinaryOperation2>
-_LIBCPP_HIDE_FROM_ABI optional<_Tp> __pstl_transform_reduce(
-    __cpu_backend_tag,
-    _ForwardIterator1 __first1,
-    _ForwardIterator1 __last1,
-    _ForwardIterator2 __first2,
-    _Tp __init,
-    _BinaryOperation1 __reduce,
-    _BinaryOperation2 __transform) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value) {
-    return __pstl::__cpu_traits<__cpu_backend_tag>::__transform_reduce(
-        __first1,
-        std::move(__last1),
-        [__first1, __first2, __transform](_ForwardIterator1 __iter) {
-          return __transform(*__iter, *(__first2 + (__iter - __first1)));
-        },
-        std::move(__init),
-        std::move(__reduce),
-        [__first1, __first2, __reduce, __transform](
-            _ForwardIterator1 __brick_first, _ForwardIterator1 __brick_last, _Tp __brick_init) {
-          return *std::__pstl_transform_reduce<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{},
-              __brick_first,
-              std::move(__brick_last),
-              __first2 + (__brick_first - __first1),
-              std::move(__brick_init),
-              std::move(__reduce),
-              std::move(__transform));
-        });
-  } else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value) {
-    return std::__simd_transform_reduce<__cpu_backend_tag>(
-        __last1 - __first1, std::move(__init), std::move(__reduce), [&](__iter_diff_t<_ForwardIterator1> __i) {
-          return __transform(__first1[__i], __first2[__i]);
-        });
-  } else {
-    return std::transform_reduce(
-        std::move(__first1),
-        std::move(__last1),
-        std::move(__first2),
-        std::move(__init),
-        std::move(__reduce),
-        std::move(__transform));
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_transform_reduce_binary {
+  template <class _Policy,
+            class _ForwardIterator1,
+            class _ForwardIterator2,
+            class _Tp,
+            class _BinaryOperation1,
+            class _BinaryOperation2>
+  _LIBCPP_HIDE_FROM_ABI optional<_Tp> operator()(
+      _Policy&& __policy,
+      _ForwardIterator1 __first1,
+      _ForwardIterator1 __last1,
+      _ForwardIterator2 __first2,
+      _Tp __init,
+      _BinaryOperation1 __reduce,
+      _BinaryOperation2 __transform) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value) {
+      return __pstl::__cpu_traits<_Backend>::__transform_reduce(
+          __first1,
+          std::move(__last1),
+          [__first1, __first2, __transform](_ForwardIterator1 __iter) {
+            return __transform(*__iter, *(__first2 + (__iter - __first1)));
+          },
+          std::move(__init),
+          std::move(__reduce),
+          [&__policy, __first1, __first2, __reduce, __transform](
+              _ForwardIterator1 __brick_first, _ForwardIterator1 __brick_last, _Tp __brick_init) {
+            using _TransformReduceBinaryUnseq =
+                __pstl::__transform_reduce_binary<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            return *_TransformReduceBinaryUnseq()(
+                std::__remove_parallel_policy(__policy),
+                __brick_first,
+                std::move(__brick_last),
+                __first2 + (__brick_first - __first1),
+                std::move(__brick_init),
+                std::move(__reduce),
+                std::move(__transform));
+          });
+    } else if constexpr (__is_unsequenced_execution_policy_v<_RawExecutionPolicy> &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator1>::value &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator2>::value) {
+      return std::__simd_transform_reduce<_Backend>(
+          __last1 - __first1, std::move(__init), std::move(__reduce), [&](__iter_diff_t<_ForwardIterator1> __i) {
+            return __transform(__first1[__i], __first2[__i]);
+          });
+    } else {
+      return std::transform_reduce(
+          std::move(__first1),
+          std::move(__last1),
+          std::move(__first2),
+          std::move(__init),
+          std::move(__reduce),
+          std::move(__transform));
+    }
   }
-}
+};
 
-template <class _ExecutionPolicy, class _ForwardIterator, class _Tp, class _BinaryOperation, class _UnaryOperation>
-_LIBCPP_HIDE_FROM_ABI optional<_Tp> __pstl_transform_reduce(
-    __cpu_backend_tag,
-    _ForwardIterator __first,
-    _ForwardIterator __last,
-    _Tp __init,
-    _BinaryOperation __reduce,
-    _UnaryOperation __transform) {
-  if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
-                __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    return __pstl::__cpu_traits<__cpu_backend_tag>::__transform_reduce(
-        std::move(__first),
-        std::move(__last),
-        [__transform](_ForwardIterator __iter) { return __transform(*__iter); },
-        std::move(__init),
-        __reduce,
-        [__transform, __reduce](auto __brick_first, auto __brick_last, _Tp __brick_init) {
-          auto __res = std::__pstl_transform_reduce<__remove_parallel_policy_t<_ExecutionPolicy>>(
-              __cpu_backend_tag{},
-              std::move(__brick_first),
-              std::move(__brick_last),
-              std::move(__brick_init),
-              std::move(__reduce),
-              std::move(__transform));
-          _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
-          return *std::move(__res);
-        });
-  } else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
-                       __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
-    return std::__simd_transform_reduce<__cpu_backend_tag>(
-        __last - __first,
-        std::move(__init),
-        std::move(__reduce),
-        [=, &__transform](__iter_diff_t<_ForwardIterator> __i) { return __transform(__first[__i]); });
-  } else {
-    return std::transform_reduce(
-        std::move(__first), std::move(__last), std::move(__init), std::move(__reduce), std::move(__transform));
+template <class _Backend, class _RawExecutionPolicy>
+struct __cpu_parallel_transform_reduce {
+  template <class _Policy, class _ForwardIterator, class _Tp, class _BinaryOperation, class _UnaryOperation>
+  _LIBCPP_HIDE_FROM_ABI optional<_Tp>
+  operator()(_Policy&& __policy,
+             _ForwardIterator __first,
+             _ForwardIterator __last,
+             _Tp __init,
+             _BinaryOperation __reduce,
+             _UnaryOperation __transform) const noexcept {
+    if constexpr (__is_parallel_execution_policy_v<_RawExecutionPolicy> &&
+                  __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      return __pstl::__cpu_traits<_Backend>::__transform_reduce(
+          std::move(__first),
+          std::move(__last),
+          [__transform](_ForwardIterator __iter) { return __transform(*__iter); },
+          std::move(__init),
+          __reduce,
+          [&__policy, __transform, __reduce](auto __brick_first, auto __brick_last, _Tp __brick_init) {
+            using _TransformReduceUnseq =
+                __pstl::__transform_reduce<_Backend, __remove_parallel_policy_t<_RawExecutionPolicy>>;
+            auto __res = _TransformReduceUnseq()(
+                std::__remove_parallel_policy(__policy),
+                std::move(__brick_first),
+                std::move(__brick_last),
+                std::move(__brick_init),
+                std::move(__reduce),
+                std::move(__transform));
+            _LIBCPP_ASSERT_INTERNAL(__res, "unseq/seq should never try to allocate!");
+            return *std::move(__res);
+          });
+    } else if constexpr (__is_unsequenced_execution_policy_v<_RawExecutionPolicy> &&
+                         __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+      return std::__simd_transform_reduce<_Backend>(
+          __last - __first,
+          std::move(__init),
+          std::move(__reduce),
+          [=, &__transform](__iter_diff_t<_ForwardIterator> __i) { return __transform(__first[__i]); });
+    } else {
+      return std::transform_reduce(
+          std::move(__first), std::move(__last), std::move(__init), std::move(__reduce), std::move(__transform));
+    }
   }
-}
+};
 
 _LIBCPP_END_NAMESPACE_STD
 
diff --git a/libcxx/include/__pstl/dispatch.h b/libcxx/include/__pstl/dispatch.h
new file mode 100644
index 0000000000000..8b1a7cb2c5f89
--- /dev/null
+++ b/libcxx/include/__pstl/dispatch.h
@@ -0,0 +1,66 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___PSTL_DISPATCH_H
+#define _LIBCPP___PSTL_DISPATCH_H
+
+#include <__config>
+#include <__pstl/configuration_fwd.h>
+#include <__type_traits/conditional.h>
+#include <__type_traits/enable_if.h>
+#include <__type_traits/integral_constant.h>
+#include <__type_traits/type_identity.h>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+namespace __pstl {
+
+template <template <class, class> class _Algorithm, class _Backend, class _ExecutionPolicy, class = void>
+struct __is_implemented : false_type {};
+
+template <template <class, class> class _Algorithm, class _Backend, class _ExecutionPolicy>
+struct __is_implemented<_Algorithm,
+                        _Backend,
+                        _ExecutionPolicy,
+                        __enable_if_t<sizeof(_Algorithm<_Backend, _ExecutionPolicy>)>> : true_type {};
+
+// Helpful to provide better error messages. This will show the algorithm and the execution policy
+// in the compiler diagnostic.
+template <template <class, class> class _Algorithm, class _ExecutionPolicy>
+constexpr bool __cant_find_backend_for = false;
+
+template <template <class, class> class _Algorithm, class _BackendConfiguration, class _ExecutionPolicy>
+struct __find_first_implemented;
+
+template <template <class, class> class _Algorithm, class _ExecutionPolicy>
+struct __find_first_implemented<_Algorithm, __backend_configuration<>, _ExecutionPolicy> {
+  static_assert(__cant_find_backend_for<_Algorithm, _ExecutionPolicy>,
+                "Could not find a PSTL backend for the given algorithm and execution policy");
+};
+
+template <template <class, class> class _Algorithm, class _B1, class... _Bn, class _ExecutionPolicy>
+struct __find_first_implemented<_Algorithm, __backend_configuration<_B1, _Bn...>, _ExecutionPolicy>
+    : _If<__is_implemented<_Algorithm, _B1, _ExecutionPolicy>::value,
+          __type_identity<_Algorithm<_B1, _ExecutionPolicy>>,
+          __find_first_implemented<_Algorithm, __backend_configuration<_Bn...>, _ExecutionPolicy> > {};
+
+template <template <class, class> class _Algorithm, class _BackendConfiguration, class _ExecutionPolicy>
+using __dispatch = typename __find_first_implemented<_Algorithm, _BackendConfiguration, _ExecutionPolicy>::type;
+
+} // namespace __pstl
+_LIBCPP_END_NAMESPACE_STD
+
+_LIBCPP_POP_MACROS
+
+#endif // _LIBCPP___PSTL_DISPATCH_H
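Note on dispatch.h: the selection logic above can be tried outside of libc++ with a small standalone sketch. Everything below is an assumption made for illustration, not libc++ code: the un-uglified names (backend_configuration, is_implemented, find_first_implemented, dispatch), the two toy backends, the toy policy and the two toy algorithms are all hypothetical. The sketch only shows the idea that an algorithm counts as "implemented" by a backend when Algorithm<Backend, Policy> is a complete type, and that the first backend in the configuration list that implements it wins.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for __backend_configuration: an ordered list of backends.
template <class... Backends>
struct backend_configuration {};

// Hypothetical backends and execution policy.
struct gpu_backend {};
struct serial_backend {};
struct seq_policy {};

// Algorithms are class templates that are declared but left undefined.
// A backend "implements" an algorithm by specializing it.
template <class Backend, class Policy>
struct find_if_algo;
template <class Backend, class Policy>
struct sort_algo;

// Only the serial backend implements find_if_algo.
template <class Policy>
struct find_if_algo<serial_backend, Policy> {
  int operator()() const { return 1; }
};

// Both backends implement sort_algo.
template <class Policy>
struct sort_algo<gpu_backend, Policy> {
  int operator()() const { return 2; }
};
template <class Policy>
struct sort_algo<serial_backend, Policy> {
  int operator()() const { return 3; }
};

// True only when Algorithm<Backend, Policy> is complete: sizeof on an
// incomplete type is a substitution failure, so the primary template is used.
template <template <class, class> class Algorithm, class Backend, class Policy, class = void>
struct is_implemented : std::false_type {};

template <template <class, class> class Algorithm, class Backend, class Policy>
struct is_implemented<Algorithm, Backend, Policy,
                      std::enable_if_t<sizeof(Algorithm<Backend, Policy>) != 0>> : std::true_type {};

template <class T>
struct type_identity {
  using type = T;
};

// Always false, but dependent on its arguments so that they show up in the diagnostic.
template <template <class, class> class Algorithm, class Policy>
constexpr bool cant_find_backend_for = false;

template <template <class, class> class Algorithm, class Config, class Policy>
struct find_first_implemented;

// Ran out of backends: report an error, but only if this case is actually instantiated.
template <template <class, class> class Algorithm, class Policy>
struct find_first_implemented<Algorithm, backend_configuration<>, Policy> {
  static_assert(cant_find_backend_for<Algorithm, Policy>, "no backend implements this algorithm");
};

// Take the first backend if it implements the algorithm, otherwise recurse on the rest.
template <template <class, class> class Algorithm, class B1, class... Bn, class Policy>
struct find_first_implemented<Algorithm, backend_configuration<B1, Bn...>, Policy>
    : std::conditional_t<is_implemented<Algorithm, B1, Policy>::value,
                         type_identity<Algorithm<B1, Policy>>,
                         find_first_implemented<Algorithm, backend_configuration<Bn...>, Policy>> {};

template <template <class, class> class Algorithm, class Config, class Policy>
using dispatch = typename find_first_implemented<Algorithm, Config, Policy>::type;

// The GPU backend is tried first, the serial backend is the fallback.
using config = backend_configuration<gpu_backend, serial_backend>;

static_assert(std::is_same_v<dispatch<sort_algo, config, seq_policy>, sort_algo<gpu_backend, seq_policy>>);
static_assert(std::is_same_v<dispatch<find_if_algo, config, seq_policy>, find_if_algo<serial_backend, seq_policy>>);
```

With this toy configuration, dispatch<sort_algo, config, seq_policy> names the GPU specialization, while dispatch<find_if_algo, config, seq_policy> falls through to the serial one because the GPU backend never specializes find_if_algo. Because the unselected branch of the conditional is only named and never instantiated, the static_assert in the empty-configuration case fires only when no backend matches.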
diff --git a/libcxx/include/__pstl/run_backend.h b/libcxx/include/__pstl/run_backend.h
new file mode 100644
index 0000000000000..201fb1328704e
--- /dev/null
+++ b/libcxx/include/__pstl/run_backend.h
@@ -0,0 +1,57 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___PSTL_RUN_BACKEND_H
+#define _LIBCPP___PSTL_RUN_BACKEND_H
+
+#include <__config>
+#include <__utility/forward.h>
+#include <__utility/move.h>
+#include <new> // __throw_bad_alloc
+#include <optional>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+namespace __pstl {
+
+template <class _BackendFunction, class... _Args>
+_LIBCPP_HIDE_FROM_ABI _LIBCPP_ALWAYS_INLINE auto __run_backend_impl(_Args&&... __args) noexcept {
+  return _BackendFunction{}(std::forward<_Args>(__args)...);
+}
+
+// This function is used to call a backend PSTL algorithm from a frontend algorithm.
+//
+// All PSTL backend algorithms return an optional denoting whether there was an
+// "infrastructure"-level failure (aka failure to allocate). This function takes
+// care of unwrapping that and throwing `bad_alloc()` in case there was a problem
+// in the underlying implementation.
+//
+// We must also be careful not to call any user code that could throw an exception
+// (such as moving or copying iterators) in here since that should terminate the
+// program, which is why we delegate to the noexcept helper above.
+template <class _BackendFunction, class... _Args>
+_LIBCPP_HIDE_FROM_ABI auto __run_backend(_Args&&... __args) {
+  auto __result = __pstl::__run_backend_impl<_BackendFunction>(std::forward<_Args>(__args)...);
+  if (__result == nullopt)
+    std::__throw_bad_alloc();
+  else
+    return std::move(*__result);
+}
+
+} // namespace __pstl
+_LIBCPP_END_NAMESPACE_STD
+
+_LIBCPP_POP_MACROS
+
+#endif // _LIBCPP___PSTL_RUN_BACKEND_H
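Note on run_backend.h: the optional-unwrapping convention described in the comment above can be sketched in isolation as follows. The names here (run_backend, run_backend_impl, sum_backend, failing_backend, throws_bad_alloc) are hypothetical and exist only for this illustration, and the sketch throws std::bad_alloc directly rather than going through libc++'s internal helper. It assumes, as the header's comment states, that a backend returns an empty optional only to report an allocation failure.

```cpp
#include <cassert>
#include <new>
#include <optional>
#include <utility>

// Hypothetical backend function object that succeeds.
struct sum_backend {
  std::optional<int> operator()(int a, int b) const noexcept { return a + b; }
};

// Hypothetical backend function object that reports an allocation failure.
struct failing_backend {
  std::optional<int> operator()(int, int) const noexcept { return std::nullopt; }
};

// noexcept helper: any exception escaping the backend call (for example from
// copying or moving arguments) terminates the program instead of propagating.
template <class BackendFunction, class... Args>
auto run_backend_impl(Args&&... args) noexcept {
  return BackendFunction{}(std::forward<Args>(args)...);
}

// Frontend-side wrapper: unwrap the optional, or turn an empty optional into bad_alloc.
template <class BackendFunction, class... Args>
auto run_backend(Args&&... args) {
  auto result = run_backend_impl<BackendFunction>(std::forward<Args>(args)...);
  if (!result)
    throw std::bad_alloc();
  return std::move(*result);
}

// Small helper for the checks: reports whether calling the backend ends in bad_alloc.
template <class BackendFunction>
bool throws_bad_alloc() {
  try {
    (void)run_backend<BackendFunction>(1, 2);
  } catch (const std::bad_alloc&) {
    return true;
  }
  return false;
}
```

Splitting the call into a noexcept helper and a throwing wrapper keeps the two failure modes apart: bad_alloc is the only exception the wrapper itself raises, and anything thrown while forwarding arguments into the backend hits the noexcept boundary.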
diff --git a/libcxx/include/module.modulemap b/libcxx/include/module.modulemap
index ca0fbe2cc7ae8..51901025a3f9f 100644
--- a/libcxx/include/module.modulemap
+++ b/libcxx/include/module.modulemap
@@ -718,10 +718,6 @@ module std_private_algorithm_pstl                                        [system
   header "__algorithm/pstl.h"
   export *
 }
-module std_private_algorithm_pstl_frontend_dispatch                      [system] {
-  header "__algorithm/pstl_frontend_dispatch.h"
-  export std_private_utility_forward
-}
 module std_private_algorithm_push_heap                                   [system] { header "__algorithm/push_heap.h" }
 module std_private_algorithm_ranges_adjacent_find                        [system] { header "__algorithm/ranges_adjacent_find.h" }
 module std_private_algorithm_ranges_all_of                               [system] { header "__algorithm/ranges_all_of.h" }
@@ -1579,18 +1575,20 @@ module std_private_numeric_transform_exclusive_scan [system] { header "__numeric
 module std_private_numeric_transform_inclusive_scan [system] { header "__numeric/transform_inclusive_scan.h" }
 module std_private_numeric_transform_reduce         [system] { header "__numeric/transform_reduce.h" }
 
+module std_private_pstl_backend_fwd                [system] { header "__pstl/backend_fwd.h" }
+module std_private_pstl_backends_default           [system] { header "__pstl/backends/default.h" }
 module std_private_pstl_backends_libdispatch       [system] { header "__pstl/backends/libdispatch.h" }
 module std_private_pstl_backends_serial            [system] { header "__pstl/backends/serial.h" }
 module std_private_pstl_backends_std_thread        [system] { header "__pstl/backends/std_thread.h" }
-module std_private_pstl_cpu_algos_any_of           [system] { textual header "__pstl/cpu_algos/any_of.h" }
+module std_private_pstl_cpu_algos_any_of           [system] { header "__pstl/cpu_algos/any_of.h" }
 module std_private_pstl_cpu_algos_cpu_traits       [system] { header "__pstl/cpu_algos/cpu_traits.h" }
-module std_private_pstl_cpu_algos_fill             [system] { textual header "__pstl/cpu_algos/fill.h" }
-module std_private_pstl_cpu_algos_find_if          [system] { textual header "__pstl/cpu_algos/find_if.h" }
-module std_private_pstl_cpu_algos_for_each         [system] { textual header "__pstl/cpu_algos/for_each.h" }
-module std_private_pstl_cpu_algos_merge            [system] { textual header "__pstl/cpu_algos/merge.h" }
-module std_private_pstl_cpu_algos_stable_sort      [system] { textual header "__pstl/cpu_algos/stable_sort.h" }
-module std_private_pstl_cpu_algos_transform        [system] { textual header "__pstl/cpu_algos/transform.h" }
-module std_private_pstl_cpu_algos_transform_reduce [system] { textual header "__pstl/cpu_algos/transform_reduce.h" }
+module std_private_pstl_cpu_algos_fill             [system] { header "__pstl/cpu_algos/fill.h" }
+module std_private_pstl_cpu_algos_find_if          [system] { header "__pstl/cpu_algos/find_if.h" }
+module std_private_pstl_cpu_algos_for_each         [system] { header "__pstl/cpu_algos/for_each.h" }
+module std_private_pstl_cpu_algos_merge            [system] { header "__pstl/cpu_algos/merge.h" }
+module std_private_pstl_cpu_algos_stable_sort      [system] { header "__pstl/cpu_algos/stable_sort.h" }
+module std_private_pstl_cpu_algos_transform        [system] { header "__pstl/cpu_algos/transform.h" }
+module std_private_pstl_cpu_algos_transform_reduce [system] { header "__pstl/cpu_algos/transform_reduce.h" }
 module std_private_pstl_configuration_fwd          [system] {
   header "__pstl/configuration_fwd.h"
   export *
@@ -1599,6 +1597,8 @@ module std_private_pstl_configuration              [system] {
   header "__pstl/configuration.h"
   export *
 }
+module std_private_pstl_dispatch                   [system] { header "__pstl/dispatch.h" }
+module std_private_pstl_run_backend                [system] { header "__pstl/run_backend.h" }
 
 module std_private_queue_fwd [system] { header "__fwd/queue.h" }
 
diff --git a/libcxx/test/libcxx/algorithms/pstl.iterator-requirements.verify.cpp b/libcxx/test/libcxx/algorithms/pstl.iterator-requirements.verify.cpp
index 98e3509752e16..e5bd7e764c59b 100644
--- a/libcxx/test/libcxx/algorithms/pstl.iterator-requirements.verify.cpp
+++ b/libcxx/test/libcxx/algorithms/pstl.iterator-requirements.verify.cpp
@@ -26,6 +26,7 @@
 
 #include <algorithm>
 #include <cstddef>
+#include <execution>
 #include <numeric>
 
 #include "test_iterators.h"
diff --git a/libcxx/test/libcxx/algorithms/pstl.robust_against_customization_points_not_working.pass.cpp b/libcxx/test/libcxx/algorithms/pstl.robust_against_customization_points_not_working.pass.cpp
deleted file mode 100644
index 09258f7c9eb56..0000000000000
--- a/libcxx/test/libcxx/algorithms/pstl.robust_against_customization_points_not_working.pass.cpp
+++ /dev/null
@@ -1,405 +0,0 @@
-//===----------------------------------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-// UNSUPPORTED: c++03, c++11, c++14
-// UNSUPPORTED: libcpp-has-no-incomplete-pstl
-
-// Having a customization point outside the module doesn't work, so this test is inherently module-hostile.
-// UNSUPPORTED: clang-modules-build
-
-// Make sure that the customization points get called properly when overloaded
-
-#include <__config>
-#include <__iterator/iterator_traits.h>
-#include <__iterator/readable_traits.h>
-#include <__utility/empty.h>
-#include <cassert>
-#include <optional>
-
-struct TestPolicy {};
-struct TestBackend {};
-
-_LIBCPP_BEGIN_NAMESPACE_STD
-
-bool pstl_any_of_called = false;
-
-template <class, class ForwardIterator, class Pred>
-optional<bool> __pstl_any_of(TestBackend, ForwardIterator, ForwardIterator, Pred) {
-  assert(!pstl_any_of_called);
-  pstl_any_of_called = true;
-  return true;
-}
-
-bool pstl_all_of_called = false;
-
-template <class, class ForwardIterator, class Pred>
-optional<bool> __pstl_all_of(TestBackend, ForwardIterator, ForwardIterator, Pred) {
-  assert(!pstl_all_of_called);
-  pstl_all_of_called = true;
-  return true;
-}
-
-bool pstl_copy_called = false;
-
-template <class, class ForwardIterator, class ForwardOutIterator>
-optional<ForwardOutIterator> __pstl_copy(TestBackend, ForwardIterator, ForwardIterator, ForwardOutIterator res) {
-  assert(!pstl_copy_called);
-  pstl_copy_called = true;
-  return res;
-}
-
-bool pstl_copy_n_called = false;
-
-template <class, class ForwardIterator, class Size, class ForwardOutIterator>
-optional<ForwardOutIterator> __pstl_copy_n(TestBackend, ForwardIterator, Size, ForwardOutIterator res) {
-  assert(!pstl_copy_n_called);
-  pstl_copy_n_called = true;
-  return res;
-}
-
-bool pstl_count_called = false;
-
-template <class, class ForwardIterator, class T>
-optional<typename std::iterator_traits<ForwardIterator>::difference_type>
-__pstl_count(TestBackend, ForwardIterator, ForwardIterator, const T&) {
-  assert(!pstl_count_called);
-  pstl_count_called = true;
-  return 0;
-}
-
-bool pstl_count_if_called = false;
-
-template <class, class ForwardIterator, class Pred>
-optional<typename std::iterator_traits<ForwardIterator>::difference_type>
-__pstl_count_if(TestBackend, ForwardIterator, ForwardIterator, Pred) {
-  assert(!pstl_count_if_called);
-  pstl_count_if_called = true;
-  return 0;
-}
-
-bool pstl_generate_called = false;
-
-template <class, class ForwardIterator, class Gen>
-optional<__empty> __pstl_generate(TestBackend, ForwardIterator, ForwardIterator, Gen) {
-  assert(!pstl_generate_called);
-  pstl_generate_called = true;
-  return __empty{};
-}
-
-bool pstl_generate_n_called = false;
-
-template <class, class ForwardIterator, class Size, class Gen>
-optional<__empty> __pstl_generate_n(TestBackend, Size, ForwardIterator, Gen) {
-  assert(!pstl_generate_n_called);
-  pstl_generate_n_called = true;
-  return __empty{};
-}
-
-bool pstl_none_of_called = false;
-
-template <class, class ForwardIterator, class Pred>
-optional<bool> __pstl_none_of(TestBackend, ForwardIterator, ForwardIterator, Pred) {
-  assert(!pstl_none_of_called);
-  pstl_none_of_called = true;
-  return true;
-}
-
-bool pstl_find_called = false;
-
-template <class, class ForwardIterator, class Pred>
-optional<ForwardIterator> __pstl_find(TestBackend, ForwardIterator first, ForwardIterator, Pred) {
-  assert(!pstl_find_called);
-  pstl_find_called = true;
-  return first;
-}
-
-bool pstl_find_if_called = false;
-
-template <class, class ForwardIterator, class Pred>
-optional<ForwardIterator> __pstl_find_if(TestBackend, ForwardIterator first, ForwardIterator, Pred) {
-  assert(!pstl_find_if_called);
-  pstl_find_if_called = true;
-  return first;
-}
-
-bool pstl_find_if_not_called = false;
-
-template <class, class ForwardIterator, class Pred>
-optional<ForwardIterator> __pstl_find_if_not(TestBackend, ForwardIterator first, ForwardIterator, Pred) {
-  assert(!pstl_find_if_not_called);
-  pstl_find_if_not_called = true;
-  return first;
-}
-
-bool pstl_for_each_called = false;
-
-template <class, class ForwardIterator, class Size, class Func>
-optional<__empty> __pstl_for_each(TestBackend, ForwardIterator, Size, Func) {
-  assert(!pstl_for_each_called);
-  pstl_for_each_called = true;
-  return __empty{};
-}
-
-bool pstl_for_each_n_called = false;
-
-template <class, class ForwardIterator, class Size, class Func>
-optional<__empty> __pstl_for_each_n(TestBackend, ForwardIterator, Size, Func) {
-  assert(!pstl_for_each_n_called);
-  pstl_for_each_n_called = true;
-  return __empty{};
-}
-
-bool pstl_fill_called = false;
-
-template <class, class ForwardIterator, class Size, class Func>
-optional<__empty> __pstl_fill(TestBackend, ForwardIterator, Size, Func) {
-  assert(!pstl_fill_called);
-  pstl_fill_called = true;
-  return __empty{};
-}
-
-bool pstl_fill_n_called = false;
-
-template <class, class ForwardIterator, class Size, class Func>
-optional<__empty> __pstl_fill_n(TestBackend, ForwardIterator, Size, Func) {
-  assert(!pstl_fill_n_called);
-  pstl_fill_n_called = true;
-  return __empty{};
-}
-
-bool pstl_move_called = false;
-
-template <class, class ForwardIterator, class Size, class Func>
-ForwardIterator __pstl_move(TestBackend, ForwardIterator, Size, Func) {
-  assert(!pstl_move_called);
-  pstl_move_called = true;
-  return 0;
-}
-
-bool pstl_is_partitioned_called = false;
-
-template <class, class ForwardIterator, class Func>
-optional<bool> __pstl_is_partitioned(TestBackend, ForwardIterator, ForwardIterator, Func) {
-  assert(!pstl_is_partitioned_called);
-  pstl_is_partitioned_called = true;
-  return true;
-}
-
-bool pstl_replace_called = false;
-
-template <class, class ForwardIterator, class T>
-optional<__empty> __pstl_replace(TestBackend, ForwardIterator, ForwardIterator, const T&, const T&) {
-  assert(!pstl_replace_called);
-  pstl_replace_called = true;
-  return __empty{};
-}
-
-bool pstl_replace_if_called = false;
-
-template <class, class ForwardIterator, class T, class Func>
-optional<__empty> __pstl_replace_if(TestBackend, ForwardIterator, ForwardIterator, Func, const T&) {
-  assert(!pstl_replace_if_called);
-  pstl_replace_if_called = true;
-  return __empty{};
-}
-
-bool pstl_replace_copy_called = false;
-
-template <class, class ForwardIterator, class ForwardOutIterator, class T>
-optional<__empty>
-__pstl_replace_copy(TestBackend, ForwardIterator, ForwardIterator, ForwardOutIterator, const T&, const T&) {
-  assert(!pstl_replace_copy_called);
-  pstl_replace_copy_called = true;
-  return __empty{};
-}
-
-bool pstl_replace_copy_if_called = false;
-
-template <class, class ForwardIterator, class ForwardOutIterator, class T, class Func>
-optional<__empty>
-__pstl_replace_copy_if(TestBackend, ForwardIterator, ForwardIterator, ForwardOutIterator, Func, const T&) {
-  assert(!pstl_replace_copy_if_called);
-  pstl_replace_copy_if_called = true;
-  return __empty{};
-}
-
-bool pstl_rotate_copy_called = false;
-
-template <class, class ForwardIterator, class ForwardOutIterator>
-optional<ForwardOutIterator>
-__pstl_rotate_copy(TestBackend, ForwardIterator, ForwardIterator, ForwardIterator, ForwardOutIterator res) {
-  assert(!pstl_rotate_copy_called);
-  pstl_rotate_copy_called = true;
-  return res;
-}
-
-bool pstl_unary_transform_called = false;
-
-template <class, class ForwardIterator, class ForwardOutIterator, class UnaryOperation>
-optional<ForwardOutIterator>
-__pstl_transform(TestBackend, ForwardIterator, ForwardIterator, ForwardOutIterator res, UnaryOperation) {
-  assert(!pstl_unary_transform_called);
-  pstl_unary_transform_called = true;
-  return res;
-}
-
-bool pstl_binary_transform_called = false;
-
-template <class, class ForwardIterator1, class ForwardIterator2, class ForwardOutIterator, class BinaryOperation>
-optional<ForwardOutIterator> __pstl_transform(
-    TestBackend, ForwardIterator1, ForwardIterator1, ForwardIterator2, ForwardOutIterator res, BinaryOperation) {
-  assert(!pstl_binary_transform_called);
-  pstl_binary_transform_called = true;
-  return res;
-}
-
-bool pstl_reduce_with_init_called = false;
-
-template <class, class ForwardIterator, class T, class BinaryOperation>
-optional<T> __pstl_reduce(TestBackend, ForwardIterator, ForwardIterator, T v, BinaryOperation) {
-  assert(!pstl_reduce_with_init_called);
-  pstl_reduce_with_init_called = true;
-  return v;
-}
-
-bool pstl_reduce_without_init_called = false;
-
-template <class, class ForwardIterator>
-optional<typename std::iterator_traits<ForwardIterator>::value_type>
-__pstl_reduce(TestBackend, ForwardIterator first, ForwardIterator) {
-  assert(!pstl_reduce_without_init_called);
-  pstl_reduce_without_init_called = true;
-  return *first;
-}
-
-bool pstl_sort_called = false;
-
-template <class, class RandomAccessIterator, class Comp>
-optional<__empty> __pstl_sort(TestBackend, RandomAccessIterator, RandomAccessIterator, Comp) {
-  assert(!pstl_sort_called);
-  pstl_sort_called = true;
-  return __empty{};
-}
-
-bool pstl_stable_sort_called = false;
-
-template <class, class RandomAccessIterator, class Comp>
-optional<__empty> __pstl_stable_sort(TestBackend, RandomAccessIterator, RandomAccessIterator, Comp) {
-  assert(!pstl_stable_sort_called);
-  pstl_stable_sort_called = true;
-  return __empty{};
-}
-
-bool pstl_unary_transform_reduce_called = false;
-
-template <class, class ForwardIterator, class T, class UnaryOperation, class BinaryOperation>
-T __pstl_transform_reduce(TestBackend, ForwardIterator, ForwardIterator, T v, UnaryOperation, BinaryOperation) {
-  assert(!pstl_unary_transform_reduce_called);
-  pstl_unary_transform_reduce_called = true;
-  return v;
-}
-
-bool pstl_binary_transform_reduce_called = false;
-
-template <class,
-          class ForwardIterator1,
-          class ForwardIterator2,
-          class T,
-          class BinaryOperation1,
-          class BinaryOperation2>
-typename std::iterator_traits<ForwardIterator1>::value_type __pstl_transform_reduce(
-    TestBackend, ForwardIterator1, ForwardIterator1, ForwardIterator2, T v, BinaryOperation1, BinaryOperation2) {
-  assert(!pstl_binary_transform_reduce_called);
-  pstl_binary_transform_reduce_called = true;
-  return v;
-}
-
-_LIBCPP_END_NAMESPACE_STD
-
-#include <algorithm>
-#include <cassert>
-#include <iterator>
-#include <numeric>
-
-template <>
-inline constexpr bool std::is_execution_policy_v<TestPolicy> = true;
-
-template <>
-struct std::__select_backend<TestPolicy> {
-  using type = TestBackend;
-};
-
-int main(int, char**) {
-  int a[]   = {1, 2};
-  auto pred = [](auto&&...) { return true; };
-
-  (void)std::any_of(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_any_of_called);
-  (void)std::all_of(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_all_of_called);
-  (void)std::none_of(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_none_of_called);
-  std::copy(TestPolicy{}, std::begin(a), std::end(a), std::begin(a));
-  assert(std::pstl_copy_called);
-  std::copy_n(TestPolicy{}, std::begin(a), 1, std::begin(a));
-  assert(std::pstl_copy_n_called);
-  (void)std::count(TestPolicy{}, std::begin(a), std::end(a), 0);
-  assert(std::pstl_count_called);
-  (void)std::count_if(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_count_if_called);
-  (void)std::fill(TestPolicy{}, std::begin(a), std::end(a), 0);
-  assert(std::pstl_fill_called);
-  (void)std::fill_n(TestPolicy{}, std::begin(a), std::size(a), 0);
-  assert(std::pstl_fill_n_called);
-  (void)std::find(TestPolicy{}, std::begin(a), std::end(a), 0);
-  assert(std::pstl_find_called);
-  (void)std::find_if(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_find_if_called);
-  (void)std::find_if_not(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_find_if_not_called);
-  (void)std::for_each(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_for_each_called);
-  (void)std::for_each_n(TestPolicy{}, std::begin(a), std::size(a), pred);
-  assert(std::pstl_for_each_n_called);
-  (void)std::generate(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_generate_called);
-  (void)std::generate_n(TestPolicy{}, std::begin(a), std::size(a), pred);
-  assert(std::pstl_generate_n_called);
-  (void)std::is_partitioned(TestPolicy{}, std::begin(a), std::end(a), pred);
-  assert(std::pstl_is_partitioned_called);
-  (void)std::move(TestPolicy{}, std::begin(a), std::end(a), std::begin(a));
-  assert(std::pstl_move_called);
-  (void)std::replace(TestPolicy{}, std::begin(a), std::end(a), 0, 0);
-  assert(std::pstl_replace_called);
-  (void)std::replace_if(TestPolicy{}, std::begin(a), std::end(a), pred, 0);
-  assert(std::pstl_replace_if_called);
-  (void)std::replace_copy(TestPolicy{}, std::begin(a), std::end(a), std::begin(a), 0, 0);
-  assert(std::pstl_replace_copy_called);
-  (void)std::replace_copy_if(TestPolicy{}, std::begin(a), std::end(a), std::begin(a), pred, 0);
-  assert(std::pstl_replace_copy_if_called);
-  (void)std::transform(TestPolicy{}, std::begin(a), std::end(a), std::begin(a), pred);
-  assert(std::pstl_unary_transform_called);
-  (void)std::transform(TestPolicy{}, std::begin(a), std::end(a), std::begin(a), std::begin(a), pred);
-  assert(std::pstl_unary_transform_called);
-  (void)std::reduce(TestPolicy{}, std::begin(a), std::end(a), 0, pred);
-  assert(std::pstl_reduce_with_init_called);
-  (void)std::reduce(TestPolicy{}, std::begin(a), std::end(a));
-  assert(std::pstl_reduce_without_init_called);
-  (void)std::rotate_copy(TestPolicy{}, std::begin(a), std::begin(a), std::end(a), std::begin(a));
-  assert(std::pstl_rotate_copy_called);
-  (void)std::sort(TestPolicy{}, std::begin(a), std::end(a));
-  assert(std::pstl_sort_called);
-  (void)std::stable_sort(TestPolicy{}, std::begin(a), std::end(a));
-  assert(std::pstl_stable_sort_called);
-  (void)std::transform_reduce(TestPolicy{}, std::begin(a), std::end(a), 0, pred, pred);
-  assert(std::pstl_unary_transform_reduce_called);
-  (void)std::transform_reduce(TestPolicy{}, std::begin(a), std::end(a), std::begin(a), 0, pred, pred);
-  assert(std::pstl_binary_transform_reduce_called);
-
-  return 0;
-}
diff --git a/libcxx/test/libcxx/transitive_includes/cxx23.csv b/libcxx/test/libcxx/transitive_includes/cxx23.csv
index 62d931c0eebad..4ef83c5133e1f 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx23.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx23.csv
@@ -401,12 +401,12 @@ numeric cstddef
 numeric cstdint
 numeric cstring
 numeric ctime
-numeric execution
 numeric initializer_list
 numeric limits
 numeric new
 numeric optional
 numeric ratio
+numeric tuple
 numeric version
 optional compare
 optional cstddef
diff --git a/libcxx/test/libcxx/transitive_includes/cxx26.csv b/libcxx/test/libcxx/transitive_includes/cxx26.csv
index f68249aeec78c..fbb2c7620fd99 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx26.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx26.csv
@@ -424,12 +424,12 @@ numeric cstddef
 numeric cstdint
 numeric cstring
 numeric ctime
-numeric execution
 numeric initializer_list
 numeric limits
 numeric new
 numeric optional
 numeric ratio
+numeric tuple
 numeric version
 optional compare
 optional cstddef

>From dfa9fb2b81f88d53ca8092e76a3fbf0b12dee062 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 30 May 2024 11:09:00 -0700
Subject: [PATCH 2/8] Try to fix modulemap

---
 libcxx/include/module.modulemap | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/libcxx/include/module.modulemap b/libcxx/include/module.modulemap
index 51901025a3f9f..a207dabe3cb2a 100644
--- a/libcxx/include/module.modulemap
+++ b/libcxx/include/module.modulemap
@@ -1576,10 +1576,22 @@ module std_private_numeric_transform_inclusive_scan [system] { header "__numeric
 module std_private_numeric_transform_reduce         [system] { header "__numeric/transform_reduce.h" }
 
 module std_private_pstl_backend_fwd                [system] { header "__pstl/backend_fwd.h" }
-module std_private_pstl_backends_default           [system] { header "__pstl/backends/default.h" }
-module std_private_pstl_backends_libdispatch       [system] { header "__pstl/backends/libdispatch.h" }
-module std_private_pstl_backends_serial            [system] { header "__pstl/backends/serial.h" }
-module std_private_pstl_backends_std_thread        [system] { header "__pstl/backends/std_thread.h" }
+module std_private_pstl_backends_default           [system] {
+  header "__pstl/backends/default.h"
+  export *
+}
+module std_private_pstl_backends_libdispatch       [system] {
+  header "__pstl/backends/libdispatch.h"
+  export *
+}
+module std_private_pstl_backends_serial            [system] {
+  header "__pstl/backends/serial.h"
+  export *
+}
+module std_private_pstl_backends_std_thread        [system] {
+  header "__pstl/backends/std_thread.h"
+  export *
+}
 module std_private_pstl_cpu_algos_any_of           [system] { header "__pstl/cpu_algos/any_of.h" }
 module std_private_pstl_cpu_algos_cpu_traits       [system] { header "__pstl/cpu_algos/cpu_traits.h" }
 module std_private_pstl_cpu_algos_fill             [system] { header "__pstl/cpu_algos/fill.h" }

>From 37482e800096026ae63084c7060666f7bdbcb6f4 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 30 May 2024 11:23:28 -0700
Subject: [PATCH 3/8] Adjust transitive includes

---
 libcxx/test/libcxx/transitive_includes/cxx23.csv | 2 +-
 libcxx/test/libcxx/transitive_includes/cxx26.csv | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libcxx/test/libcxx/transitive_includes/cxx23.csv b/libcxx/test/libcxx/transitive_includes/cxx23.csv
index 4ef83c5133e1f..17b7e11edc1dc 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx23.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx23.csv
@@ -4,13 +4,13 @@ algorithm cstdint
 algorithm cstring
 algorithm ctime
 algorithm cwchar
-algorithm execution
 algorithm initializer_list
 algorithm iosfwd
 algorithm limits
 algorithm new
 algorithm optional
 algorithm ratio
+algorithm tuple
 algorithm version
 any cstddef
 any cstdint
diff --git a/libcxx/test/libcxx/transitive_includes/cxx26.csv b/libcxx/test/libcxx/transitive_includes/cxx26.csv
index fbb2c7620fd99..e09131b32b53b 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx26.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx26.csv
@@ -4,13 +4,13 @@ algorithm cstdint
 algorithm cstring
 algorithm ctime
 algorithm cwchar
-algorithm execution
 algorithm initializer_list
 algorithm iosfwd
 algorithm limits
 algorithm new
 algorithm optional
 algorithm ratio
+algorithm tuple
 algorithm version
 any cstddef
 any cstdint

>From f893a58cd4c2d5c239a451f22a28e5a6aeaacbed Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 30 May 2024 11:29:13 -0700
Subject: [PATCH 4/8] Fix modulemap

---
 libcxx/include/module.modulemap | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libcxx/include/module.modulemap b/libcxx/include/module.modulemap
index a207dabe3cb2a..c72a918deb606 100644
--- a/libcxx/include/module.modulemap
+++ b/libcxx/include/module.modulemap
@@ -1349,7 +1349,10 @@ module std_private_functional_invoke                     [system] {
 module std_private_functional_is_transparent             [system] { header "__functional/is_transparent.h" }
 module std_private_functional_mem_fn                     [system] { header "__functional/mem_fn.h" }
 module std_private_functional_mem_fun_ref                [system] { header "__functional/mem_fun_ref.h" }
-module std_private_functional_not_fn                     [system] { header "__functional/not_fn.h" }
+module std_private_functional_not_fn                     [system] {
+  header "__functional/not_fn.h"
+  export std_private_functional_perfect_forward
+}
 module std_private_functional_operations                 [system] { header "__functional/operations.h" }
 module std_private_functional_perfect_forward            [system] {
   header "__functional/perfect_forward.h"

>From 2217b1f908bcd8cb691a67557be9f55e1c372c2e Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 30 May 2024 16:19:33 -0700
Subject: [PATCH 5/8] More transitive include fixes

---
 libcxx/test/libcxx/transitive_includes/cxx17.csv | 3 ++-
 libcxx/test/libcxx/transitive_includes/cxx20.csv | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/libcxx/test/libcxx/transitive_includes/cxx17.csv b/libcxx/test/libcxx/transitive_includes/cxx17.csv
index 09252b7b7d2db..65365d2d14ab0 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx17.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx17.csv
@@ -7,7 +7,6 @@ algorithm cstdint
 algorithm cstdlib
 algorithm cstring
 algorithm cwchar
-algorithm execution
 algorithm initializer_list
 algorithm iosfwd
 algorithm iterator
@@ -16,6 +15,7 @@ algorithm memory
 algorithm new
 algorithm optional
 algorithm stdexcept
+algorithm tuple
 algorithm type_traits
 algorithm utility
 algorithm version
@@ -583,6 +583,7 @@ numeric iterator
 numeric limits
 numeric new
 numeric optional
+numeric tuple
 numeric type_traits
 numeric version
 optional atomic
diff --git a/libcxx/test/libcxx/transitive_includes/cxx20.csv b/libcxx/test/libcxx/transitive_includes/cxx20.csv
index ce4ccc3d11615..78da527521cda 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx20.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx20.csv
@@ -7,7 +7,6 @@ algorithm cstdint
 algorithm cstdlib
 algorithm cstring
 algorithm cwchar
-algorithm execution
 algorithm initializer_list
 algorithm iosfwd
 algorithm iterator
@@ -16,6 +15,7 @@ algorithm memory
 algorithm new
 algorithm optional
 algorithm stdexcept
+algorithm tuple
 algorithm type_traits
 algorithm utility
 algorithm version
@@ -594,6 +594,7 @@ numeric iterator
 numeric limits
 numeric new
 numeric optional
+numeric tuple
 numeric type_traits
 numeric version
 optional atomic

>From 315a2522b765c1370b566965cfc88260ec7676c8 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 30 May 2024 16:21:01 -0700
Subject: [PATCH 6/8] Fix serial backend

---
 libcxx/include/__pstl/backends/serial.h | 1 -
 1 file changed, 1 deletion(-)
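
Note for readers of this patch: the removed `__last2` argument matches the shape of the standard two-range `std::transform_reduce` overload, which takes `first1, last1, first2, init, reduce, transform` and has no end iterator for the second range. A minimal sketch of that overload follows. It is not the libc++ backend code, and `dot` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: dot product via the two-range overload of
// std::transform_reduce. There is no `last2` parameter; the second range
// must hold at least as many elements as the first.
inline int dot(const std::vector<int>& a, const std::vector<int>& b) {
  assert(b.size() >= a.size()); // precondition of the two-range overload
  return std::transform_reduce(
      a.begin(), a.end(), // first range: [first1, last1)
      b.begin(),          // second range: only first2 is passed
      0,                  // init
      std::plus<>{},      // reduce
      std::multiplies<>{}); // transform
}
```

Calling `dot({1, 2, 3}, {4, 5, 6})` pairs the elements up, multiplies each pair, and sums the products onto the initial value.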

diff --git a/libcxx/include/__pstl/backends/serial.h b/libcxx/include/__pstl/backends/serial.h
index 8f9fd76555b49..9a5d8d7d08fa9 100644
--- a/libcxx/include/__pstl/backends/serial.h
+++ b/libcxx/include/__pstl/backends/serial.h
@@ -167,7 +167,6 @@ struct __transform_reduce_binary<__serial_backend_tag, _ExecutionPolicy> {
         std::move(__first1),
         std::move(__last1),
         std::move(__first2),
-        std::move(__last2),
         __init,
         std::forward<_BinaryOperation1>(__reduce),
         std::forward<_BinaryOperation2>(__transform));

>From 2c02890632d7cb7a27b042159ff2831eaa857d42 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Mon, 3 Jun 2024 12:55:04 -0400
Subject: [PATCH 7/8] More fixes for serial backend

---
 libcxx/include/__pstl/backends/serial.h | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)
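
Note for readers of this patch: two of the changes below can be illustrated outside libc++. An operation declared to return `optional<__empty>` needs an explicit return, since flowing off the end of a non-void function is undefined behavior, and taking `__init` by value allows it to be moved into the underlying algorithm. The sketch below is not the libc++ code. `empty_result`, `serial_stable_sort` and `serial_transform_reduce` are hypothetical names, and the `noexcept` of the real backend is left out for simplicity.

```cpp
#include <algorithm>
#include <numeric>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for libc++'s internal __empty tag type.
struct empty_result {};

// Backend-style operation that signals success through an engaged optional.
// Without the explicit return, control would flow off the end of a
// non-void function.
template <class Iter, class Comp>
std::optional<empty_result> serial_stable_sort(Iter first, Iter last, Comp comp) {
  std::stable_sort(std::move(first), std::move(last), std::move(comp));
  return empty_result{};
}

// The initial value is taken by value and moved onward, mirroring the
// `_Tp __init` / `std::move(__init)` change in the diff.
template <class Iter, class T, class Reduce, class Transform>
std::optional<T> serial_transform_reduce(Iter first, Iter last, T init, Reduce reduce, Transform transform) {
  return std::transform_reduce(
      std::move(first), std::move(last), std::move(init), std::move(reduce), std::move(transform));
}
```

A caller would check `has_value()` on the result of either function before using it, which is how this sketch models a backend reporting success or failure.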

diff --git a/libcxx/include/__pstl/backends/serial.h b/libcxx/include/__pstl/backends/serial.h
index 9a5d8d7d08fa9..99853068f9292 100644
--- a/libcxx/include/__pstl/backends/serial.h
+++ b/libcxx/include/__pstl/backends/serial.h
@@ -18,6 +18,7 @@
 #include <__config>
 #include <__numeric/transform_reduce.h>
 #include <__pstl/backend_fwd.h>
+#include <__pstl/configuration_fwd.h>
 #include <__utility/empty.h>
 #include <__utility/forward.h>
 #include <__utility/move.h>
@@ -74,14 +75,14 @@ struct __merge<__serial_backend_tag, _ExecutionPolicy> {
       _ForwardIterator1 __last1,
       _ForwardIterator2 __first2,
       _ForwardIterator2 __last2,
-      _ForwardOutIterator __out,
+      _ForwardOutIterator __outit,
       _Comp&& __comp) const noexcept {
     return std::merge(
         std::move(__first1),
         std::move(__last1),
         std::move(__first2),
         std::move(__last2),
-        std::move(__out),
+        std::move(__outit),
         std::forward<_Comp>(__comp));
   }
 };
@@ -92,6 +93,7 @@ struct __stable_sort<__serial_backend_tag, _ExecutionPolicy> {
   _LIBCPP_HIDE_FROM_ABI optional<__empty>
   operator()(_Policy&&, _RandomAccessIterator __first, _RandomAccessIterator __last, _Comp&& __comp) const noexcept {
     std::stable_sort(std::move(__first), std::move(__last), std::forward<_Comp>(__comp));
+    return __empty{};
   }
 };
 
@@ -99,9 +101,10 @@ template <class _ExecutionPolicy>
 struct __transform<__serial_backend_tag, _ExecutionPolicy> {
   template <class _Policy, class _ForwardIterator, class _ForwardOutIterator, class _UnaryOperation>
   _LIBCPP_HIDE_FROM_ABI optional<_ForwardOutIterator> operator()(
-      _Policy&&, _ForwardIterator __first, _ForwardIterator __last, _ForwardOutIterator __out, _UnaryOperation&& __op)
+      _Policy&&, _ForwardIterator __first, _ForwardIterator __last, _ForwardOutIterator __outit, _UnaryOperation&& __op)
       const noexcept {
-    return std::transform(std::move(__first), std::move(__last), std::move(__out), std::forward<_UnaryOperation>(__op));
+    return std::transform(
+        std::move(__first), std::move(__last), std::move(__outit), std::forward<_UnaryOperation>(__op));
   }
 };
 
@@ -117,13 +120,13 @@ struct __transform_binary<__serial_backend_tag, _ExecutionPolicy> {
              _ForwardIterator1 __first1,
              _ForwardIterator1 __last1,
              _ForwardIterator2 __first2,
-             _ForwardOutIterator __out,
+             _ForwardOutIterator __outit,
              _BinaryOperation&& __op) const noexcept {
     return std::transform(
         std::move(__first1),
         std::move(__last1),
         std::move(__first2),
-        std::move(__out),
+        std::move(__outit),
         std::forward<_BinaryOperation>(__op));
   }
 };
@@ -135,13 +138,13 @@ struct __transform_reduce<__serial_backend_tag, _ExecutionPolicy> {
   operator()(_Policy&&,
              _ForwardIterator __first,
              _ForwardIterator __last,
-             _Tp const& __init,
+             _Tp __init,
              _BinaryOperation&& __reduce,
              _UnaryOperation&& __transform) const noexcept {
     return std::transform_reduce(
         std::move(__first),
         std::move(__last),
-        __init,
+        std::move(__init),
         std::forward<_BinaryOperation>(__reduce),
         std::forward<_UnaryOperation>(__transform));
   }
@@ -160,14 +163,14 @@ struct __transform_reduce_binary<__serial_backend_tag, _ExecutionPolicy> {
       _ForwardIterator1 __first1,
       _ForwardIterator1 __last1,
       _ForwardIterator2 __first2,
-      _Tp const& __init,
+      _Tp __init,
       _BinaryOperation1&& __reduce,
       _BinaryOperation2&& __transform) const noexcept {
     return std::transform_reduce(
         std::move(__first1),
         std::move(__last1),
         std::move(__first2),
-        __init,
+        std::move(__init),
         std::forward<_BinaryOperation1>(__reduce),
         std::forward<_BinaryOperation2>(__transform));
   }

>From 67ed3ac4f29bd1e8df17843c113f8e7b5a0feef2 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Mon, 3 Jun 2024 14:19:18 -0400
Subject: [PATCH 8/8] Fix benchmarks

---
 libcxx/benchmarks/algorithms/pstl.stable_sort.bench.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libcxx/benchmarks/algorithms/pstl.stable_sort.bench.cpp b/libcxx/benchmarks/algorithms/pstl.stable_sort.bench.cpp
index 9357b870bece6..72541f70640f5 100644
--- a/libcxx/benchmarks/algorithms/pstl.stable_sort.bench.cpp
+++ b/libcxx/benchmarks/algorithms/pstl.stable_sort.bench.cpp
@@ -7,6 +7,7 @@
 //===----------------------------------------------------------------------===//
 
 #include <algorithm>
+#include <execution>
 
 #include "common.h"
 


