[libcxx-commits] [libcxxabi] Adding Separate OpenMP Offloading Backend to `libcxx/include/__algorithm/pstl_backends` (PR #66968)

Mon Oct 30 13:27:44 PDT 2023

================
@@ -0,0 +1,81 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_OPENMP_BACKEND_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_OPENMP_BACKEND_H
+
+#include <__config>
+
+/*
+Combined OpenMP CPU and GPU Backend
+===================================
+Contrary to the CPU backends found in ./cpu_backends/, the OpenMP backend can
+target both CPUs and GPUs. The OpenMP standard defines that when offloading code
+to an accelerator, the compiler must generate a fallback code for execution on
+the host. Thereby, the backend works as a CPU backend if no targeted accelerator
+is available at execution time. The target regions can also be compiled directly
+for a CPU architecture, for instance by adding the command-line option
+`-fopenmp-targets=x86_64-pc-linux-gnu` in Clang.
+
+When is an Algorithm Offloaded?
+-------------------------------
+Only parallel algorithms with the parallel unsequenced execution policy are
+offloaded to the device. We cannot offload parallel algorithms with a parallel
+execution policy to GPUs because invocations executing in the same thread "are
+indeterminately sequenced with respect to each other" which we cannot guarantee
+on a GPU.
+
+The standard draft states that "the semantics [...] allow the implementation to
+fall back to sequential execution if the system cannot parallelize an algorithm
+invocation". If it is not deemed safe to offload the parallel algorithm to the
+device, we first fall back to a parallel unsequenced implementation from
+./cpu_backends. The CPU implementation may then fall back to sequential
+execution. In that way we strive to achieve the best possible performance.
+
+Further, "it is the caller's responsibility to ensure that the invocation does
+not introduce data races or deadlocks."
+
+Implicit Assumptions
+--------------------
+If the user provides a function pointer as an argument to a parallel algorithm,
----------------
ldionne wrote:

If you automatically map function pointers, this documentation should change.

https://github.com/llvm/llvm-project/pull/66968