[clang] [CUDA][HIP] Fix host/device context in concept (PR #67721)

Thu Oct 12 17:24:56 PDT 2023

================
@@ -176,3 +176,34 @@ Predefined Macros
    * - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
      - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++20 Concepts with HIP and CUDA
+--------------------------------
+
+In Clang, when working with HIP or CUDA, it's important to note that all constraints in C++20 concepts are assumed to be for the host side only. This behavior is consistent across both programming models, and developers should be aware of this assumption when writing code that utilizes C++20 concepts.
+
+Example:
+.. code-block:: c++
+
+   template <class T>
+   concept MyConcept = requires(T& obj) {
+     my_function(obj);  // Assumed to be a host-side requirement
----------------
zygoloid wrote:

If I understand correctly, normally a template is usable from either host or device (depending on whether it ends up calling any host-only or device-only function). This choice for concepts seems like it's going to be problematic for that model. Something as simple as:

```c++
#include <concepts>
template<std::copyable T> T f(T x) { return x; }
```

... should really be callable on the host or device side if `T` is copyable on the host or device side, and using the host side in all cases will mean that things like the C++ `<complex>` or `<functional>` header may stop doing the right thing in some cases if/when they get extended to use concepts. And it seems like with this patch there's not anything that the authors of those headers can really do about it.

Perhaps it would be better for the host/device choice in a concept satisfaction check to depend on the context in which the concept is required to be satisfied (which I would imagine is what happened by chance before this patch), and for us to include the CUDA context as part of the constraint satisfaction cache key? That kind of direction seems like it'd give closer results to what we'd get from the split compilation model. I don't know if that actually works in general, though. For example, given:

```c++
__host__ X<T> host_global;
__device__ X<T> device_global;
```

... where `X` is a constrained template, what seems like it should happen here is that we take the `__host__` / `__device__` into account when concept-checking `X`'s template arguments, but I'd worry that we don't have the host/device information to hand when checking the concept satisfaction query for `X`.
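
To make the cache-key idea above concrete, here is a minimal toy model in plain C++. It is only a sketch under stated assumptions: `Target`, `Key`, `SatisfactionCache`, and the string-based constraint and argument names are all hypothetical and do not correspond to Clang's actual data structures or APIs. The only point is that once the host/device context is part of the key, a result computed for one side is never reused for the other.

```cpp
// Toy model only: none of these names correspond to Clang's real internals.
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for the CUDA/HIP context a check is performed in.
enum class Target { Host, Device };

// Cache key: the constraint, its template arguments, and the context.
// Strings are used here purely for illustration.
using Key = std::tuple<std::string, std::string, Target>;

class SatisfactionCache {
 public:
  // Stands in for the real (expensive) satisfaction check.
  using Evaluator =
      std::function<bool(const std::string&, const std::string&, Target)>;

  explicit SatisfactionCache(Evaluator evaluate)
      : evaluate_(std::move(evaluate)) {
    assert(evaluate_ && "evaluator must be callable");
  }

  // Returns the cached answer for (constraint, args, target) if present;
  // otherwise evaluates once and remembers the result for that exact key.
  bool isSatisfied(const std::string& constraint, const std::string& args,
                   Target target) {
    Key key{constraint, args, target};
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;
    const bool result = evaluate_(constraint, args, target);
    cache_.emplace(std::move(key), result);
    return result;
  }

  // Number of distinct (constraint, args, target) entries stored so far.
  std::size_t size() const { return cache_.size(); }

 private:
  Evaluator evaluate_;
  std::map<Key, bool> cache_;
};
```

With an evaluator that models a type whose copy operations are device-only, a host query and a device query for the same constraint and arguments produce two separate cache entries, so neither answer can leak into the other context. The cost is that a constraint used from both sides may be evaluated twice. Whether the context is actually available at every point where satisfaction is checked, as in the `X<T>` globals case, is exactly the open question raised above; this sketch does not address it.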

More broadly, I think there'll be cases where a CUDA developer will want, from host code, to check whether a constraint would be satisfied on the device, and some mechanism for doing that seems useful. I think that *can* be done with the model I suggest above, by putting a kernel call inside a `requires` expression, but it seems awkward, so perhaps some syntax for explicitly evaluating a *concept-id* in a particular host/device context would be useful.

But it definitely seems worthwhile to figure out what rule NVCC is using here.

https://github.com/llvm/llvm-project/pull/67721