[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

Tue Jun 27 08:22:58 PDT 2023

jhuber6 added a comment.

So this replaces the usual `stacksave`-based VLA allocation with `__kmpc_alloc_shared`? That makes sense, since the OpenMP standard expects stack variables to be shareable. I wonder how this interacts with `-fopenmp-cuda-mode`.

================
Comment at: clang/lib/CodeGen/CGDecl.cpp:1603
+    // deallocation call of __kmpc_free_shared() is emitted later.
+    if (getLangOpts().OpenMP && getTarget().getTriple().isAMDGCN()) {
+      // Emit call to __kmpc_alloc_shared() instead of the alloca.
----------------
Does NVPTX handle this already? If not, is there a compelling reason to exclude NVPTX? Otherwise, we should check whether we are compiling for the OpenMP device instead of checking for the AMDGCN triple.
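
To make the difference between the two conditions concrete, here is a minimal standalone sketch. It is not Clang code: `Arch`, `Config`, `IsOpenMPDevice`, and both function names are hypothetical stand-ins invented for illustration, and the second predicate is only one possible reading of the "check if we are the OpenMP device" suggestion.

```cpp
#include <cassert>

// Hypothetical model of the relevant compiler state. None of these names
// exist in Clang; they only mirror the condition quoted above.
enum class Arch { X86_64, AMDGCN, NVPTX };

struct Config {
  bool OpenMP;         // models getLangOpts().OpenMP
  bool IsOpenMPDevice; // assumed flag: compiling for the OpenMP device
  Arch Target;         // models the architecture of the target triple
};

// Condition as written in the patch: OpenMP enabled and an AMDGCN triple.
bool usesSharedAllocPatch(const Config &C) {
  return C.OpenMP && C.Target == Arch::AMDGCN;
}

// Alternative raised in the review: key on device compilation, not the triple.
bool usesSharedAllocDeviceCheck(const Config &C) {
  return C.OpenMP && C.IsOpenMPDevice;
}
```

Under this model, an NVPTX device compilation takes the `alloca` path with the patch's condition but would take the `__kmpc_alloc_shared` path with a device check, while a host compilation is unaffected by either.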

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D153883