[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

Wed Jun 28 04:38:18 PDT 2023

arsenm added inline comments.

================
Comment at: clang/lib/CodeGen/CGDecl.cpp:1603
+    // deallocation call of __kmpc_free_shared() is emitted later.
+    if (getLangOpts().OpenMP && getTarget().getTriple().isAMDGCN()) {
+      // Emit call to __kmpc_alloc_shared() instead of the alloca.
----------------
ABataev wrote:
> doru1004 wrote:
> > jhuber6 wrote:
> > > ABataev wrote:
> > > > OpenMPIsDevice?
> > > Does NVPTX handle this already? If not, is there a compelling reason to exclude NVPTX? Otherwise we should check if we are the OpenMP device.
> > Does NVPTX support dynamic allocas?
> It does not matter here; it depends on the runtime library implementation. The compiler just has to emit the proper runtime calls; everything else is part of the runtime support.
I think I heard that recent PTX introduced new instructions for it. AMDGPU codegen just happens to be broken because we don't properly restore the stack afterwards. When I added the support we had no way of testing it (and still don't really; __builtin_alloca doesn't handle a non-0 stack address space correctly).
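
For reference, a minimal sketch of the kind of source this hunk affects: a VLA declared inside a target region, whose size is only known at run time and so would otherwise lower to a dynamic alloca on the device. The function name `sum_with_vla` and the array name `buf` are hypothetical and not taken from the patch or its tests. Built without OpenMP offloading, the pragma is ignored and the region simply runs on the host.

```c
#include <assert.h>

/* Hypothetical example (not from the patch): sums 0..n-1 through a VLA
 * that is declared inside an OpenMP target region. */
int sum_with_vla(int n) {
  assert(n > 0); /* a zero- or negative-length VLA is undefined behavior */
  int total = 0;
#pragma omp target map(tofrom : total)
  {
    int buf[n]; /* VLA: its size is not a compile-time constant */
    for (int i = 0; i < n; ++i)
      buf[i] = i;
    for (int i = 0; i < n; ++i)
      total += buf[i];
  }
  return total;
}
```

Something of this shape should be enough to exercise the allocation path under discussion, whichever runtime call or alloca the compiler ends up emitting for `buf`.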

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883