[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

Wed Jun 28 06:55:03 PDT 2023

doru1004 added inline comments.

================
Comment at: clang/lib/CodeGen/CGDecl.cpp:1603
+    // deallocation call of __kmpc_free_shared() is emitted later.
+    if (getLangOpts().OpenMP && getTarget().getTriple().isAMDGCN()) {
+      // Emit call to __kmpc_alloc_shared() instead of the alloca.
----------------
arsenm wrote:
> ABataev wrote:
> > doru1004 wrote:
> > > jhuber6 wrote:
> > > > ABataev wrote:
> > > > > OpenMPIsDevice?
> > > > Does NVPTX handle this already? If not, is there a compelling reason to exclude NVPTX? Otherwise we should check if we are the OpenMP device.
> > > Does NVPTX support dynamic allocas?
> > It does not matter here; it depends on the runtime library implementation. The compiler just has to emit the proper runtime calls; everything else is part of the runtime support.
> I think I heard recent PTX introduced new instructions for it. AMDGPU codegen just happens to be broken because we don't properly restore the stack afterwards. When I added the support we had no way of testing it (and still don't really: __builtin_alloca doesn't handle a non-0 stack address space correctly).
If NVPTX supports that, then there is no reason to have NVPTX avoid emitting allocas (i.e., the condition stays as it is right now), but I am willing to reach a consensus, so please let me know what you all would prefer.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D153883