[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions
Matt Arsenault via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Jun 28 04:38:18 PDT 2023
arsenm added inline comments.
================
Comment at: clang/lib/CodeGen/CGDecl.cpp:1603
+ // deallocation call of __kmpc_free_shared() is emitted later.
+ if (getLangOpts().OpenMP && getTarget().getTriple().isAMDGCN()) {
+ // Emit call to __kmpc_alloc_shared() instead of the alloca.
----------------
ABataev wrote:
> doru1004 wrote:
> > jhuber6 wrote:
> > > ABataev wrote:
> > > > OpenMPIsDevice?
> > > Does NVPTX handle this already? If not, is there a compelling reason to exclude NVPTX? Otherwise we should check if we are the OpenMP device.
> > Does NVPTX support dynamic allocas?
> It does not matter here; it depends on the runtime library implementation. The compiler should just emit the proper runtime calls; everything else is part of the runtime support.
I think I heard that recent PTX introduced new instructions for it. AMDGPU codegen just happens to be broken because we don't properly restore the stack afterwards. When I added the support we had no way of testing it (and still don't really; __builtin_alloca doesn't handle a non-0 stack address space correctly).
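For context, a minimal sketch of the kind of source this thread is about: a VLA declared inside an offloaded region. The function name `sum_with_vla` and the buffer layout are hypothetical and not taken from the patch. VLAs in C++ are a GNU extension accepted by clang and gcc. Going by the patch title and the hunk above, the array would be allocated through __kmpc_alloc_shared() rather than a dynamic alloca when compiling for AMDGCN; compiled without OpenMP offloading, the pragma is ignored and the region simply runs on the host.

```cpp
// Hypothetical example: sums 0..n-1 through a variable-length array
// declared inside an OpenMP target region.
int sum_with_vla(int n) {
  int result = 0;
#pragma omp target map(tofrom : result)
  {
    int buf[n]; // VLA: its size is only known at run time
    for (int i = 0; i < n; ++i)
      buf[i] = i;
    for (int i = 0; i < n; ++i)
      result += buf[i];
  }
  return result;
}
```

Calling sum_with_vla(4) on the host fills the buffer with 0, 1, 2, 3 and returns their sum.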
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D153883/new/
https://reviews.llvm.org/D153883