[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

Gheorghe-Teodor Bercea via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 27 09:32:40 PDT 2023


doru1004 added inline comments.


================
Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1085
   }
-  for (const auto *VD : I->getSecond().EscapedVariableLengthDecls) {
-    // Use actual memory size of the VLA object including the padding
----------------
doru1004 wrote:
> ABataev wrote:
> > jhuber6 wrote:
> > > doru1004 wrote:
> > > > ABataev wrote:
> > > > > Why this code is removed?
> > > > I could not understand why this code is here in the first place since it doesn't seem that it could ever work correctly (and it doesn't seem to be covered by any existing tests). Maybe I'm wrong but that was my understanding of it. So what seems to happen is that this code attempts to emit a kmpc_alloc_shared before the actual size calculation is emitted. So if the VLA size is something that the user defines such as `int N = 10;` then that code will not have been emitted at this point. When the expression computing the size of the VLA uses `N`, the code that is deleted here would just fail to find the VLA size in the attempt to emit the kmpc_alloc_shared. The emission of the VLA as kmpc_alloc_shared needs to happen after the expression of the size is emitted.
> > > I'm pretty sure I was the one that wrote this code, and at the time I don't recall it really working. I remember there was something else that expected this to be here, but for what utility I do not recall. VLAs were never tested or used.
> > They are tested, check test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp for example, where it captures VLA implicitly. I assume this should not be AMDGCN specific.
> Oh I see so this code path would cover the case when the VLA is defined outside the target region? I'm surprised I haven't seen any lit test fails for AMD GPUs, maybe this kind of test only exists for NVPTX. I'll add a test for AMD GPUs in that case.
Edit: the VLA is defined outside the target region => the VLA //size// is defined outside the target region


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883



More information about the cfe-commits mailing list