[all-commits] [llvm/llvm-project] dcf4ca: [OpenMP][MLIR][OMPIRBuilder] Add a small optional ...

Fri Feb 23 13:59:52 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: dcf4ca558ce5c323ce9b0af93acc0c832024eb3c
      https://github.com/llvm/llvm-project/commit/dcf4ca558ce5c323ce9b0af93acc0c832024eb3c
  Author: agozillon <Andrew.Gozillon at amd.com>
  Date:   2024-02-23 (Fri, 23 Feb 2024)

  Changed paths:
    M llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
    M llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
    M llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
    A mlir/test/Target/LLVMIR/omptarget-constant-alloca-raise.mlir

  Log Message:
  -----------
  [OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (#78818)

This patch seeks to add a mechanism to raise constant (not ConstantExpr
or runtime/dynamic) sized allocations into the entry block for select
functions that have been inserted into a list for processing. This
processing occurs during the finalize call, after OutlinedInfo regions
have completed. This currently has only been utilised for
createOutlinedFunction, which is triggered for TargetOp generation in
the OpenMP MLIR dialect lowering to LLVM-IR.

This currently is required for Target kernels generated by
createOutlinedFunction to avoid subsequent optimization passes doing
some unintentional malformed optimizations for AMD kernels (unsure if it
occurs for other vendors). If the allocas are generated inside of the
kernel and are not in the entry block and are subsequently passed to a
function this can lead to required instructions being erased or
manipulated in a way that causes the kernel to run into a HSA access
error.

This fix is related to a series of problems found in:
https://github.com/llvm/llvm-project/issues/74603

This problem primarily presents itself for Flang's HLFIR AssignOp
currently, when utilised with a scalar temporary constant on the RHS and
a descriptor type on the LHS. It will generate a call to a runtime
function, wrap the RHS temporary in a newly allocated descriptor (an
llvm struct), and pass both the LHS and RHS descriptor into the runtime
function call. This will currently be
embedded into the middle of the target region in the user entry block,
which means the allocas are also embedded in the middle, which seems to
pose
issues when later passes are executed. This issue may present itself in
other HLFIR operations or unrelated operations that generate allocas as
a by product, but for the moment, this one test case is the only
scenario I've found this problem.

Perhaps this is not the appropriate fix, I am very open to other
suggestions, I've tried a few others (at varying levels of the
flang/mlir compiler flow), but this one is the smallest and least
intrusive change set. The other two, that come to mind (but I've not
fully looked into, the former I tried a little with blocks but it had a
few issues I'd need to think through):

- Having a proper alloca only block (or region) generated for TargetOps
that we could merge into the entry block that's generated by
convertTarget's createOutlinedFunction.
- Or diverging a little from Clang's current target generation and using
the CodeExtractor to generate the user code as an outlined function
region invoked from the kernel we make, with our kernel arguments passed
into it. Similar to the current parallel generation. I am not sure how
well this would intermingle with the existing parallel generation though
that's layered in.

Both of these methods seem like quite a divergence from the current
status quo, which I am not entirely sure is merited for the small test
this change aims to fix.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications