[all-commits] [llvm/llvm-project] e1da62: [MLIR][GPU] Define gpu.printf op and its lowerings

Thu Dec 9 07:54:43 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e1da62910e140cf45eafec64193c813e79796f05
      https://github.com/llvm/llvm-project/commit/e1da62910e140cf45eafec64193c813e79796f05
  Author: Krzysztof Drewniak <Krzysztof.Drewniak at amd.com>
  Date:   2021-12-09 (Thu, 09 Dec 2021)

  Changed paths:
    M mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
    A mlir/include/mlir/Conversion/GPUToROCDL/Runtimes.h
    M mlir/include/mlir/Conversion/Passes.td
    M mlir/include/mlir/Dialect/GPU/GPUOps.td
    M mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
    M mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
    M mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
    M mlir/lib/Conversion/PassDetail.h
    M mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp
    M mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp
    A mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir
    A mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-opencl.mlir
    M mlir/test/Dialect/GPU/ops.mlir
    A mlir/test/Integration/GPU/ROCM/printf.mlir

  Log Message:
  -----------
  [MLIR][GPU] Define gpu.printf op and its lowerings

- Define a gpu.printf op, which can be lowered to whatever printf() support the target GPU platform provides (CUDA, HIP, and OpenCL all have it). This op only supports constant format strings and scalar arguments.
- Define two lowerings of gpu.printf for AMD GPUs: one to a call to printf(), which is what is required when using OpenCL, and one to the hostcall interface in the AMD Open Compute device library, which is the interface available when kernels run under HIP.
- Add a "runtime" enum that specifies which of the possible runtimes a ROCDL kernel will be executed under, or that the runtime is unknown. This enum controls how gpu.printf is lowered.
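
As a rough illustration of how a runtime enum can steer the choice of lowering, here is a minimal standalone C++ sketch. It is not the code from this commit: the type and function names (Runtime, PrintfLowering, selectPrintfLowering) and the enum values are assumptions made for the example, and treating an unknown runtime as "no printf lowering selected" is likewise an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the "runtime" enum described in the log message.
// The enumerator names and values are assumptions, not copied from Runtimes.h.
enum class Runtime : std::uint32_t { Unknown, OpenCL, HIP };

// Hypothetical labels for the two lowerings named in the log message, plus a
// "None" case for when no lowering can be chosen.
enum class PrintfLowering { None, PrintfCall, Hostcall };

// Pick a gpu.printf lowering strategy from the runtime the kernel targets.
constexpr PrintfLowering selectPrintfLowering(Runtime runtime) {
  switch (runtime) {
  case Runtime::OpenCL:
    // OpenCL on AMD GPUs: lower to a plain call to printf().
    return PrintfLowering::PrintfCall;
  case Runtime::HIP:
    // HIP: lower to the device library's hostcall interface.
    return PrintfLowering::Hostcall;
  case Runtime::Unknown:
    // Assumption for this sketch: no lowering is selected.
    return PrintfLowering::None;
  }
  return PrintfLowering::None;
}
```

In this sketch a pass would evaluate selectPrintfLowering() once from its runtime option and register only the matching conversion pattern, so a kernel built for an unknown runtime would leave gpu.printf unlowered rather than guess at an interface.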

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110448