[all-commits] [llvm/llvm-project] 1ca772: [MLIR][GPU] Add NvGpu mma.sync path to the VectorT...

Fri May 20 08:44:29 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 1ca772ed951e6412ef006459b56ae9a21691a97c
      https://github.com/llvm/llvm-project/commit/1ca772ed951e6412ef006459b56ae9a21691a97c
  Author: Christopher Bate <cbate at nvidia.com>
  Date:   2022-05-20 (Fri, 20 May 2022)

  Changed paths:
    M mlir/include/mlir/Conversion/Passes.td
    M mlir/include/mlir/Conversion/VectorToGPU/VectorToGPU.h
    M mlir/lib/Conversion/PassDetail.h
    M mlir/lib/Conversion/VectorToGPU/CMakeLists.txt
    A mlir/lib/Conversion/VectorToGPU/NvGpuSupport.cpp
    A mlir/lib/Conversion/VectorToGPU/NvGpuSupport.h
    M mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
    A mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir

  Log Message:
  -----------
  [MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass

This change adds the option to lower to NvGpu dialect ops during the
VectorToGPU conversion pass. Because this transformation reuses
existing VectorToGPU logic, a separate VectorToNvGpu conversion pass is
not created. The option `use-nvgpu` is added to the VectorToGPU pass.
When this is true, the pass will attempt to convert slices rooted at
`vector.contract` operations into `nvgpu.mma.sync` ops, and
`vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or
one or more `vector.load` operations.  The specific data loaded will
depend on the thread id within a subgroup (warp). These index
calculations depend on data type and shape of the MMA op
according to the downstream PTX specification. The code for supporting
these details is separated into `NvGpuSupport.cpp|h`.
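
To illustrate the kind of thread-dependent indexing described above, here
is a minimal sketch of the accumulator fragment layout that the PTX ISA
documents for the 16x8 accumulator tile of the m16n8 `mma.sync` shapes,
where each of the 32 lanes holds four elements. The helper name
`accumulatorCoord` is hypothetical and is not taken from
`NvGpuSupport.cpp`; the implementation in this commit may express the
same mapping differently.

```cpp
#include <utility>

// Hypothetical helper (not part of this commit): returns the (row, col)
// position inside the 16x8 accumulator tile of element `i` (0..3) held by
// lane `laneId` (0..31), following the PTX m16n8 accumulator layout.
std::pair<int, int> accumulatorCoord(int laneId, int i) {
  int groupId = laneId >> 2;        // 8 groups of 4 lanes each
  int threadInGroup = laneId % 4;   // position of the lane in its group
  // Elements 0 and 1 sit in the upper 8 rows, elements 2 and 3 in the
  // lower 8 rows.
  int row = (i < 2) ? groupId : groupId + 8;
  // Each lane owns two adjacent columns.
  int col = threadInGroup * 2 + (i & 1);
  return {row, col};
}
```

For example, under this layout lane 5 holds element 3 at row 9, column 3.
The operand (A and B) fragments follow analogous per-lane formulas that
additionally vary with the element type and the K dimension.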

Differential Revision: https://reviews.llvm.org/D122940