[all-commits] [llvm/llvm-project] 509341: [mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilati...

Thu Sep 14 15:00:41 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 5093413a5007b017a530edbeed42d32bfd18b126
      https://github.com/llvm/llvm-project/commit/5093413a5007b017a530edbeed42d32bfd18b126
  Author: Fabian Mora <fmora.dev at gmail.com>
  Date:   2023-09-14 (Thu, 14 Sep 2023)

  Changed paths:
    M mlir/include/mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td
    M mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td
    M mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h
    M mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
    M mlir/include/mlir/Dialect/SparseTensor/Pipelines/Passes.h
    M mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
    M mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
    M mlir/lib/Dialect/SparseTensor/Pipelines/SparseTensorPipelines.cpp
    M mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
    M mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp
    M mlir/lib/Target/LLVM/NVVM/Target.cpp
    M mlir/lib/Target/LLVM/ROCDL/Target.cpp
    M mlir/lib/Target/LLVMIR/Dialect/GPU/SelectObjectAttr.cpp
    M mlir/test/CMakeLists.txt
    M mlir/test/Dialect/GPU/module-to-binary-nvvm.mlir
    M mlir/test/Dialect/GPU/module-to-binary-rocdl.mlir
    M mlir/test/Dialect/GPU/ops.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/lit.local.cfg
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sm80-lt/sparse-matmul-2-4-lib-from-linalg.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sm80-lt/sparse-matmul-2-4-prune.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-gemm-lib.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matmul-lib.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matvec-const.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matvec-lib.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matvec.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-mma-2-4-f16.mlir
    M mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
    M mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
    M mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir
    M mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f16.mlir
    M mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f32-bare-ptr.mlir
    M mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f32.mlir
    M mlir/test/Integration/GPU/CUDA/all-reduce-and.mlir
    M mlir/test/Integration/GPU/CUDA/all-reduce-max.mlir
    M mlir/test/Integration/GPU/CUDA/all-reduce-min.mlir
    M mlir/test/Integration/GPU/CUDA/all-reduce-op.mlir
    M mlir/test/Integration/GPU/CUDA/all-reduce-or.mlir
    M mlir/test/Integration/GPU/CUDA/all-reduce-region.mlir
    M mlir/test/Integration/GPU/CUDA/all-reduce-xor.mlir
    M mlir/test/Integration/GPU/CUDA/async.mlir
    M mlir/test/Integration/GPU/CUDA/gpu-to-cubin.mlir
    M mlir/test/Integration/GPU/CUDA/lit.local.cfg
    M mlir/test/Integration/GPU/CUDA/multiple-all-reduce.mlir
    M mlir/test/Integration/GPU/CUDA/printf.mlir
    M mlir/test/Integration/GPU/CUDA/shuffle.mlir
    M mlir/test/Integration/GPU/CUDA/sm90/tma_load_128x64_swizzle128b.mlir
    M mlir/test/Integration/GPU/CUDA/sm90/tma_load_64x8_8x128_noswizzle.mlir
    M mlir/test/Integration/GPU/CUDA/sm90/transform-dialect/tma_load_64x8_8x128_noswizzle-transform.mlir
    M mlir/test/Integration/GPU/CUDA/two-modules.mlir
    M mlir/test/lib/Dialect/GPU/TestLowerToNVVM.cpp
    M mlir/test/lit.site.cfg.py.in
    M mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp
    M mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp

  Log Message:
  -----------
  [mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220)

This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.

2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.

3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.

4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.