[all-commits] [llvm/llvm-project] 51b925: [mlir][nvgpu] shared memory access optimization pass

Fri Jun 17 08:35:52 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 51b925df941a66349deff2467203acc200de5e78
      https://github.com/llvm/llvm-project/commit/51b925df941a66349deff2467203acc200de5e78
  Author: Christopher Bate <cbate at nvidia.com>
  Date:   2022-06-17 (Fri, 17 Jun 2022)

  Changed paths:
    M mlir/include/mlir/Dialect/NVGPU/CMakeLists.txt
    A mlir/include/mlir/Dialect/NVGPU/IR/CMakeLists.txt
    A mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
    A mlir/include/mlir/Dialect/NVGPU/IR/NVGPUDialect.h
    R mlir/include/mlir/Dialect/NVGPU/NVGPU.td
    R mlir/include/mlir/Dialect/NVGPU/NVGPUDialect.h
    A mlir/include/mlir/Dialect/NVGPU/Passes.h
    A mlir/include/mlir/Dialect/NVGPU/Passes.td
    A mlir/include/mlir/Dialect/NVGPU/Transforms/Transforms.h
    M mlir/include/mlir/InitAllDialects.h
    M mlir/include/mlir/InitAllPasses.h
    M mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
    M mlir/lib/Conversion/VectorToGPU/NvGpuSupport.cpp
    M mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
    M mlir/lib/Dialect/NVGPU/CMakeLists.txt
    M mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
    A mlir/lib/Dialect/NVGPU/Transforms/CMakeLists.txt
    A mlir/lib/Dialect/NVGPU/Transforms/OptimizeSharedMemory.cpp
    A mlir/lib/Dialect/NVGPU/Transforms/PassDetail.h
    A mlir/test/Dialect/NVGPU/optimize-shared-memory.mlir
    M utils/bazel/llvm-project-overlay/mlir/BUILD.bazel

  Log Message:
  -----------
  [mlir][nvgpu] shared memory access optimization pass

This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads from and writes to a memref representing GPU
shared memory in order to avoid bank conflicts. Given a value
representing a shared memory memref, it traverses all reads/writes
within the parent op and, subject to suitable conditions, rewrites all
last-dimension index values such that element locations in the final
(col) dimension are given by
`newColIdx = col % vecSize + perm[row](col/vecSize,row)`
where `perm` is a permutation function indexed by `row` and `vecSize`
is the vector access size in elements (currently 128-bit vectorized
accesses are assumed, but this can be made a parameter). This specific
transformation can help optimize the distributed, vectorized accesses
that typically arise when moving matrix multiplication operands to and
from shared memory.
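
As an illustration of the index rewrite described above, here is a
minimal Python sketch. It is not the code added by this commit. It
assumes that `perm[row]` XORs the vector index with the row (a common
swizzling choice, not stated in this message), that the permuted vector
index is scaled by `vecSize` to get back to element units, and that the
number of vectors per row is a power of two. The names `swizzle_col`
and `row_width` are hypothetical.

```python
def swizzle_col(row: int, col: int, vec_size: int, row_width: int) -> int:
    """Return the permuted last-dimension index for element (row, col).

    Assumption: perm[row] is an XOR of the vector index with the row,
    which is one possible permutation and not necessarily the one used
    by the pass.
    """
    num_vecs = row_width // vec_size  # vector-sized slots per row
    # XOR only stays in range when the slot count is a power of two.
    assert num_vecs > 0 and num_vecs & (num_vecs - 1) == 0
    vec_idx = col // vec_size               # which vector slot col falls in
    permuted = vec_idx ^ (row % num_vecs)   # hypothetical perm[row]
    # Keep the offset inside the vector, move the whole vector slot.
    return col % vec_size + permuted * vec_size


if __name__ == "__main__":
    # 128-bit vectors of 16-bit elements give vec_size = 8.
    # Column 0 of eight consecutive rows lands in eight distinct slots.
    print([swizzle_col(r, 0, 8, 64) for r in range(8)])
```

Because each row applies a bijection to its vector slots, no two source
columns in the same row map to the same location, and elements within a
vector stay contiguous so vectorized accesses remain valid.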

Differential Revision: https://reviews.llvm.org/D127457