[all-commits] [llvm/llvm-project] 371366: [mlir][nvgpu] add simple pipelining for shared mem...

Mon Jul 17 07:29:28 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 371366ce27303e0b949aeb643b973a1a110da469
      https://github.com/llvm/llvm-project/commit/371366ce27303e0b949aeb643b973a1a110da469
  Author: Alex Zinenko <zinenko at google.com>
  Date:   2023-07-17 (Mon, 17 Jul 2023)

  Changed paths:
    M mlir/include/mlir/Dialect/NVGPU/TransformOps/NVGPUTransformOps.h
    M mlir/include/mlir/Dialect/NVGPU/TransformOps/NVGPUTransformOps.td
    M mlir/include/mlir/Dialect/SCF/Transforms/Patterns.h
    M mlir/include/mlir/Dialect/SCF/Transforms/Transforms.h
    M mlir/lib/Dialect/NVGPU/TransformOps/CMakeLists.txt
    M mlir/lib/Dialect/NVGPU/TransformOps/NVGPUTransformOps.cpp
    M mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
    A mlir/test/Dialect/NVGPU/transform-pipeline-shared.mlir
    M utils/bazel/llvm-project-overlay/mlir/BUILD.bazel

  Log Message:
  -----------
  [mlir][nvgpu] add simple pipelining for shared memory copies

Add a simple transform operation to the NVGPU extension that performs
software pipelining of copies to shared memory. The functionality is
extremely minimalistic in this version and only supports copies from
global to shared memory inside an `scf.for` loop with either
`vector.transfer` or `nvgpu.device_async_copy` operations when
pipelining preconditions are already satisfied in the IR. This is the
minimally useful version that uses the more general loop pipeliner in an
NVGPU-specific way. Further extensions and orthogonalizations will be
necessary.
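
As a rough illustration of what software pipelining of copies means, here
is a minimal host-side C++ sketch. It is not the transform added by this
commit and does not use any MLIR API: `pipelinedSum`, `tileSize`, `depth`
and the `staging` ring are hypothetical names, with a plain vector standing
in for global memory and the ring standing in for shared memory. On a CPU
the copy and the compute run one after the other, so the sketch only shows
the prologue/steady-state schedule, not actual overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums `global` tile by tile. The copy of tile i + depth - 1 is issued in
// the same iteration that consumes tile i, mimicking a pipelined loop.
long pipelinedSum(const std::vector<int> &global, std::size_t tileSize,
                  std::size_t depth) {
  assert(tileSize > 0 && depth > 0 && global.size() % tileSize == 0);
  const std::size_t numTiles = global.size() / tileSize;

  // Hypothetical stand-in for shared memory: a ring of `depth` buffers.
  std::vector<std::vector<int>> staging(depth, std::vector<int>(tileSize));

  auto copyTile = [&](std::size_t tile) {
    for (std::size_t j = 0; j < tileSize; ++j)
      staging[tile % depth][j] = global[tile * tileSize + j];
  };

  // Prologue: prefetch the first depth - 1 tiles before the loop starts.
  for (std::size_t t = 0; t + 1 < depth && t < numTiles; ++t)
    copyTile(t);

  long sum = 0;
  for (std::size_t i = 0; i < numTiles; ++i) {
    // Steady state: issue the copy that is depth - 1 tiles ahead, if any.
    // It lands in the slot of tile i - 1, which has already been consumed.
    const std::size_t ahead = i + depth - 1;
    if (ahead < numTiles)
      copyTile(ahead);

    // Consume tile i, which was copied depth - 1 iterations earlier.
    for (int v : staging[i % depth])
      sum += v;
  }
  return sum;
}
```

With `depth` equal to 1 the sketch degenerates to the unpipelined loop, which
makes it easy to check that the schedule does not change the result.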

This required a change to the loop pipeliner itself to properly
propagate errors should the predicate generator fail.
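
The error-propagation idea can be sketched as follows, under the assumption
that a predicate generator is a callback that may decline to handle an
operation. `Op`, `PredicateFn`, `guardIfPredicable` and `predicateAll` are
hypothetical names for this illustration and do not reflect the actual
signatures in LoopPipelining.cpp.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR operation.
struct Op {
  std::string name;
  bool predicable;
};

// A predicate generator returns the guarded op, or nullopt on failure.
using PredicateFn = std::function<std::optional<Op>(const Op &)>;

// Example generator: refuses operations it does not know how to guard.
std::optional<Op> guardIfPredicable(const Op &op) {
  if (!op.predicable)
    return std::nullopt;
  return Op{"guarded." + op.name, true};
}

// Applies `fn` to every op. If any call fails, the failure is returned to
// the caller instead of being ignored, and no partial result escapes.
std::optional<std::vector<Op>> predicateAll(const std::vector<Op> &ops,
                                            const PredicateFn &fn) {
  std::vector<Op> result;
  for (const Op &op : ops) {
    std::optional<Op> guarded = fn(op);
    if (!guarded)
      return std::nullopt; // Propagate the generator's failure.
    result.push_back(*guarded);
  }
  assert(result.size() == ops.size());
  return result;
}
```

The caller can then decide to leave the loop unpipelined and report a
diagnostic, rather than continuing with an operation that was never guarded.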

This is loosely inspired by the version in IREE, but makes fewer unsafe
assumptions and has a more principled way of communicating decisions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155223