[all-commits] [llvm/llvm-project] 7aa22f: [mlir][gpu] Add 'cluster_size' attribute to gpu.su...

Andrea Faulds via All-commits all-commits at lists.llvm.org
Tue Aug 20 10:37:24 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7aa22f013e24d20291aad745368ff907baa9dfa4
      https://github.com/llvm/llvm-project/commit/7aa22f013e24d20291aad745368ff907baa9dfa4
  Author: Andrea Faulds <andrea.faulds at amd.com>
  Date:   2024-08-20 (Tue, 20 Aug 2024)

  Changed paths:
    M mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
    M mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
    M mlir/lib/Conversion/GPUToSPIRV/GPUToSPIRV.cpp
    M mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
    M mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp
    M mlir/test/Dialect/GPU/canonicalize.mlir
    M mlir/test/Dialect/GPU/invalid.mlir
    M mlir/test/Dialect/GPU/subgroup-reduce-lowering.mlir

  Log Message:
  -----------
  [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)

This enables performing several reductions in parallel, each over a
cluster of lanes smaller than the full subgroup.
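As a minimal sketch, the new attribute might look as follows in the
textual IR. The exact assembly format is defined by the updated
GPUOps.td, and the tests in subgroup-reduce-lowering.mlir are the
authoritative reference; the `cluster_size(N)` spelling below is an
assumption for illustration only.

  // Plain subgroup-wide reduction: one result per subgroup.
  %sum = gpu.subgroup_reduce add %val : (f32) -> f32

  // Assumed syntax for the new attribute: the subgroup is split into
  // clusters of 4 contiguous lanes, each reduced independently, so
  // several reductions run in parallel within one subgroup.
  %partial = gpu.subgroup_reduce add %val cluster_size(4) : (f32) -> f32

The lowering changes in LowerGpuOpsToNVVMOps.cpp, GPUToSPIRV.cpp and
SubgroupReduceLowering.cpp listed above teach the existing conversion
paths about the new attribute.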

One potential application is flash attention, with subgroup-wide matrix
multiplication and reduction combined in one kernel. The multiplication
operation requires a 2D matrix to be distributed over the lanes of the
subgroup, which constrains the shape that the subsequent reduction can
take if the data is to be kept in registers.


