[all-commits] [llvm/llvm-project] 18e161: [MLIR][NVVM] Introduction of the `wgmma.mma_async` Op

Wed Aug 9 14:08:16 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 18e161f9e15b036faf48bfd8813d9330e06e2ee3
      https://github.com/llvm/llvm-project/commit/18e161f9e15b036faf48bfd8813d9330e06e2ee3
  Author: Guray Ozen <guray.ozen at gmail.com>
  Date:   2023-08-09 (Wed, 09 Aug 2023)

  Changed paths:
    M mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
    M mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp
    M mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
    A mlir/test/Conversion/NVVMToLLVM/invalid.mlir
    M mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir

  Log Message:
  -----------
  [MLIR][NVVM] Introduction of the `wgmma.mma_async` Op

This work introduces the `wgmma.mma_async` Op, along with PTX generation using `BasicPtxBuilderOpInterface`. The Op executes a matrix multiply-and-accumulate operation across a warpgroup (128 threads). Note that this operation requires a device with the sm_90a capability.

The matrix multiply-and-accumulate operation can take one of the following forms. In both cases, matrix D is referred to as the accumulator:
	D = A * B + D : Result is added to the accumulator matrix D.
	D = A * B     : The input from the accumulator matrix D is not utilized.
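
The two forms above differ only in whether the existing contents of D feed into the result. As a rough illustration of the arithmetic alone, here is a minimal scalar sketch in Python. The function name `mma` and the flag `use_accumulator` are hypothetical and are not part of the Op or of PTX. The sketch does not model warpgroup execution, asynchrony, or operand layouts.

```python
def mma(a, b, d, use_accumulator=True):
    """Reference multiply-and-accumulate on nested lists.

    a is M x K, b is K x N, d is M x N. Returns a new M x N matrix.
    `use_accumulator` is a hypothetical flag that selects between
    D = A * B + D (True) and D = A * B (False).
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Start from the old accumulator value, or from zero if it is ignored.
            acc = d[i][j] if use_accumulator else 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
d = [[1, 1], [1, 1]]

accumulated = mma(a, b, d, use_accumulator=True)    # D = A * B + D
overwritten = mma(a, b, d, use_accumulator=False)   # D = A * B
```

With these inputs, `accumulated` is the product plus one in every element, while `overwritten` is the plain product and does not depend on the initial contents of `d`.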

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D157370