[all-commits] [llvm/llvm-project] 849297: [AMDGPU][wmma] - Add tied wmma intrinsic (#69903)

Mon Oct 30 08:24:03 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 849297c97d9e87584cae7c83fcca9686f784d54a
      https://github.com/llvm/llvm-project/commit/849297c97d9e87584cae7c83fcca9686f784d54a
  Author: Jessica Del <50999226+OutOfCache at users.noreply.github.com>
  Date:   2023-10-30 (Mon, 30 Oct 2023)

  Changed paths:
    M llvm/include/llvm/IR/IntrinsicsAMDGPU.td
    M llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
    M llvm/lib/Target/AMDGPU/VOP3PInstructions.td
    M llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.wmma_32.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.wmma_64.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma_32.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma_64.ll

  Log Message:
  -----------
  [AMDGPU][wmma] - Add tied wmma intrinsic (#69903)

These new intrinsics, `amdgcn_wmma_tied_f16_16x16x16_f16` and
`amdgcn_wmma_tied_f16_16x16x16_f16`,
explicitly tie the destination accumulator matrix to the input
accumulator matrix.

The `wmma_f16` and `wmma_bf16` intrinsics only write to 16-bit of the
32-bit destination VGPRs.
Which half is determined via the `op_sel` argument. The other half of
the destination registers remains unchanged.

In some cases however, we expect the destination to copy the other
halves from the input accumulator.
For instance, when packing two separate accumulator matrices into one.
In that case, the two matrices
are tied into the same registers, but separate halves. Then it is
important to copy the other matrix values
to the new destination.