[all-commits] [llvm/llvm-project] 849297: [AMDGPU][wmma] - Add tied wmma intrinsic (#69903)
Jessica Del via All-commits
all-commits at lists.llvm.org
Mon Oct 30 08:24:03 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 849297c97d9e87584cae7c83fcca9686f784d54a
https://github.com/llvm/llvm-project/commit/849297c97d9e87584cae7c83fcca9686f784d54a
Author: Jessica Del <50999226+OutOfCache at users.noreply.github.com>
Date: 2023-10-30 (Mon, 30 Oct 2023)
Changed paths:
M llvm/include/llvm/IR/IntrinsicsAMDGPU.td
M llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
M llvm/lib/Target/AMDGPU/VOP3PInstructions.td
M llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.wmma_32.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.wmma_64.ll
M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma_32.ll
M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma_64.ll
Log Message:
-----------
[AMDGPU][wmma] - Add tied wmma intrinsic (#69903)
These new intrinsics, `amdgcn_wmma_tied_f16_16x16x16_f16` and
`amdgcn_wmma_tied_f16_16x16x16_f16`,
explicitly tie the destination accumulator matrix to the input
accumulator matrix.
The `wmma_f16` and `wmma_bf16` intrinsics only write to 16-bit of the
32-bit destination VGPRs.
Which half is determined via the `op_sel` argument. The other half of
the destination registers remains unchanged.
In some cases however, we expect the destination to copy the other
halves from the input accumulator.
For instance, when packing two separate accumulator matrices into one.
In that case, the two matrices
are tied into the same registers, but separate halves. Then it is
important to copy the other matrix values
to the new destination.
More information about the All-commits
mailing list