[all-commits] [llvm/llvm-project] 450df7: AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f...

Thu Nov 21 08:49:00 PST 2024

  Branch: refs/heads/users/arsenm/gfx950/add-mfma-scale-f8f6f4-instructions
  Home:   https://github.com/llvm/llvm-project
  Commit: 450df72deb921ab3e975143c518a88e4d63f33aa
      https://github.com/llvm/llvm-project/commit/450df72deb921ab3e975143c518a88e4d63f33aa
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2024-11-21 (Thu, 21 Nov 2024)

  Changed paths:
    M clang/include/clang/Basic/BuiltinsAMDGPU.def
    M clang/lib/CodeGen/CGBuiltin.cpp
    M clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
    M clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
    M clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
    M llvm/docs/AMDGPUUsage.rst
    M llvm/include/llvm/IR/IntrinsicsAMDGPU.td
    M llvm/lib/Target/AMDGPU/AMDGPUGISel.td
    M llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
    M llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
    M llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
    M llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
    M llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
    M llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCAsmInfo.cpp
    M llvm/lib/Target/AMDGPU/SIDefines.h
    M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
    M llvm/lib/Target/AMDGPU/SIInstrFormats.td
    M llvm/lib/Target/AMDGPU/SIInstrInfo.h
    M llvm/lib/Target/AMDGPU/SIInstrInfo.td
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.td
    M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
    M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
    M llvm/lib/Target/AMDGPU/VOP3PInstructions.td
    M llvm/lib/Target/AMDGPU/VOPInstructions.td
    M llvm/test/Analysis/UniformityAnalysis/AMDGPU/intrinsics.ll
    A llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll
    A llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll
    M llvm/test/MC/AMDGPU/mai-gfx950.s
    M llvm/test/MC/Disassembler/AMDGPU/gfx950_mai.txt
    A llvm/test/MC/Disassembler/AMDGPU/gfx950_vop3px2.txt
    M llvm/test/tools/llvm-mca/AMDGPU/gfx950.s

  Log Message:
  -----------
  AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions

These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.
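
As a rough illustration of what a "byte extract pattern" on a raw scale
operand could look like, here is a minimal C++ sketch. The helper name
extractScaleByte and the assumption that a selector picks one of the four
bytes of a 32-bit scale value are illustrative only; they are not taken from
the patch or from the hardware documentation.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch): return byte `sel` (0..3)
// of a 32-bit raw scale operand. A selector like op_sel would, under this
// assumption, correspond to the choice of `sel`.
std::uint8_t extractScaleByte(std::uint32_t rawScale, unsigned sel) {
  const unsigned shift = 8u * (sel & 3u); // keep the shift within 0..24
  return static_cast<std::uint8_t>((rawScale >> shift) & 0xffu);
}
```

If instruction selection recognized shift-and-mask expressions of this shape,
the intrinsic would only need the raw scale operands, which is the
alternative raised above.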

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.

The f8f6f4 intrinsics allow using different vector types, corresponding
to the 3 different format widths. These can use 4, 6 or 8 x i32 vectors
depending on whether the format is fp4, fp6/bf6, or fp8/bf8. Verification
that the format used matches the vector type will come later.
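
The width arithmetic above can be sketched in C++ as follows. The enum and
function names are hypothetical, and the figure of 32 elements per operand is
an inference from the stated widths (8 x i32 = 256 bits at 8 bits per
element), not something the patch spells out.

```cpp
// Hypothetical names; only the 4/6/8 x i32 results come from the commit text.
enum class MatrixFormat { FP8, BF8, FP6, BF6, FP4 };

// Bits used by one matrix element in the given format.
unsigned bitsPerElement(MatrixFormat F) {
  switch (F) {
  case MatrixFormat::FP8:
  case MatrixFormat::BF8:
    return 8;
  case MatrixFormat::FP6:
  case MatrixFormat::BF6:
    return 6;
  case MatrixFormat::FP4:
    return 4;
  }
  return 0; // not reached for valid enumerators
}

// Number of i32 vector elements needed to hold one source operand,
// assuming 32 matrix elements per operand (inferred, see above).
unsigned numI32Elements(MatrixFormat F) {
  const unsigned ElementsPerOperand = 32;
  return ElementsPerOperand * bitsPerElement(F) / 32;
}
```

A later verification step could compare numI32Elements(format) against the
element count of the vector type actually passed to the intrinsic.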

This requires defining a separate pseudoinstruction for each register
width combination, so 9 pseudos per opcode (3 widths for each of the two
source operands). This makes disassembly ambiguous, since the opcode to
use now depends on the operand formats. Handle this by defining only the
_f8_f8 variant as a real instruction and having the disassembler manually
adjust the opcode later, based on the format values.
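
To make the count of 9 and the disassembler fix-up concrete, here is a small
C++ sketch. All names (PseudoVariant, allVariants, variantIndexFor) are
hypothetical and the table layout is an assumption; the real patch uses
TableGen-generated opcode tables rather than anything like this.

```cpp
#include <array>
#include <cstddef>

// One pseudo per (srcA width, srcB width) pair, widths in i32 registers.
struct PseudoVariant {
  unsigned SrcARegs;
  unsigned SrcBRegs;
};

// The three register widths: 8 (fp8/bf8), 6 (fp6/bf6), 4 (fp4).
const std::array<unsigned, 3> RegWidths = {8, 6, 4};

// Enumerate all 3 x 3 = 9 width combinations; entry 0 is the 8/8 case,
// standing in for the _f8_f8 variant that is the only real instruction.
std::array<PseudoVariant, 9> allVariants() {
  std::array<PseudoVariant, 9> Out{};
  std::size_t I = 0;
  for (unsigned A : RegWidths)
    for (unsigned B : RegWidths)
      Out[I++] = PseudoVariant{A, B};
  return Out;
}

// Sketch of the post-decode adjustment: given the register widths implied
// by the decoded format fields, find the matching variant. Returns 9 (one
// past the end) if the widths do not correspond to any variant.
std::size_t variantIndexFor(unsigned SrcARegs, unsigned SrcBRegs) {
  const std::array<PseudoVariant, 9> Variants = allVariants();
  for (std::size_t I = 0; I < Variants.size(); ++I)
    if (Variants[I].SrcARegs == SrcARegs && Variants[I].SrcBRegs == SrcBRegs)
      return I;
  return Variants.size();
}
```

In this sketch the decoder would always produce variant 0 first and then
switch to variantIndexFor(...) once the format fields are known.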

The clang builtin integer operands should probably be unsigned,
but all the other mfma intrinsics use signed operands.
