[llvm] [AMDGPU] Correctly insert s_nops for implicit read of SDWA (PR #100276)

Jeffrey Byrnes via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 31 09:50:13 PDT 2024


================
@@ -875,13 +875,34 @@ GCNHazardRecognizer::checkVALUHazardsHelper(const MachineOperand &Def,
     return DataIdx >= 0 &&
            TRI->regsOverlap(MI.getOperand(DataIdx).getReg(), Reg);
   };
+
   int WaitStatesNeededForDef =
     VALUWaitStates - getWaitStatesSince(IsHazardFn, VALUWaitStates);
   WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForDef);
 
   return WaitStatesNeeded;
 }
 
+static const MachineOperand *
+getDstSelForwardingOperand(const MachineInstr &MI, const GCNSubtarget &ST) {
+  if (!SIInstrInfo::isVALU(MI))
+    return nullptr;
+
+  const SIInstrInfo *TII = ST.getInstrInfo();
+  if (SIInstrInfo::isSDWA(MI)) {
+    if (auto *DstSel = TII->getNamedOperand(MI, AMDGPU::OpName::dst_sel))
+      if (DstSel->getImm() == AMDGPU::SDWA::DWORD)
+        return nullptr;
+  } else {
+    if (!AMDGPU::hasNamedOperand(MI.getOpcode(), AMDGPU::OpName::op_sel) ||
+        !(TII->getNamedOperand(MI, AMDGPU::OpName::src0_modifiers)->getImm() &
+          SISrcMods::DST_OP_SEL))
+      return nullptr;
+  }
----------------
jrbyrnes wrote:

I ran tests on MI300 using the instructions in the following table. In version 1, each instruction was immediately followed by a dependent VALU instruction (VALU RAW); in version 2, each instruction was followed by an s_nop and then the dependent VALU instruction. By comparing the results against the expected values, I was able to determine whether an s_nop is needed:

| Instruction       | op_sel      | need s_nop |
| ----------------- | ----------- | ---------- |
| v_cvt_sr_fp8_f32  | [1,1,1,1]   |     yes    |
| v_cvt_sr_fp8_f32  | [1,1,1,0]   |     yes    |
| v_cvt_sr_fp8_f32  | [1,1,0,1]   |     yes    |
| v_cvt_sr_fp8_f32  | [1,1,0,0]   |     no     |
| v_cvt_sr_fp8_f32  | unspecified |     no     |
| v_cvt_pk_fp8_f32  | [1,1,1]     |     yes    |
| v_cvt_pk_fp8_f32  | [1,1,0]     |     no     |
| v_cvt_pk_fp8_f32  | unspecified |     no     |
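
The two-run methodology above amounts to a simple decision rule. Here is a minimal C++ sketch, assuming each run is reduced to a single "matched the expected value" flag. `Verdict` and `classifyRuns` are hypothetical names for illustration, not LLVM APIs:

```cpp
// Hypothetical sketch of the decision rule behind the two test variants.
// Version 1: producer immediately followed by the dependent VALU (RAW).
// Version 2: producer, then s_nop, then the dependent VALU.
enum class Verdict { NopNeeded, NopNotNeeded, Inconclusive };

// If even the padded run is wrong, the hazard is not what is being measured.
// Wrong without the nop but right with it means the nop is what fixes it.
constexpr Verdict classifyRuns(bool V1MatchesExpected, bool V2MatchesExpected) {
  return !V2MatchesExpected   ? Verdict::Inconclusive
         : !V1MatchesExpected ? Verdict::NopNeeded
                              : Verdict::NopNotNeeded;
}

static_assert(classifyRuns(false, true) == Verdict::NopNeeded, "stale result fixed by s_nop");
static_assert(classifyRuns(true, true) == Verdict::NopNotNeeded, "both runs correct");
static_assert(classifyRuns(true, false) == Verdict::Inconclusive, "padded run wrong");
```

A correct version 1 run only shows that no stale value was observed for that input, so a "no" row is most convincing when the test data makes a forwarded stale value distinguishable from the correct result.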

The op_sel requirements for the dest forwarding issue on v_cvt_sr_fp8_f32 are consistent with MI300_SP_MAS section 1.3.9.2 (op_sel[3:2] != 0).
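
The v_cvt_sr_fp8_f32 rows can be checked mechanically against that condition. Here is a minimal C++ sketch, assuming the table's op_sel:[e0,e1,e2,e3] list is packed so that element i becomes bit i, and that "unspecified" means all bits are 0. `packOpSel4` and `cvtSrFp8NeedsNop` are hypothetical helpers, not LLVM code:

```cpp
// Hypothetical helpers (not LLVM code).
// Assumption: op_sel:[e0,e1,e2,e3] is packed with element i in bit i.
constexpr unsigned packOpSel4(bool E0, bool E1, bool E2, bool E3) {
  return (E0 ? 1u : 0u) | (E1 ? 2u : 0u) | (E2 ? 4u : 0u) | (E3 ? 8u : 0u);
}

// MI300_SP_MAS 1.3.9.2 condition as cited in the review: op_sel[3:2] != 0.
constexpr bool cvtSrFp8NeedsNop(unsigned OpSel) {
  return ((OpSel >> 2) & 0x3u) != 0u;
}

// Every v_cvt_sr_fp8_f32 row of the measurement table.
static_assert(cvtSrFp8NeedsNop(packOpSel4(1, 1, 1, 1)), "[1,1,1,1] -> yes");
static_assert(cvtSrFp8NeedsNop(packOpSel4(1, 1, 1, 0)), "[1,1,1,0] -> yes");
static_assert(cvtSrFp8NeedsNop(packOpSel4(1, 1, 0, 1)), "[1,1,0,1] -> yes");
static_assert(!cvtSrFp8NeedsNop(packOpSel4(1, 1, 0, 0)), "[1,1,0,0] -> no");
static_assert(!cvtSrFp8NeedsNop(0u), "unspecified -> no");
```

The table never sets op_sel[3:2] while leaving op_sel[1:0] clear, so that combination is not covered by these measurements.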

The op_sel requirements for the dest forwarding issue on v_cvt_pk_fp8_f32 are inconsistent with MI300_SP_MAS section 1.3.9.2. The document states "VOP3 opcodes with op_sel[3] = 1 [don't have 4 cycle feedback]", but the dest bits for this instruction are controlled by op_sel[2], and op_sel[2] = 1 is the condition that triggers the dest forwarding issue. Our TII calls already handle this case as expected.
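
The mismatch can be made concrete by evaluating both readings on the v_cvt_pk_fp8_f32 rows. Here is a minimal C++ sketch, assuming the table's three-element op_sel:[e0,e1,e2] maps element i to bit i, so bit 3 is 0 in every tested row. The helper names are hypothetical:

```cpp
// Hypothetical helpers (not LLVM code).
// Assumption: op_sel:[e0,e1,e2] is packed with element i in bit i; bit 3 is 0.
constexpr unsigned packOpSel3(bool E0, bool E1, bool E2) {
  return (E0 ? 1u : 0u) | (E1 ? 2u : 0u) | (E2 ? 4u : 0u);
}

// Literal reading of MI300_SP_MAS 1.3.9.2: hazard when op_sel[3] == 1.
constexpr bool masReadingNeedsNop(unsigned OpSel) {
  return ((OpSel >> 3) & 1u) != 0u;
}

// Observed behavior: op_sel[2] controls the dest bits and triggers the hazard.
constexpr bool observedNeedsNop(unsigned OpSel) {
  return ((OpSel >> 2) & 1u) != 0u;
}

// Row [1,1,1] needed an s_nop, but the literal reading predicts "no".
static_assert(observedNeedsNop(packOpSel3(1, 1, 1)), "[1,1,1] -> yes");
static_assert(!masReadingNeedsNop(packOpSel3(1, 1, 1)), "literal reading misses it");
// Rows [1,1,0] and unspecified needed no s_nop.
static_assert(!observedNeedsNop(packOpSel3(1, 1, 0)), "[1,1,0] -> no");
static_assert(!observedNeedsNop(0u), "unspecified -> no");
```

The non-SDWA path in the quoted hunk tests SISrcMods::DST_OP_SEL in src0_modifiers rather than a fixed op_sel index. This appears consistent with the remark that the TII calls already behave as expected.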

It seems the dest forwarding issue occurs only when the first instruction's write is not aligned to the register. If the write is partial but aligned to the register, no swizzle/pack is needed and there is no dest forwarding issue.
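
This alignment hypothesis can be stated as a predicate over where the producer's write lands in the 32-bit destination register. Here is a minimal C++ sketch that relies on assumptions not stated in the review. It assumes v_cvt_sr_fp8_f32 writes one byte whose index is op_sel[3:2], and that v_cvt_pk_fp8_f32 writes one 16-bit half selected by op_sel[2]. It also assumes op_sel element i is packed into bit i. `DstWrite` and the helpers are hypothetical:

```cpp
// Hypothetical model of the alignment hypothesis (not LLVM code).
// Assumptions: v_cvt_sr_fp8_f32 writes the byte indexed by op_sel[3:2];
// v_cvt_pk_fp8_f32 writes the 16-bit half selected by op_sel[2];
// op_sel element i is packed into bit i of OpSel.
struct DstWrite {
  unsigned OffsetBits; // first bit written in the 32-bit VGPR
  unsigned WidthBits;  // number of bits written
};

constexpr DstWrite srFp8Write(unsigned OpSel) {
  return DstWrite{((OpSel >> 2) & 0x3u) * 8u, 8u};
}

constexpr DstWrite pkFp8Write(unsigned OpSel) {
  return DstWrite{((OpSel >> 2) & 0x1u) * 16u, 16u};
}

// Hazard only for a partial write that does not start at bit 0, i.e. one
// that would have to be shifted/packed into place.
constexpr bool hazardFromWrite(DstWrite W) {
  return W.WidthBits < 32u && W.OffsetBits != 0u;
}

// Table rows, using bit i = op_sel element i.
static_assert(hazardFromWrite(srFp8Write(0xFu)), "sr [1,1,1,1] -> yes");
static_assert(hazardFromWrite(srFp8Write(0x7u)), "sr [1,1,1,0] -> yes");
static_assert(hazardFromWrite(srFp8Write(0xBu)), "sr [1,1,0,1] -> yes");
static_assert(!hazardFromWrite(srFp8Write(0x3u)), "sr [1,1,0,0] -> no");
static_assert(hazardFromWrite(pkFp8Write(0x7u)), "pk [1,1,1] -> yes");
static_assert(!hazardFromWrite(pkFp8Write(0x3u)), "pk [1,1,0] -> no");
static_assert(!hazardFromWrite(pkFp8Write(0x0u)), "pk unspecified -> no");
```

Under these assumptions, the predicate reproduces every row of the measurement table.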

https://github.com/llvm/llvm-project/pull/100276

