[llvm] [AMDGPU] Correctly insert s_nops for dst forwarding hazard (PR #100276)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 16 12:07:21 PDT 2024
================
@@ -875,13 +875,54 @@ GCNHazardRecognizer::checkVALUHazardsHelper(const MachineOperand &Def,
return DataIdx >= 0 &&
TRI->regsOverlap(MI.getOperand(DataIdx).getReg(), Reg);
};
+
int WaitStatesNeededForDef =
VALUWaitStates - getWaitStatesSince(IsHazardFn, VALUWaitStates);
WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForDef);
return WaitStatesNeeded;
}
+/// Dest sel forwarding issue occurs if additional logic is needed to swizzle /
+/// pack the computed value into correct bit position of the dest register. This
+/// occurs if we have SDWA with dst_sel != DWORD or if we have op_sel with
+/// dst_sel that is not aligned to the register. This function analyzes the \p
+/// MI and \returns an operand with dst forwarding issue, or nullptr if
+/// none exists.
+static const MachineOperand *
+getDstSelForwardingOperand(const MachineInstr &MI, const GCNSubtarget &ST) {
+ if (!SIInstrInfo::isVALU(MI))
+ return nullptr;
+
+ const SIInstrInfo *TII = ST.getInstrInfo();
+
+ unsigned Opcode = MI.getOpcode();
+
+ // There are three different types of instructions
+ // which produce forwarded dest: 1. SDWA with dst_sel != DWORD, 2. VOP3
+ // which write hi bits (e.g. op_sel[3] == 1), and 3. CVT_SR_FP8_F32 and
+ // CVT_SR_BF8_F32 with op_sel[3:2]
----------------
jrbyrnes wrote:
Confirmed: 1. this comment is correct, and 2. d16_hi loads do not produce a dst forwarding issue.
Experimental results also agree with 2.
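For readers following the hunk, the wait-state arithmetic at the end of checkVALUHazardsHelper can be sketched in isolation. This is a simplified, standalone illustration and not the LLVM implementation: waitStatesSince and waitStatesNeeded are hypothetical stand-ins, the instruction history is modelled as a plain vector of flags, every instruction is assumed to cost exactly one wait state, and the limit of 2 used in the usage note is an illustrative value only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for getWaitStatesSince(IsHazardFn, Limit).
// History[i] is true if instruction i is a hazard producer; the last element
// is the most recently issued instruction. Assumption: each instruction
// counts as exactly one wait state. If no producer is found within Limit
// wait states, Limit is returned, so the caller computes zero nops needed.
int waitStatesSince(const std::vector<bool> &History, int Limit) {
  int WaitStates = 0;
  for (auto It = History.rbegin(); It != History.rend(); ++It) {
    if (*It)
      return WaitStates; // found the producer this many wait states back
    ++WaitStates;
    if (WaitStates >= Limit)
      break; // far enough away; no hazard remains
  }
  return Limit;
}

// Mirrors the shape of the code in the hunk:
//   needed = max(needed, RequiredWaitStates - waitStatesSince(...))
int waitStatesNeeded(const std::vector<bool> &History, int RequiredWaitStates,
                     int AlreadyNeeded) {
  int NeededForDef =
      RequiredWaitStates - waitStatesSince(History, RequiredWaitStates);
  return std::max(AlreadyNeeded, NeededForDef);
}
```

With a required distance of 2, a producer issued immediately before the consumer yields 2 needed wait states, one intervening instruction reduces that to 1, and two or more intervening instructions reduce it to 0.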
https://github.com/llvm/llvm-project/pull/100276