[llvm] [AMDGPU] Correctly insert s_nops for implicit read of SDWA (PR #100276)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 30 14:00:12 PDT 2024
================
@@ -875,13 +875,34 @@ GCNHazardRecognizer::checkVALUHazardsHelper(const MachineOperand &Def,
return DataIdx >= 0 &&
TRI->regsOverlap(MI.getOperand(DataIdx).getReg(), Reg);
};
+
int WaitStatesNeededForDef =
VALUWaitStates - getWaitStatesSince(IsHazardFn, VALUWaitStates);
WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForDef);
return WaitStatesNeeded;
}
+static const MachineOperand *
+getDstSelForwardingOperand(const MachineInstr &MI, const GCNSubtarget &ST) {
+ if (!SIInstrInfo::isVALU(MI))
+ return nullptr;
+
+ const SIInstrInfo *TII = ST.getInstrInfo();
+ if (SIInstrInfo::isSDWA(MI)) {
+ if (auto *DstSel = TII->getNamedOperand(MI, AMDGPU::OpName::dst_sel))
+ if (DstSel->getImm() == AMDGPU::SDWA::DWORD)
+ return nullptr;
+ } else {
+ if (!AMDGPU::hasNamedOperand(MI.getOpcode(), AMDGPU::OpName::op_sel) ||
+ !(TII->getNamedOperand(MI, AMDGPU::OpName::src0_modifiers)->getImm() &
+ SISrcMods::DST_OP_SEL))
+ return nullptr;
+ }
----------------
jrbyrnes wrote:
Two pieces to this hazard, the instruction that has dest forwarding and the consumer of the fowarded dest.
Different types of instructions with dest forwarding:
f1. SDWA w/ dest_sel
f2. VOP3 with op_sel[3]
f3. CVR_SR_FP8_F32 || CVR_SR_BF8_F32 w/ op_sel[2:3] != 0
And there are three classes of consumers:
c1. RAW
c2a. WAW (dest reg)
c2b. WAW (ECC)
c2a is sdwa w/ UNUSED_PRESERVE semantics. This results in incorrect bits in dest reg when the second partial write does not overlap completely with the first.
c2b is ecc only which is always enabled for MI300. For parity check, ecc will read the dst for compare after modifying the value. This occurs when we partial write to the same dest, and we preserve the initial value. For example, VOP3A 16 bit instructions with op_sel[3] have dest preserve semantics. However, VOP2 16 bit instructions (e.g.) don't have dest preserve semantics v_add_u16 so ecc isn't relevant. AFAICT, this only effects result of ecc.
"should it just consider any access to the register as the write"? For MI300 I don't think this is necessary. The trickiest is c2b. For MI300, VOP3A and all the loads with dest preserve (thanks for pointing this out) should be sufficient to represent the class. That said, maybe the better approach is to simply whitelist VOP1 and VOP2 WAW and insert s_nop for all others instead of trying to encode all these conditions.
https://github.com/llvm/llvm-project/pull/100276
More information about the llvm-commits
mailing list