[llvm] [AMDGPU] Correctly insert s_nops for implicit read of SDWA (PR #100276)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 1 08:47:30 PDT 2024
================
@@ -875,13 +875,34 @@ GCNHazardRecognizer::checkVALUHazardsHelper(const MachineOperand &Def,
     return DataIdx >= 0 &&
            TRI->regsOverlap(MI.getOperand(DataIdx).getReg(), Reg);
   };
+
   int WaitStatesNeededForDef =
       VALUWaitStates - getWaitStatesSince(IsHazardFn, VALUWaitStates);
   WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForDef);
   return WaitStatesNeeded;
 }
+static const MachineOperand *
+getDstSelForwardingOperand(const MachineInstr &MI, const GCNSubtarget &ST) {
+  if (!SIInstrInfo::isVALU(MI))
+    return nullptr;
+
+  const SIInstrInfo *TII = ST.getInstrInfo();
+  if (SIInstrInfo::isSDWA(MI)) {
+    if (auto *DstSel = TII->getNamedOperand(MI, AMDGPU::OpName::dst_sel))
+      if (DstSel->getImm() == AMDGPU::SDWA::DWORD)
+        return nullptr;
+  } else {
+    if (!AMDGPU::hasNamedOperand(MI.getOpcode(), AMDGPU::OpName::op_sel) ||
+        !(TII->getNamedOperand(MI, AMDGPU::OpName::src0_modifiers)->getImm() &
+          SISrcMods::DST_OP_SEL))
+      return nullptr;
+  }
----------------
arsenm wrote:
> However, VOP2 16 bit instructions (e.g. v_add_u16) don't have dest preserve semantics so ecc isn't relevant.
This is more complicated than that; they made this a big mess. gfx8 always zeroed the top bits. gfx9 made it complicated: some opcodes zero them, some don't, and some depend on op_sel. gfx10 made everything preserve. zeroesHigh16BitsOfDest is incomplete, but in theory it should document which cases preserve and which do not. I think, for example, that v_add_u16 in the VOP3 form supports op_sel for the partial-write behavior?
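
To make the zero-versus-preserve distinction concrete, here is a minimal standalone sketch of what a 16-bit result can do to a 32-bit destination register. It is not LLVM code and does not model any particular opcode or generation: `Write16Policy`, `write16`, and `readsOldDst` are hypothetical names introduced only for illustration. The assumption it encodes is that any write which keeps some of the old destination bits effectively depends on the previous register contents.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policies for writing a 16-bit result into a 32-bit register.
// These names are illustrative only; they are not LLVM or ISA identifiers.
enum class Write16Policy {
  ZeroHigh,     // low half written, high half cleared
  PreserveHigh, // low half written, high half kept from the old value
  WriteHigh     // high half written, low half kept (op_sel-style dest select)
};

// Returns the new 32-bit register value after a 16-bit write.
constexpr uint32_t write16(uint32_t OldDst, uint16_t Result, Write16Policy P) {
  switch (P) {
  case Write16Policy::ZeroHigh:
    return Result;
  case Write16Policy::PreserveHigh:
    return (OldDst & 0xFFFF0000u) | Result;
  case Write16Policy::WriteHigh:
    return (OldDst & 0x0000FFFFu) | (static_cast<uint32_t>(Result) << 16);
  }
  return OldDst; // not reached for valid enum values
}

// Assumption of this sketch: a write that keeps any old bits depends on the
// previous contents of the destination, so it behaves like an implicit read.
constexpr bool readsOldDst(Write16Policy P) {
  return P != Write16Policy::ZeroHigh;
}

// Small worked example with an arbitrary old value and result.
inline void checkExamples() {
  const uint32_t Old = 0xAAAABBBBu;
  assert(write16(Old, 0x1234, Write16Policy::ZeroHigh) == 0x00001234u);
  assert(write16(Old, 0x1234, Write16Policy::PreserveHigh) == 0xAAAA1234u);
  assert(write16(Old, 0x1234, Write16Policy::WriteHigh) == 0x1234BBBBu);
}
```

Only the `ZeroHigh` case produces a result that is independent of the old destination value; the other two carry old bits through, which is the situation where a preceding write to the same register matters.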
https://github.com/llvm/llvm-project/pull/100276