[llvm] [AMDGPU] Account for existing SDWA selections (PR #123221)
Frederik Harwath via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 17 06:06:19 PST 2025
================
@@ -102,12 +105,47 @@ class SDWAOperand {
virtual MachineInstr *potentialToConvert(const SIInstrInfo *TII,
const GCNSubtarget &ST,
SDWAOperandsMap *PotentialMatches = nullptr) = 0;
- virtual bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) = 0;
+ virtual bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+ bool CombineSelections = false) = 0;
MachineOperand *getTargetOperand() const { return Target; }
MachineOperand *getReplacedOperand() const { return Replaced; }
MachineInstr *getParentInst() const { return Target->getParent(); }
+ /// Fold a \p FoldedOp SDWA selection into an \p ExistingOp existing SDWA
+ /// selection. If the selections are compatible, return the combined
+ /// selection, otherwise return a nullopt. For example, if we have existing
+ /// BYTE_0 Sel and are attempting to fold WORD_1 Sel:
+ /// BYTE_0 Sel (WORD_1 Sel (%X)) -> BYTE_2 Sel (%X)
----------------
frederik-h wrote:
> So this patch is implementing folding of SDWA into SDWA?
I would not put it this way. The pass does already handle existing SDWA instructions, see e.g. https://github.com/llvm/llvm-project/blob/48d0ef1a07993139e1acf65910704255443103a5/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp#L1032. It does not account for the selections correctly. This PR changes the handling to skip the creation of a new SDWA instruction and to combine the selections. It's more of a bugfix and not a new feature.
> In general that seems like something we should avoid the need for up-front.
But does this mean that any handling of pre-existing SDWA instructions should be removed from [here](https://github.com/llvm/llvm-project/blob/48d0ef1a07993139e1acf65910704255443103a5/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp#L1032), e.g. by inserting a `assert (!TII->isSDWA(Opcode))` (or using a more graceful exit)?
> In terms of avoiding miscompiles, can we do a simpler patch which just avoids the miscompile by skipping SDWA+SDWA cases first?
I will consider this and discuss it with @jrbyrnes, but my impression is that SIPeepholeSDWA should be enabled to handle this case correctly.
https://github.com/llvm/llvm-project/pull/123221
More information about the llvm-commits
mailing list