[llvm] [AMDGPU] Account for existing SDWA selections (PR #123221)

Sun Jan 19 13:38:25 PST 2025

================
@@ -102,12 +105,47 @@ class SDWAOperand {
   virtual MachineInstr *potentialToConvert(const SIInstrInfo *TII,
                                            const GCNSubtarget &ST,
                                            SDWAOperandsMap *PotentialMatches = nullptr) = 0;
-  virtual bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) = 0;
+  virtual bool convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII,
+                             bool CombineSelections = false) = 0;
 
   MachineOperand *getTargetOperand() const { return Target; }
   MachineOperand *getReplacedOperand() const { return Replaced; }
   MachineInstr *getParentInst() const { return Target->getParent(); }
 
+  /// Fold a \p FoldedOp SDWA selection into an \p ExistingOp existing SDWA
+  /// selection. If the selections are compatible, return the combined
+  /// selection, otherwise return a nullopt. For example, if we have existing
+  /// BYTE_0 Sel and are attempting to fold WORD_1 Sel:
+  ///     BYTE_0 Sel (WORD_1 Sel (%X)) -> BYTE_2 Sel (%X)
----------------
jrbyrnes wrote:

> > In terms of avoiding miscompiles, can we do a simpler patch which just avoids the miscompile by skipping SDWA+SDWA cases first?

> I will consider this and discuss it with @jrbyrnes, but my impression is that SIPeepholeSDWA should be enabled to handle this case correctly.

I agree with both points: we should land a simple patch that disallows SDWA->SDWA cases while work continues on this PR to actually handle them. I expect such foldings won't be too uncommon in 8-bit code.
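
For reference, here is a minimal standalone sketch of the combination rule described in the doc comment quoted above (`BYTE_0 Sel (WORD_1 Sel (%X)) -> BYTE_2 Sel (%X)`). It is not the code from the PR: the names `Sel`, `BitRange`, `rangeOf`, `selOf`, and `combineSel` are hypothetical. It assumes the inner selection places its bits at position 0, and that an outer selection can only be combined when it falls entirely within the inner selection's width.

```cpp
#include <optional>

// Hypothetical stand-in for an SDWA selection; not the LLVM enum itself.
enum class Sel { BYTE_0, BYTE_1, BYTE_2, BYTE_3, WORD_0, WORD_1, DWORD };

// Bit offset and width of the slice a selection reads from a 32-bit value.
struct BitRange {
  unsigned Offset;
  unsigned Width;
};

inline BitRange rangeOf(Sel S) {
  switch (S) {
  case Sel::BYTE_0: return {0, 8};
  case Sel::BYTE_1: return {8, 8};
  case Sel::BYTE_2: return {16, 8};
  case Sel::BYTE_3: return {24, 8};
  case Sel::WORD_0: return {0, 16};
  case Sel::WORD_1: return {16, 16};
  case Sel::DWORD:  return {0, 32};
  }
  return {0, 32};
}

// Map a bit range back to a selection, if one matches exactly.
inline std::optional<Sel> selOf(BitRange R) {
  const Sel All[] = {Sel::BYTE_0, Sel::BYTE_1, Sel::BYTE_2, Sel::BYTE_3,
                     Sel::WORD_0, Sel::WORD_1, Sel::DWORD};
  for (Sel S : All) {
    BitRange C = rangeOf(S);
    if (C.Offset == R.Offset && C.Width == R.Width)
      return S;
  }
  return std::nullopt;
}

// Combine Outer(Inner(%X)) into a single selection on %X, or return
// nullopt when the two are incompatible under the assumptions above.
inline std::optional<Sel> combineSel(Sel Outer, Sel Inner) {
  // Reading the whole dword of the inner result changes nothing.
  if (Outer == Sel::DWORD)
    return Inner;
  BitRange O = rangeOf(Outer);
  BitRange I = rangeOf(Inner);
  // The outer slice must lie inside the bits the inner selection kept.
  if (O.Offset + O.Width > I.Width)
    return std::nullopt;
  return selOf({I.Offset + O.Offset, O.Width});
}
```

Under these assumptions, `combineSel(Sel::BYTE_0, Sel::WORD_1)` gives `BYTE_2`, matching the example in the doc comment, while `combineSel(Sel::BYTE_2, Sel::WORD_1)` gives no result because byte 2 lies outside the 16 bits the inner selection kept.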

https://github.com/llvm/llvm-project/pull/123221