[llvm] [AMDGPU] SIPeepholeSDWA: Handle V_CNDMASK_B32_e64 (PR #137930)

Wed Apr 30 06:05:34 PDT 2025

================
@@ -1061,6 +1062,79 @@ void SIPeepholeSDWA::pseudoOpConvertToVOP2(MachineInstr &MI,
   MISucc.substituteRegister(CarryIn->getReg(), TRI->getVCC(), 0, *TRI);
 }
 
+static unsigned getVCmpEqOpcode(unsigned Bits) {
+  if (Bits == 64)
+    return AMDGPU::V_CMP_EQ_U64_e64;
+  if (Bits == 32)
+    return AMDGPU::V_CMP_EQ_U32_e64;
+  if (Bits == 16)
+    return AMDGPU::V_CMP_EQ_U16_e64;
+
+  llvm_unreachable("Unexpected register bit width.");
+};
+
+/// Try to convert an \p MI in VOP3 which takes an src2 carry-in
+/// operand into the corresponding VOP2 form which expects the
+/// argument in VCC. To this end, either try to change the definition
+/// of the carry-in operand to write to VCC or add an instruction that
+/// copies from the carry-in to VCC.  The conversion will only be
+/// applied if \p MI can be shrunk to VOP2 and if VCC can be proven to
+/// be dead before \p MI.
+void SIPeepholeSDWA::convertToImplicitVcc(MachineInstr &MI,
+                                          const GCNSubtarget &ST) const {
+  assert(MI.getOpcode() == AMDGPU::V_CNDMASK_B32_e64);
+
+  MCRegister Vcc = TRI->getVCC();
+  // FIXME Conversion introduces implicit vcc_hi use
+  if (Vcc == AMDGPU::VCC_LO)
+    return;
----------------
frederik-h wrote:

Thanks for the pointer to the function. I had to add the fix to the function that creates the sdwa instruction instead of the new conversion function.

https://github.com/llvm/llvm-project/pull/137930