[llvm] [AMDGPU] Delete redundant s_or_b32 (PR #165261)

Wed Nov 5 08:58:55 PST 2025

================
@@ -10689,6 +10691,33 @@ bool SIInstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
     if (!optimizeSCC(Def, &CmpInstr, RI))
       return false;
 
+    // If s_or_b32 result, sY, is unused (i.e. it is effectively a 64-bit
+    // s_cmp_lg of a register pair) and the inputs are the hi and lo-halves of a
+    // 64-bit foldableSelect then delete s_or_b32 in the sequence:
+    //    sX = s_cselect_b64 (non-zero imm), 0
+    //    sLo = copy sX.sub0
+    //    sHi = copy sX.sub1
+    //    sY = s_or_b32 sLo, sHi
----------------
LU-JOHN wrote:

See llvm/test/CodeGen/AMDGPU/carryout-selection.ll for an example of why this optimization is needed.
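
For readers without the test at hand, here is a rough model of why the s_or_b32 in the quoted sequence carries no extra information. It is only a sketch in plain C++, not LLVM code. The function name `sccAfterOr` and its parameters are hypothetical, and it assumes that s_or_b32 sets SCC to 1 exactly when its 32-bit result is non-zero. Under that assumption, OR-ing the two halves of a value that is either a non-zero immediate or 0 just reproduces the condition that fed the s_cselect_b64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the instruction sequence in the quoted comment.
// Returns the SCC value that s_or_b32 would leave behind.
inline bool sccAfterOr(bool Cond, uint64_t Imm) {
  uint64_t SX = Cond ? Imm : 0;                    // sX  = s_cselect_b64 Imm, 0
  uint32_t SLo = static_cast<uint32_t>(SX);        // sLo = copy sX.sub0
  uint32_t SHi = static_cast<uint32_t>(SX >> 32);  // sHi = copy sX.sub1
  uint32_t SY = SLo | SHi;                         // sY  = s_or_b32 sLo, sHi
  return SY != 0;                                  // assumed: SCC = (sY != 0)
}

// For any non-zero Imm the result equals Cond, whichever half holds the bits.
inline void checkExamples() {
  const uint64_t Imms[] = {1, 1ULL << 32, 0xFFFFFFFFFFFFFFFFULL};
  for (uint64_t Imm : Imms) {
    assert(sccAfterOr(true, Imm) == true);
    assert(sccAfterOr(false, Imm) == false);
  }
}
```

The equivalence depends on the immediate being non-zero: with a zero immediate both select arms are 0 and the OR result no longer tracks the condition.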

https://github.com/llvm/llvm-project/pull/165261