[llvm] [AMDGPU] Fold uniform readfirstlane + cndmask (PR #70188)

Wed Oct 25 03:35:27 PDT 2023

================
@@ -1400,6 +1401,88 @@ bool SIFoldOperands::tryFoldFoldableCopy(
   return Changed;
 }
 
+// Try to fold the following pattern:
+//    s_cselect s[2:3], K, 0          ; K has LSB set. Usually it's +-1.
+//    v_cndmask v0, 0, +-1, s[2:3]
+//    v_readfirstlane s0, v0
+//
+// into (for example)
+//
+//    s_cselect s[2:3], K, 0
+//    s_bfe_u64 s0, s[2:3], 0x10000
+bool SIFoldOperands::tryFoldUniformReadFirstLaneCndMask(
+    MachineInstr &MI) const {
+  if (MI.getOpcode() != AMDGPU::V_READFIRSTLANE_B32)
+    return false;
+
+  MachineInstr *RFLSrc = MRI->getVRegDef(MI.getOperand(1).getReg());
+  // We can also have the following pattern:
+  //
+  // %2:vreg_64 = REG_SEQUENCE %X:vgpr_32, sub0, %1:sreg_32, sub1
+  // %3:sgpr_32 = V_READFIRSTLANE_B32 %2.sub0:vreg_64
+  //
+  // In this case we dig into %X or %Y depending on which sub register
+  // the V_READFIRSTLANE accesses.
+  if (RFLSrc->isRegSequence()) {
----------------
arsenm wrote:

Until we ban undef uses in all SSA MIR, you have to guard against null RFLSrc 

https://github.com/llvm/llvm-project/pull/70188