[llvm-branch-commits] X86: Add X86TTIImpl::isProfitableToSinkOperands hook for immediate operands. (PR #141326)

Wed May 28 10:19:22 PDT 2025

================
@@ -7170,16 +7165,31 @@ bool X86TTIImpl::isProfitableToSinkOperands(Instruction *I,
         II->getIntrinsicID() == Intrinsic::fshr)
       ShiftAmountOpNum = 2;
   }
-
   if (ShiftAmountOpNum == -1)
     return false;
+  auto *ShiftAmount = &I->getOperandUse(ShiftAmountOpNum);
 
-  auto *Shuf = dyn_cast<ShuffleVectorInst>(I->getOperand(ShiftAmountOpNum));
+  // A uniform shift amount in a vector shift or funnel shift may be much
+  // cheaper than a generic variable vector shift, so make that pattern visible
+  // to SDAG by sinking the shuffle instruction next to the shift.
+  auto *Shuf = dyn_cast<ShuffleVectorInst>(ShiftAmount);
   if (Shuf && getSplatIndex(Shuf->getShuffleMask()) >= 0 &&
       isVectorShiftByScalarCheap(I->getType())) {
-    Ops.push_back(&I->getOperandUse(ShiftAmountOpNum));
+    Ops.push_back(ShiftAmount);
     return true;
   }
 
+  // Casts taking a constant expression (generally derived from a global
+  // variable address) as an operand are profitable to sink because they appear
+  // as subexpressions in the instruction sequence generated by the
+  // LowerTypeTests pass which is expected to pattern match to the rotate
+  // instruction's immediate operand.
+  if (auto *CI = dyn_cast<CastInst>(ShiftAmount)) {
+    if (isa<ConstantExpr>(CI->getOperand(0))) {
+      Ops.push_back(ShiftAmount);
+      return true;
+    }
+  }
----------------
pcc wrote:

I would not generally expect normal user code to take the address of a global variable and do constant arithmetic on it in such a way that it wouldn't ultimately become an immediate operand to some instruction, so I didn't think it would be worth detecting that case.

I also gave the fshr idea more thought. Thank you for sending #141735. I will check if it improves or regresses the generated code. I will also check if we can avoid using zext (instead use ptrtoint to i64 as the argument to fshr) and avoid this issue entirely. I think there was some reason why this didn't work with shl/shr/or though.

https://github.com/llvm/llvm-project/pull/141326