[llvm] [AMDGPU] LiveRegOptimizer: fix PHI same-BB filter; consider i8/i16 binops on SDWA (PR #155800)

via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 28 03:54:56 PDT 2025


================
@@ -126,7 +126,37 @@ class LiveRegOptimizer {
     return LK.first != TargetLoweringBase::TypeLegal;
   }
 
-  bool isOpLegal(Instruction *I) { return isa<StoreInst, IntrinsicInst>(I); }
+  bool isOpLegal(Instruction *I) {
+    if (auto *Intr = dyn_cast<IntrinsicInst>(I))
+      return true; // FIXME: narrow to known native intrinsics (DOT/MFMA/tbuffer) or use TTI cost.
----------------
michaelselehov wrote:

Thanks, Matt — agreed that relying on TTI is the right direction here.

Before I reshuffle the functional block, what do you think about rewriting
`isOpLegal` as a profitability check and making TTI the primary signal, with a
very narrow SDWA safety-net to avoid re-introducing the regression we just fixed?

```
// Rename for clarity: this is a profitability gate, not legality.
bool isProfitableSink(Instruction *I) {
  // 1) Always-profitable sinks (status quo).
  if (isa<StoreInst>(I) || isa<IntrinsicInst>(I))
    return true;

  // 2) SDWA safety-net: tiny vector binops that are known to lower well.
  // Keeps the v4i8/v2i16 loop-header case profitable on gfx9+.
  if (auto *BO = dyn_cast<BinaryOperator>(I))
    if (auto *VT = dyn_cast<FixedVectorType>(BO->getType())) {
      Type *Elt = VT->getElementType();
      // Fixed vectors only, so the TypeSize can be read as a plain bit count.
      uint64_t Bits = VT->getPrimitiveSizeInBits().getFixedValue();
      if (Elt->isIntegerTy() &&
          (Elt->getIntegerBitWidth() == 8 ||
           (Elt->getIntegerBitWidth() == 16 && ST.hasSDWA())) &&
          Bits <= 32) {
        switch (BO->getOpcode()) {
        case Instruction::Add:
        case Instruction::Sub:
        case Instruction::And:
        case Instruction::Or:
        case Instruction::Xor:
          return true;
        default:
          break;
        }
      }
    }

  // 3) Default: use TTI cost (same spirit as the earlier change).
  auto C = TTI.getInstructionCost(
      I, TargetTransformInfo::TargetCostKind::TCK_SizeAndLatency);
  return C.isValid() && C.getValue() >= 8;
}
```
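
To make the intended behaviour of the three tiers easier to discuss, here is a rough standalone model of the gate that builds without LLVM headers. Everything in it is hypothetical and for illustration only: `OpKind`, `InstModel`, `isSmallSDWABinop` and `isProfitableSinkModel` are not LLVM names, the cost is passed in by the caller rather than queried from TTI, and the threshold of 8 is simply copied from the sketch above.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the LLVM types in the proposal; none of these
// names exist in LLVM.
enum class OpKind { Store, Intrinsic, Add, Sub, And, Or, Xor, Mul, Other };

struct InstModel {
  OpKind Kind;
  bool IsIntVector;   // result is a fixed vector of integers
  unsigned EltBits;   // element width in bits
  unsigned NumElts;   // number of vector elements
  std::uint64_t Cost; // assumed TTI-style cost, supplied by the caller
};

// Tier 2: small i8/i16 vector binops that fit in one 32-bit register.
// As in the proposal, only the 16-bit case is gated on SDWA.
constexpr bool isSmallSDWABinop(const InstModel &I, bool HasSDWA) {
  if (!I.IsIntVector)
    return false;
  bool EltOK = I.EltBits == 8 || (I.EltBits == 16 && HasSDWA);
  if (!EltOK || I.EltBits * I.NumElts > 32)
    return false;
  switch (I.Kind) {
  case OpKind::Add:
  case OpKind::Sub:
  case OpKind::And:
  case OpKind::Or:
  case OpKind::Xor:
    return true;
  default:
    return false;
  }
}

constexpr bool isProfitableSinkModel(const InstModel &I, bool HasSDWA,
                                     std::uint64_t CostThreshold = 8) {
  // Tier 1: stores and intrinsics are always accepted (status quo).
  if (I.Kind == OpKind::Store || I.Kind == OpKind::Intrinsic)
    return true;
  // Tier 2: narrow SDWA safety-net.
  if (isSmallSDWABinop(I, HasSDWA))
    return true;
  // Tier 3: fall back to the cost threshold.
  return I.Cost >= CostThreshold;
}
```

Under this model a cheap v4i8 `add` is accepted by the safety-net, a cheap v2i16 `add` is accepted only when SDWA is available, and a cheap v4i8 `mul` falls through to the cost check and is rejected unless its cost reaches the threshold.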



https://github.com/llvm/llvm-project/pull/155800

