[llvm] [AMDGPU] Filter candidates of LiveRegOptimizer for profitable cases (PR #124624)

Wed Jan 29 00:48:32 PST 2025

================
@@ -125,8 +130,41 @@ class LiveRegOptimizer {
     return LK.first != TargetLoweringBase::TypeLegal;
   }
 
-  LiveRegOptimizer(Module &Mod, const GCNSubtarget &ST)
-      : Mod(Mod), DL(Mod.getDataLayout()), ST(ST),
+  // Filtering based on operation or its cost.
+  // If an operation incurs high enough cost or natively work on
+  // vector of illegal type, ie. v2i8, then it makes sense to try
+  // to avoid scalarizing across BB.
+  bool shouldReplaceBasedOnOp(Instruction *II) {
+    // Ignore pseudos
+    if (II->isDebugOrPseudoInst())
+      return false;
+
+    // Instruction Cost
+    const auto Cost = TTI.getInstructionCost(
+        II, TargetTransformInfo::TargetCostKind::TCK_SizeAndLatency);
+    LLVM_DEBUG(dbgs() << "shouldReplaceBasedOnOp: " << *II << " Cost=" << Cost
+                      << '\n';);
+    if (Cost >= 8)
+      return true;
+
+    // Intrinsics - assume they natively handle illegal type
+    if (dyn_cast<IntrinsicInst>(II))
+      return true;
+
+    // Stores
+    if (dyn_cast<StoreInst>(II))
+      return true;
+
+    // Shuffles
+    if (dyn_cast<ShuffleVectorInst>(II))
----------------
choikwa wrote:

I am mixed about what the ideal solution should be here and left it somewhat open-ended. Correct me if I'm wrong but few things to consider are:
1) If operation is known to be scalarized, LRO shoehorning Values back into packed VGPR adds overhead.
2) While whitelisting to only packed ops would work, list may be exhaustive and/or,
3) It seems LRO has other effects such as splitting live range, which reduces RP for Insts however unintentional.

Should we err on the side of allowing more candidates for LRO?

https://github.com/llvm/llvm-project/pull/124624