[llvm] [AMDGPU] LiveRegOptimizer: fix PHI same-BB filter; consider i8/i16 binops on SDWA (PR #155800)

Wed Sep 10 07:53:24 PDT 2025

================
@@ -150,7 +184,12 @@ class LiveRegOptimizer {
       if (!CVisited.insert(CII).second)
         continue;
 
-      if (CII->getParent() == II->getParent() && !IsLookThru(II))
+      // Allow same-BB non-lookthrough users when the def is a PHI:
+      // loop headers frequently consume the carried value in the header block
+      // (e.g. byte-wise vector binops). We *do* want to coerce across the
+      // backedge in that common case to enable packed i32 + SDWA lowering.
+      if (CII->getParent() == II->getParent() && !IsLookThru(CII) &&
+          !isa<PHINode>(II))
----------------
michaelselehov wrote:

You’re right—there are two separate changes:

- The II → CII fix is a correctness bug: we must examine the user when pruning same-BB paths, otherwise the walk can terminate prematurely.
- The PHI exception is a small policy tweak: loop headers commonly consume the carried value in the header block via non-lookthrough ops (e.g., byte-wise vector binops). Without allowing that same-BB non-lookthrough use for PHI, the walk never reaches the profitable sink—even with a better cost model—so the regression remains.

https://github.com/llvm/llvm-project/pull/155800