[llvm] [AMDGPU] LiveRegOptimizer: fix PHI same-BB filter; consider i8/i16 binops on SDWA (PR #155800)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 15 04:10:17 PDT 2025
================
@@ -126,7 +126,37 @@ class LiveRegOptimizer {
return LK.first != TargetLoweringBase::TypeLegal;
}
- bool isOpLegal(Instruction *I) { return isa<StoreInst, IntrinsicInst>(I); }
+ bool isOpLegal(Instruction *I) {
+ if (auto *Intr = dyn_cast<IntrinsicInst>(I))
+ return true; // FIXME: narrow to known native intrinsics (DOT/MFMA/tbuffer) or use TTI cost.
----------------
michaelselehov wrote:
Sorry, posted this reply in the wrong place. Reposting here.
@arsenm, I instrumented the TTI queries on gfx90a: `add <4 x i8>` comes out at cost 4 with TCK_SizeAndLatency (and likewise for getArithmeticInstrCost), which is below the previous profitability threshold (8). So switching to TTI-only would reintroduce the regression. I propose to keep the very narrow SDWA safety-net for v4i8/v2i16 (≤32b) here and look at improving AMDGPU TTI separately if needed.
https://github.com/llvm/llvm-project/pull/155800
More information about the llvm-commits
mailing list