[llvm] [AMDGPU] Improve uniform argument handling in InstCombineIntrinsic (PR #105812)

Fri Aug 23 05:00:08 PDT 2024

================
@@ -440,6 +440,22 @@ static bool canContractSqrtToRsq(const FPMathOperator *SqrtOp) {
          SqrtOp->getType()->isHalfTy();
 }
 
+/// Return true if we can easily prove that use U is uniform.
+static bool isTriviallyUniform(const Use &U) {
+  Value *V = U.get();
+  if (isa<Constant>(V))
+    return true;
+  if (auto *I = dyn_cast<Instruction>(V)) {
+    // If I and U are in different blocks then there is a possibility of
+    // temporal divergence.
+    if (I->getParent() != cast<Instruction>(U.getUser())->getParent())
+      return false;
+    if (const auto *II = dyn_cast<IntrinsicInst>(I))
----------------
ssahasra wrote:

Would it be slightly faster to check this first, before checking whether it's in the same block? That is, return false right away if it is not a uniform intrinsic. If the dyn_cast to IntrinsicInst were the outer condition, it would also make clear that we are doing a trivial check on uniform intrinsics only, not on instructions in general.
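
Roughly what I have in mind, as a self-contained sketch with toy stand-in types rather than the real LLVM classes. `ToyValue`, `ToyUse`, the `alwaysUniform` flag and `isTriviallyUniformSketch` are all hypothetical names for illustration, and since the hunk above is cut off after the `dyn_cast`, the final `return` is my assumption about what the intrinsic branch does:

```cpp
// Toy model only: these types are hypothetical stand-ins, not LLVM API.
enum class Kind { Constant, Intrinsic, OtherInstruction, Other };

struct ToyValue {
  Kind kind;
  int block;           // id of the defining basic block (ignored for constants)
  bool alwaysUniform;  // assumption: marks an intrinsic whose result is uniform
};

struct ToyUse {
  const ToyValue *value;  // the operand being used
  int userBlock;          // id of the block containing the user
};

// Reordered check: the intrinsic test is the outer condition, so any
// non-intrinsic instruction is rejected without looking at its block.
inline bool isTriviallyUniformSketch(const ToyUse &U) {
  const ToyValue *V = U.value;
  if (V->kind == Kind::Constant)
    return true;
  if (V->kind == Kind::Intrinsic) {
    // Def and use in different blocks: temporal divergence is possible.
    if (V->block != U.userBlock)
      return false;
    return V->alwaysUniform;
  }
  return false;
}
```

The result is the same as with the block comparison on the outside; the only difference is that the block comparison runs for intrinsic calls only, and the shape of the code shows that nothing else is considered.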

https://github.com/llvm/llvm-project/pull/105812