[llvm] [AMDGPU] Introduce "amdgpu-uniform-intrinsic-combine" pass to combine uniform AMDGPU lane Intrinsics. (PR #116953)

Mon Sep 15 03:10:33 PDT 2025

================
@@ -0,0 +1,176 @@
+//===-- AMDGPUUniformIntrinsicCombine.cpp ---------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// This pass simplifies certain intrinsic calls when the arguments are uniform.
+/// It's true that this pass has transforms that can lead to a situation where
+/// some instruction whose operand was previously recognized as statically
+/// uniform is later on no longer recognized as statically uniform. However, the
+/// semantics of how programs execute don't (and must not, for this precise
+/// reason[0]) care about static uniformity, they only ever care about dynamic
+/// uniformity. And every instruction that's downstream and cares about dynamic
+/// uniformity must be convergent (and isel will introduce v_readfirstlane for
+/// them if their operands can't be proven statically uniform).
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "GCNSubtarget.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/Analysis/TargetLibraryInfo.h"
+#include "llvm/Analysis/UniformityAnalysis.h"
+#include "llvm/CodeGen/TargetPassConfig.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/InstVisitor.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Target/TargetMachine.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+#define DEBUG_TYPE "amdgpu-uniform-intrinsic-combine"
+
+using namespace llvm;
+using namespace llvm::AMDGPU;
+using namespace llvm::PatternMatch;
+
+/// Tracks uniformity of newly created instructions.
+/// Wraps a ValueMap so we can enforce consistent mark/erase usage.
+struct UniformityTracker : DenseMap<const Value *, bool> {
+  /// Record that V has known uniformity.
+  void mark(Value *V, bool IsUniform) { (*this)[V] = IsUniform; }
+
+  /// Erase V from the map if it is an instruction with no uses anymore.
+  void eraseIfDead(Value *V) {
+    if (auto *I = dyn_cast<Instruction>(V); I && I->use_empty())
+      this->erase(V);
+  }
+};
+
+/// Wrapper for querying uniformity info that first checks new instructions.
+static bool isDivergentUseWithNew(const Use &U, const UniformityInfo &UI,
----------------
ssahasra wrote:

Actually, why not just merge this into UniformityInfo itself? Modify that class to use a ValueMap instead of the current DenseMap to keep track of divergent values, and provide an API to add new divergent values. In this pass, no new values would be added, because every value it creates is uniform. The general advice to any transform is that whenever it creates a new value whose uniformity cannot be proven, it should add that value to the divergent values map. In fact, anything that gets RAUW'ed will automatically enter the map, which is safe by default.
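
To make the suggestion concrete, here is a rough standalone sketch of the shape such an API might take. It uses standard containers rather than LLVM's classes, and every name in it (`DivergenceInfo`, `addDivergentValue`, `onValueReplaced`, `onValueDeleted`) is hypothetical, not the existing UniformityInfo interface. The replace hook assumes the ValueMap-style behavior where a tracked key follows a RAUW to the replacement value.

```cpp
#include <cassert>
#include <unordered_set>

// Stand-in for llvm::Value; only pointer identity matters in this sketch.
struct Value {
  int Id;
};

// Hypothetical model of a uniformity result that transforms can update.
// A value is treated as uniform unless it is in the divergent set.
class DivergenceInfo {
  std::unordered_set<const Value *> Divergent;

public:
  // Proposed API: a transform calls this for any new value whose
  // uniformity it cannot prove.
  void addDivergentValue(const Value *V) { Divergent.insert(V); }

  bool isDivergent(const Value *V) const { return Divergent.count(V) != 0; }
  bool isUniform(const Value *V) const { return !isDivergent(V); }

  // Assumed RAUW behavior: if Old was tracked as divergent, the entry
  // moves to New, so the replacement is conservatively divergent.
  void onValueReplaced(const Value *Old, const Value *New) {
    assert(Old != New && "a value cannot replace itself");
    if (Divergent.erase(Old) != 0)
      Divergent.insert(New);
  }

  // Assumed deletion behavior: drop the entry so no stale pointer remains.
  void onValueDeleted(const Value *V) { Divergent.erase(V); }
};

// Small scenario exercising the hooks above.
bool demo() {
  Value A{0}, B{1}, C{2};
  DivergenceInfo DI;
  DI.addDivergentValue(&A);   // A is divergent
  DI.onValueReplaced(&A, &B); // B inherits A's entry
  DI.onValueDeleted(&A);      // no-op here, A is already gone
  return DI.isDivergent(&B) && !DI.isDivergent(&A) && DI.isUniform(&C);
}
```

With something along these lines, a pass that only creates uniform values would never need to call `addDivergentValue`, and the separate tracker in this patch could go away.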

https://github.com/llvm/llvm-project/pull/116953