[llvm] [AMDGPU][Attributor] Infer `inreg` attribute in `AMDGPUAttributor` (PR #101609)

Tue Sep 24 09:21:35 PDT 2024

================
@@ -1014,6 +1016,110 @@ struct AAAMDGPUNoAGPR
 
 const char AAAMDGPUNoAGPR::ID = 0;
 
+struct AAAMDGPUUniform
+    : public IRAttribute<Attribute::InReg,
+                         StateWrapper<BooleanState, AbstractAttribute>,
+                         AAAMDGPUUniform> {
+  AAAMDGPUUniform(const IRPosition &IRP, Attributor &A) : IRAttribute(IRP) {}
+
+  /// Create an abstract attribute view for the position \p IRP.
+  static AAAMDGPUUniform &createForPosition(const IRPosition &IRP,
+                                            Attributor &A);
+
+  /// See AbstractAttribute::getName()
+  const std::string getName() const override { return "AAAMDGPUUniform"; }
+
+  const std::string getAsStr(Attributor *A) const override {
+    return getAssumed() ? "inreg" : "non-inreg";
+  }
+
+  void trackStatistics() const override {}
+
+  /// See AbstractAttribute::getIdAddr()
+  const char *getIdAddr() const override { return &ID; }
+
+  /// This function should return true if the type of the \p AA is
+  /// AAAMDGPUUniform
+  static bool classof(const AbstractAttribute *AA) {
+    return (AA->getIdAddr() == &ID);
+  }
+
+  /// Unique ID (due to the unique address)
+  static const char ID;
+};
+
+const char AAAMDGPUUniform::ID = 0;
+
+namespace {
+
+struct AAAMDGPUUniformArgument : public AAAMDGPUUniform {
+  AAAMDGPUUniformArgument(const IRPosition &IRP, Attributor &A)
+      : AAAMDGPUUniform(IRP, A) {}
+
+  void initialize(Attributor &A) override {
+    assert(
+        !AMDGPU::isEntryFunctionCC(getAssociatedFunction()->getCallingConv()));
+    if (getAssociatedArgument()->hasAttribute(Attribute::InReg))
+      indicateOptimisticFixpoint();
+  }
+
+  ChangeStatus updateImpl(Attributor &A) override {
+    unsigned ArgNo = getAssociatedArgument()->getArgNo();
+
+    auto isUniform = [&](AbstractCallSite ACS) -> bool {
+      CallBase *CB = ACS.getInstruction();
+      Value *V = CB->getArgOperandUse(ArgNo);
+      if (isa<Constant>(V))
+        return true;
+      if (auto *I = dyn_cast<Instruction>(V)) {
+        auto *UA = A.getInfoCache()
+                       .getAnalysisResultForFunction<UniformityInfoAnalysis>(
+                           *I->getFunction());
+        return UA && UA->isUniform(I);
+      }
+      if (auto *Arg = dyn_cast<Argument>(V)) {
+        auto *UA = A.getInfoCache()
+                       .getAnalysisResultForFunction<UniformityInfoAnalysis>(
+                           *Arg->getParent());
+        if (UA && UA->isUniform(Arg))
+          return true;
+        // We only rely on isArgPassedInSGPR when the function is terminal,
+        // assuming there is no call edge from a function to an entry function.
+        if (AMDGPU::isEntryFunctionCC(Arg->getParent()->getCallingConv()))
+          return AMDGPU::isArgPassedInSGPR(Arg);
----------------
shiltian wrote:

> This is not conditional on isEntryFunctionCC (e.g. this whole thing depends on inreg arguments in non-entry points also being in SGPR).

It appears that there are some misunderstandings here.

1. This AA works by inferring from the corresponding arguments at its call sites.
2. The first point also implies that the AA should not be created for any entry function, because entry functions have no call sites (i.e., they are terminal nodes in the call graph).
    - Therefore, the special case here is a non-entry function with a call site in an entry function. Because of points 1 and 2, we rely only on `AMDGPU::isArgPassedInSGPR` in that case (or on a more appropriate check once the handling of `i1` arguments is merged).
3. If the associated argument of a non-entry function already has the `inreg` attribute, `initialize` marks the AA as being at an optimistic fixpoint, so `updateImpl` is never called for it.

With that said, the reason for the `isEntryFunctionCC` condition should be clear: an entry function is a terminal node and no AA can be created for it, so we simply rely on what `UniformityInfoAnalysis` uses for an `Argument`, namely `AMDGPU::isArgPassedInSGPR`. If the argument's parent function is not an entry function, we proceed to create an AA for that argument, which in turn infers from all of its potential call site arguments.
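
To make the flow above more concrete, here is a minimal standalone C++ sketch that models the inference as an optimistic fixpoint over a toy call graph. It is only a model, not the actual implementation: `Function`, `CallSite`, `Operand`, `inferInReg`, and `runExample` are hypothetical names, `PassedInSGPR` stands in for `AMDGPU::isArgPassedInSGPR`, the model assumes every call site is visible, and it folds away the separate `UniformityInfoAnalysis` query on arguments.

```cpp
// Hypothetical toy model of the call-site-driven inference, not an LLVM API.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Call-site operand kinds, mirroring the cases in the isUniform lambda.
enum class OperandKind { Constant, UniformValue, DivergentValue, CallerArg };
struct Operand {
  OperandKind Kind;
  std::string CallerFn;     // only meaningful when Kind == CallerArg
  unsigned CallerArgNo = 0; // argument index in CallerFn
};
struct Function {
  bool IsEntry = false;
  std::vector<bool> HasInReg;     // argument already carries inreg
  std::vector<bool> PassedInSGPR; // stand-in for AMDGPU::isArgPassedInSGPR
};
struct CallSite { std::string Caller, Callee; std::vector<Operand> Args; };

using ArgKey = std::pair<std::string, unsigned>;
using FnMap = std::map<std::string, Function>;

// Is this call-site operand uniform under the current assumptions?
bool operandIsUniform(const Operand &Op, const FnMap &Fns,
                      const std::map<ArgKey, bool> &Assumed) {
  switch (Op.Kind) {
  case OperandKind::Constant:
  case OperandKind::UniformValue:
    return true;
  case OperandKind::DivergentValue:
    return false;
  case OperandKind::CallerArg: {
    const Function &Caller = Fns.at(Op.CallerFn);
    if (Caller.IsEntry) // terminal node with no AA: use the SGPR check
      return Caller.PassedInSGPR.at(Op.CallerArgNo);
    // Non-entry caller: consult the assumed state of its own argument.
    return Assumed.at({Op.CallerFn, Op.CallerArgNo});
  }
  }
  return false;
}

// Optimistic fixpoint: assume every non-entry argument is uniform, then
// retract assumptions that some call site contradicts until nothing changes.
std::map<ArgKey, bool> inferInReg(const FnMap &Fns,
                                  const std::vector<CallSite> &Calls) {
  std::map<ArgKey, bool> Assumed;
  std::set<ArgKey> Fixed; // already inreg: settled up front, like initialize()
  for (const auto &[Name, F] : Fns) {
    if (F.IsEntry)
      continue; // no AA is created for entry functions
    for (unsigned I = 0; I < F.HasInReg.size(); ++I) {
      Assumed[{Name, I}] = true;
      if (F.HasInReg[I])
        Fixed.insert({Name, I});
    }
  }
  bool Changed = true;
  while (Changed) { // terminates: states only ever flip from true to false
    Changed = false;
    for (const CallSite &CS : Calls) {
      assert(!Fns.at(CS.Callee).IsEntry && "no call edge into an entry function");
      for (unsigned I = 0; I < CS.Args.size(); ++I) {
        ArgKey Key{CS.Callee, I};
        bool &State = Assumed.at(Key);
        if (!State || Fixed.count(Key))
          continue; // already pessimistic, or fixed so never updated
        if (!operandIsUniform(CS.Args[I], Fns, Assumed)) {
          State = false;
          Changed = true;
        }
      }
    }
  }
  return Assumed;
}

// Hypothetical graph: kernel -> helper -> leaf and kernel -> mid -> leaf2.
std::map<ArgKey, bool> runExample() {
  FnMap Fns;
  Fns["kernel"] = {true, {false, false}, {true, false}};
  Fns["helper"] = {false, {false}, {false}};
  Fns["leaf"] = {false, {false, false}, {false, false}};
  Fns["mid"] = {false, {false}, {false}};
  Fns["leaf2"] = {false, {false}, {false}};
  std::vector<CallSite> Calls = {
      {"mid", "leaf2", {{OperandKind::CallerArg, "mid", 0}}},
      {"kernel", "mid", {{OperandKind::CallerArg, "kernel", 1}}},
      {"kernel", "helper", {{OperandKind::CallerArg, "kernel", 0}}},
      {"helper", "leaf", {{OperandKind::CallerArg, "helper", 0}, {OperandKind::DivergentValue}}},
  };
  return inferInReg(Fns, Calls);
}
```

In `runExample`, the first argument of `helper` and the first argument of `leaf` end up assumed uniform even though neither carries `inreg`, while the second argument of `leaf` does not, because its operand is divergent. The argument of `mid` is retracted because `kernel` passes it an argument that is not in an SGPR, and that retraction reaches `leaf2` on the next iteration.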

Relying solely on `AMDGPU::isArgPassedInSGPR` has a limitation, which is exactly what this AA tries to address: even when `AMDGPU::isArgPassedInSGPR` returns `false`, there is still a good chance that the argument is uniform at all of its call sites, in which case the `inreg` attribute can be added.
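
The limitation can be illustrated with a small standalone C++ sketch that contrasts the two views for a single argument of a non-entry function. `ArgSummary`, `uniformBySGPRCheckOnly`, `uniformByCallSites`, and `checkExample` are hypothetical names; `HasInReg` is a simplified stand-in for what `AMDGPU::isArgPassedInSGPR` reports for a non-entry function, and treating a function whose callers are not all known as non-uniform is an assumption of the sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-argument summary for a non-entry function (not an LLVM API).
struct ArgSummary {
  bool HasInReg;                      // simplified stand-in for isArgPassedInSGPR
  bool AllCallSitesKnown;             // sketch assumption: every caller is visible
  std::vector<bool> OperandIsUniform; // one entry per call site
};

// Without the AA: only an explicit inreg makes the argument count as uniform.
bool uniformBySGPRCheckOnly(const ArgSummary &A) { return A.HasInReg; }

// With the AA: an argument without inreg can still be proven uniform if the
// corresponding operand is uniform at every call site.
bool uniformByCallSites(const ArgSummary &A) {
  if (A.HasInReg)
    return true; // mirrors initialize(): optimistic fixpoint, no update needed
  if (!A.AllCallSitesKnown)
    return false; // unseen callers might pass divergent values
  for (bool Uniform : A.OperandIsUniform)
    if (!Uniform)
      return false; // a single divergent call site blocks inreg
  return true;
}

// Usage: three uniform call sites, but no inreg on the argument yet.
void checkExample() {
  ArgSummary A{false, true, {true, true, true}};
  assert(!uniformBySGPRCheckOnly(A)); // the SGPR-only view gives up
  assert(uniformByCallSites(A));      // the call-site view can add inreg
  ArgSummary B{false, true, {true, false}};
  assert(!uniformByCallSites(B));     // one divergent caller is enough to fail
}
```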

> Also this will need refinement for the i1 case, since soon it will be in SGPR but not uniform

Sure. Will do that once the PR is merged.

https://github.com/llvm/llvm-project/pull/101609