[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

Tue Jun 24 07:30:49 PDT 2025

================
@@ -0,0 +1,157 @@
+//===- AMDGPUExpandFeaturePredicates.cpp - Feature Predicate Expander Pass ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// This file implements a pass that deals with expanding AMDGCN generic feature
+// predicates into target specific quantities / sequences. In this context, a
+// generic feature predicate is an implementation detail global variable that
+// is inserted by the FE as a consequence of using either the __builtin_cpu_is
+// or the __builtin_amdgcn_is_invocable special builtins on an abstract target
+// (AMDGCNSPIRV). These placeholder globals are used to guide target specific
+// lowering, once the concrete target is known, by way of constant folding their
+// value all the way into a terminator (i.e. a controlled block) or into a no
+// live use scenario. The pass makes a best effort attempt to look through
+// calls, i.e. a constant evaluatable passthrough of a predicate value will
+// generally work, however we hard fail if the folding fails, to avoid obtuse
+// BE errors or opaque run time errors. This pass should run as early as
+// possible / immediately after Clang CodeGen, so that the optimisation pipeline
+// and the BE operate with concrete target data.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "AMDGPUTargetMachine.h"
+#include "GCNSubtarget.h"
+
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/Local.h"
+
+#include <string>
+#include <utility>
+
+using namespace llvm;
+
+namespace {
+template <typename C> void collectUsers(Value *V, C &Container) {
+  assert(V && "Must pass an existing Value!");
+
+  for (auto &&U : V->users())
+    if (auto *I = dyn_cast<Instruction>(U))
+      Container.insert(Container.end(), I);
+}
+
+inline void setPredicate(const GCNSubtarget &ST, GlobalVariable *P) {
+  const auto IsFeature = P->getName().starts_with("llvm.amdgcn.has");
+  const auto Offset =
+      IsFeature ? sizeof("llvm.amdgcn.has") : sizeof("llvm.amdgcn.is");
+
+  auto PV = P->getName().substr(Offset).str();
+  if (IsFeature) {
+    auto Dx = PV.find(',');
+    while (Dx != std::string::npos) {
+      PV.insert(++Dx, {'+'});
+
+      Dx = PV.find(',', Dx);
+    }
+    PV.insert(PV.cbegin(), '+');
+  }
+
+  auto *PTy = P->getValueType();
+  P->setLinkage(GlobalValue::PrivateLinkage);
+  P->setExternallyInitialized(false);
+
+  if (IsFeature)
+    P->setInitializer(ConstantInt::getBool(PTy, ST.checkFeatures(PV)));
+  else
+    P->setInitializer(ConstantInt::getBool(PTy, PV == ST.getCPU()));
+}
+
+std::pair<PreservedAnalyses, bool>
+unfoldableFound(Function *Caller, GlobalVariable *P, Instruction *NoFold) {
+  std::string W;
+  raw_string_ostream OS(W);
+
+  OS << "Impossible to constant fold feature predicate: " << *P << " used by "
+     << *NoFold << ", please simplify.\n";
+
+  Caller->getContext().diagnose(
+      DiagnosticInfoUnsupported(*Caller, W, NoFold->getDebugLoc(), DS_Error));
+
+  return {PreservedAnalyses::none(), false};
+}
+
+std::pair<PreservedAnalyses, bool> handlePredicate(const GCNSubtarget &ST,
+                                                   GlobalVariable *P) {
----------------
AlexVlx wrote:

Oh, this is a good question; the answer has probably gotten lost in the lengthy conversation. There are two cases, let me try to clarify:

1. We are targeting a concrete `gfx###` target, for which the features and capabilities are fully known at compile time, i.e. we know what we are lowering for -> the predicates get expanded and resolved in the FE, so they never reach codegen and are never emitted in IR;
2. We are targeting `amdgcnspirv`, which is abstract and for which the actual concrete target is only known at run time, i.e. there's a lack of information / temporal decoupling:
     - the predicates allow one to write code that adapts to the capabilities of the actual target that the code will execute on;
     - we only know the target once we resume compilation for the concrete target, hence the need to emit them in IR, and then expand.

The ultimate state of affairs (not there yet due to historical issues / ongoing work) is that for the 2nd case the IR we generate SPIRV from is directly the pristine Clang output (plus the transforms needed for SPIRV, which do not impact the predicates), so when we resume compilation at run time, it's on un-optimised FE-output IR. Furthermore, the expansion pass runs unconditionally and is independent of optimisation level (which also implies it needs to be better about cleaning up after itself, a point I still owe an answer on). Hopefully that helps / makes some degree of sense?
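
To make the second case a bit more concrete, here is a rough, standalone sketch of what "resolve once the concrete target is known" amounts to. It is not the code from the patch: `ConcreteTarget`, `startsWith` and `resolvePredicate` are hypothetical names, plain `std::string` stands in for the LLVM types, and it assumes the placeholder name is the `llvm.amdgcn.is` / `llvm.amdgcn.has` prefix, one separator character, and then either a CPU name or a comma separated feature list (which is how `setPredicate` in the hunk above slices the name).

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the concrete target that only becomes known once
// compilation resumes. This is NOT GCNSubtarget, just an illustration.
struct ConcreteTarget {
  std::string CPU;                // e.g. "gfx90a"
  std::set<std::string> Features; // features this target supports
};

// Returns true if Name starts with Prefix.
static bool startsWith(const std::string &Name, const std::string &Prefix) {
  return Name.compare(0, Prefix.size(), Prefix) == 0;
}

// Resolves a placeholder name to a boolean for a given concrete target.
// Assumption: the prefix is followed by one separator character and then the
// payload, e.g. "llvm.amdgcn.is.<cpu>" or "llvm.amdgcn.has.<f1>,<f2>".
bool resolvePredicate(const std::string &Name, const ConcreteTarget &T) {
  const std::string Has = "llvm.amdgcn.has";
  const std::string Is = "llvm.amdgcn.is";

  if (startsWith(Name, Has) && Name.size() > Has.size() + 1) {
    // Feature predicate: true only if every listed feature is available.
    std::istringstream Payload(Name.substr(Has.size() + 1));
    std::string Feature;
    while (std::getline(Payload, Feature, ','))
      if (T.Features.count(Feature) == 0)
        return false;
    return true;
  }

  if (startsWith(Name, Is) && Name.size() > Is.size() + 1)
    // CPU predicate: true only for an exact match on the CPU name.
    return Name.substr(Is.size() + 1) == T.CPU;

  return false; // Not a predicate this sketch understands.
}
```

The same placeholder name gives a different answer depending on which `ConcreteTarget` it is resolved against, which is why for `amdgcnspirv` the decision has to be deferred until compilation resumes, whereas for a concrete `gfx###` target the FE can make it immediately.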


https://github.com/llvm/llvm-project/pull/134016