[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)
Alex Voicu via cfe-commits
cfe-commits at lists.llvm.org
Tue Jun 24 09:17:16 PDT 2025
================
@@ -0,0 +1,157 @@
+//===- AMDGPUExpandFeaturePredicates.cpp - Feature Predicate Expander Pass ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// This file implements a pass that expands AMDGCN generic feature predicates
+// into target specific quantities / sequences. In this context, a generic
+// feature predicate is an implementation-detail global variable inserted by
+// the FE as a consequence of using either the __builtin_cpu_is or the
+// __builtin_amdgcn_is_invocable special builtins on an abstract target
+// (AMDGCNSPIRV). Once the concrete target is known, these placeholder globals
+// guide target specific lowering: their value is constant folded all the way
+// into a terminator (i.e. a controlled block) or until it has no live uses.
+// The pass makes a best-effort attempt to look through calls, i.e. a constant
+// evaluatable passthrough of a predicate value will generally work; however,
+// we hard fail if the folding fails, in order to avoid obtuse BE errors or
+// opaque run time errors. This pass should run as early as possible /
+// immediately after Clang CodeGen, so that the optimisation pipeline and the
+// BE operate with concrete target data.
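+//
+// As a rough sketch (name, linkage and types are illustrative, exact textual
+// IR syntax elided), the FE emits placeholders along the lines of:
+//
+//   @llvm.amdgcn.is.gfx900 = external externally_initialized constant i1
+//   ...
+//   %p = load i1, ptr @llvm.amdgcn.is.gfx900
+//   br i1 %p, label %gfx900.impl, label %generic.impl
+//
+// and, once the concrete subtarget is known, this pass gives the global a
+// constant initializer and folds the load and the branch away.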
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "AMDGPUTargetMachine.h"
+#include "GCNSubtarget.h"
+
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/Local.h"
+
+#include <string>
+#include <utility>
+
+using namespace llvm;
+
+namespace {
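+// Collect every Instruction that uses V into Container; going through
+// Container.insert(Container.end(), I) keeps this generic over both set-like
+// and vector-like containers (e.g. SmallPtrSet, SmallVector).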
+template <typename C> void collectUsers(Value *V, C &Container) {
+  assert(V && "Must pass an existing Value!");
+
+  for (auto &&U : V->users())
+    if (auto *I = dyn_cast<Instruction>(U))
+      Container.insert(Container.end(), I);
+}
+
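+// Give the predicate global P a constant initializer reflecting subtarget ST.
+// For an illustrative name such as "llvm.amdgcn.has.dot1-insts,dot2-insts"
+// the derived feature string is "+dot1-insts,+dot2-insts", which is handed to
+// ST.checkFeatures(); for "llvm.amdgcn.is.gfx900" the remainder "gfx900" is
+// compared against ST.getCPU().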
+inline void setPredicate(const GCNSubtarget &ST, GlobalVariable *P) {
+  const auto IsFeature = P->getName().starts_with("llvm.amdgcn.has");
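+  // Note: sizeof includes the terminating NUL, so the offset also skips the
+  // '.' separator that follows the prefix.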
+  const auto Offset =
+      IsFeature ? sizeof("llvm.amdgcn.has") : sizeof("llvm.amdgcn.is");
+
+  auto PV = P->getName().substr(Offset).str();
+  if (IsFeature) {
+    auto Dx = PV.find(',');
+    while (Dx != std::string::npos) {
+      PV.insert(++Dx, {'+'});
+
+      Dx = PV.find(',', Dx);
+    }
+    PV.insert(PV.cbegin(), '+');
+  }
+
+  auto *PTy = P->getValueType();
+  P->setLinkage(GlobalValue::PrivateLinkage);
+  P->setExternallyInitialized(false);
+
+  if (IsFeature)
+    P->setInitializer(ConstantInt::getBool(PTy, ST.checkFeatures(PV)));
+  else
+    P->setInitializer(ConstantInt::getBool(PTy, PV == ST.getCPU()));
+}
+
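+// Report a predicate use that could not be constant folded: emits a DS_Error
+// diagnostic against Caller, naming the predicate P and the offending
+// instruction, and returns {PreservedAnalyses::none(), false}.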
+std::pair<PreservedAnalyses, bool>
+unfoldableFound(Function *Caller, GlobalVariable *P, Instruction *NoFold) {
+  std::string W;
+  raw_string_ostream OS(W);
+
+  OS << "Impossible to constant fold feature predicate: " << *P << " used by "
+     << *NoFold << ", please simplify.\n";
+
+  Caller->getContext().diagnose(
+      DiagnosticInfoUnsupported(*Caller, W, NoFold->getDebugLoc(), DS_Error));
+
+  return {PreservedAnalyses::none(), false};
+}
+
+std::pair<PreservedAnalyses, bool> handlePredicate(const GCNSubtarget &ST,
+                                                   GlobalVariable *P) {
----------------
AlexVlx wrote:
> So to clarify, optimizations will never be applied during the compilation to amdgcnspirv? If that's the case, I guess it's not likely that IR will be transformed in problematic ways.
>
Yes, this is the intention; it is still ongoing work. Empirically we are not running into any of the potential issues you brought up, which is why I went ahead with upstreaming this part, as it is fairly important for library work (it is hard to author high-performance generic libraries without this sort of mechanism). By the end of this year we should be generating SPIRV from Clang's LLVM IR output, with no optimisations applied.
> It did occur to me that a way to guarantee that the folding works is by using a callbr intrinsic, something like this:
>
> ```llvm
> callbr void @llvm.amdgcn.processor.is(metadata "gfx803") to label %unsupported [label %supported]
> ```
>
> This would make the check fundamentally inseparable from the control flow.
>
> But I guess you'd have trouble round-tripping that via SPIRV...
Ah, I actually hadn't thought of that, but having had a glance, yes, it's difficult to round trip. It is something to consider in the future, if / when we try to make this generic rather than target specific, should there be interest.
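For reference, the form the pass currently folds looks roughly like the following (global name, linkage and types are illustrative), which is why the check stays separable from the control flow until the pass has run:

```llvm
@llvm.amdgcn.is.gfx803 = external externally_initialized constant i1

  %p = load i1, ptr @llvm.amdgcn.is.gfx803
  br i1 %p, label %supported, label %unsupported
```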
https://github.com/llvm/llvm-project/pull/134016