[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

Tue Jun 17 13:42:08 PDT 2025

================
@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
     QualType FromType = OnlyArg->getType();
+    // __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+    // type, although this is almost always an error and we advise against it
----------------
AlexVlx wrote:

Very good questions. Optimisation matters because of e.g. inlining - it would not really do much much in the case I outlined, except if you had `foo` inlined into `bar`, then statically resolved the predicate value and picked just that. This is doable, but not guaranteed, and wouldn't happen at O0, which induces a dependence we do not want. There is a dedicated pass we are adding which does basic const folding of a predicate, so if that doesn't work (because of wading into one of these intricacies), the user gets a fairly obnoxious error - we do try to tell them where there's a problem and what happened, but it's still a bit opaque.

But let's step back a bit to the first case i.e. why would stashing these in a vector / container be dangerous (but let's generalise to squirrelling in general - I picked a fairly banal example, but substitute, say, an `atomic<bool>` in there). Let's recall what these are meant for: controlling the lowering of target specific sequences. Now, irrespective if the target is concrete (`gfxSMTH`), i.e. we expand the predicate value in the FE, or abstract (`amdgcnspirv`), i.e. we only know the concrete target at run time, we absolutely MUST remove sequences that are not viable on a particular target, if it's not present. I.e. if we're on `gfx900`, we have to be able to statically and unambiguously remove sequences that are absolutely not viable there. Otherwise, what obtains is at best a very opaque BE error or, more dangerously, an ISA sequence being executed to unknowable consequences. If we just allow users to for arbitrarily complex squirrelling / accessing schemes, we lose that ability:

- even if we are on a concrete target, the FE cannot drop expressions with potential side-effects even if by some stroke of luck an arbitrarily complex stash + access chain were otherwise viable for full evaluation;
- on an abstract target the dedicated ME pass isn't going to look through calls, and if we tried to look through calls then we run into having to consider things like alloca removal, CSE etc. -> essentially we have to run pretty intrusive optimisation to even have hope, which messes up O0.

Overall, the intention is to make things difficult to misuse - these are definitely not just cute booleans, they have teeth and can bite! So mandating needing an explicit cast / making the type irregular gives an easy tell / code smell when some gnarly bug unavoidably shows up - "oh, you seem to have cast this here, what was the intention?".

https://github.com/llvm/llvm-project/pull/134016