[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)
Alex Voicu via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 3 11:04:10 PDT 2025
AlexVlx wrote:
> > as a must have, it allows featureful AMDGCN flavoured SPIR-V to be produced, where target specific capability is guarded and chosen or discarded when finalising compilation for a concrete target.
>
> I understand the reasoning behind providing such mechanisms to guard hardware-specific code for targets that are being JITed, but I'm not sure how it would work in the SPIR-V case. As it is described now, you are able to easily remove the instructions within the guarded block. But SPIR-V also provides `OpExtension` and `OpCapability` instructions, which specify which features are used in the module and are placed at the top of it, so the runtime is free to discard any module with an unknown/unsupported capability. Will you also provide a tool to cut those capabilities along with the instructions?
>
Thank you for the feedback! I might not be getting the question right (in which case I apologise in advance!), but I think that for "vanilla" SPIR-V, i.e. not a vendor-flavoured one, where one strictly has to deal with Extensions / non-core capabilities, we would probably have the following situation:
- The `processor_is` query is not particularly useful, unless we'd start defining pseudo-processors, i.e. collections of features, which IMHO would be bad / would subvert one of the main benefits of SPIR-V;
- The `is_invocable` query, however, is probably quite useful even there, assuming we'd start somewhat more aggressively introducing target-specific intrinsics which map to e.g. `Op*`s, and do the reasonable thing of bubbling them into Clang via builtins;
- please note that an underlying assumption for our use here is that SPIR-V gets reverse-translated into LLVM IR; I assume you are interested in the case where that assumption does not hold, i.e. where the transforms would only apply to the SPIR-V module. With the caveat that this is somewhat handwavium-powered, I will suggest that the benefit in that case is that if a module containing guarded functionality is encountered by a RT that does not support said functionality, instead of discarding the module outright it becomes possible to just remove the offending bits, in a way similar to what is done here, and still successfully load it.
> My review is not gating here, but just for my curiosity (and to have an opinion about: _"In the end, I will note there is nothing that is actually AMDGPU specific here, so it is possible that in the future, assuming interests from other targets / users, we'd just promote them to generic intrinsics."_ ):
>
> 1. Is it allowed to use the builtin along with a runtime-known boolean flag? (probably it's also somewhat related to Erich's comments)
No, it is not; these have to be constant-foldable. Part of the motivation for the awkward design that is giving people pause is to unambiguously constrain them to cases where they will be constant-foldable, strictly based on their value, without having to explain that wrinkle to users. Adding in arbitrary run-time values / performing arbitrary ops on these means that we might inadvertently allow invalid code through. If the question is more along the lines of "at run time a value is passed as an input to the finalisation process, i.e. it is constant by then", in theory this could be made to work, but the interface isn't quite there and it remains dangerous. For completeness, I will note that in some sense a run-time value is being passed in the SPIR-V case, as the target is only known at run time, although this is not user-controlled.
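A minimal C++ analogue of the constraint (again with a constexpr stand-in of my own for the builtin; the actual diagnostics are Clang's and are not reproduced here):

```cpp
constexpr bool has_feature = true;  // stand-in for a constant-foldable predicate

bool run(bool runtime_flag) {
    bool took_guarded = false;
    // Allowed shape: the branch condition is exactly the predicate, so the
    // finaliser can fold it and delete the dead arm, even at -O0.
    if constexpr (has_feature)
        took_guarded = true;

    // Disallowed shape in the real design: mixing in a run-time value makes
    // the condition non-foldable, so invalid code could leak through:
    //   if (has_feature && runtime_flag) { /* guarded code */ }
    return took_guarded && runtime_flag;
}
```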
> 2. Does this builtin act like a barrier for optimization passes? If so, is a load from an `llvm.amdgcn.is` GV considered to be a barrier, or something else?
In theory, only by virtue of it being a load from an externally initialised global variable, with all the implications of that. However, the intention is to run the expansion pass as early as possible, immediately after Clang CodeGen, so that subsequent passes don't have to deal with these being present at all (they are meant to be rather ephemeral, and would only ever come into being when targeting `amdgcnspirv`).
> 3. Is it envisioned, that optnone (LLVM/SPIR-V) has no effect on the pass/tool, that removes target-dependent code?
Yes, this is always enabled for AMDGPU and has to work at -O0 / with `optnone`, which is another reason for going for tight constraints. For the currently allowed uses, the transform is fairly straightforward and not too disruptive to the module. Furthermore, it does not have to rely on e.g. DCE, inlining or constprop being run, which at O0 wouldn't happen. Conversely, were these allowed e.g. as arguments to a function call, we'd have to start looking through / inlining calls where predicates are passed. That is doable, and we have implemented it, but it is intrusive, non-trivial and possibly quite expensive (essentially we'd need to clone the callee to avoid messing up user-provided code, simplify the clone to prevent spurious failures around e.g. copying arguments into local allocas, then inline it, then tidy up, etc.).
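To illustrate why the call-argument case is so much more involved, here is the same constexpr stand-in (names are mine) contrasting the two shapes:

```cpp
constexpr bool has_feature = true;  // stand-in for the foldable predicate

// If predicates were allowed as ordinary call arguments, the guard inside
// the callee would only become foldable after cloning, simplifying and
// inlining the callee -- none of which happens at -O0.
bool guard_in_callee(bool p) {
    return p;  // 'p' is opaque here; not foldable without inlining
}

// The allowed form keeps the predicate visible at the branch itself, so the
// early expansion pass can fold it without relying on DCE or inlining.
bool guard_at_use() {
    if constexpr (has_feature)
        return true;
    else
        return false;
}
```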
https://github.com/llvm/llvm-project/pull/134016