[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

Thu Jul 3 06:01:19 PDT 2025

AlexVlx wrote:

> @efriedma-quic was kind enough to have a call where we discussed this a bit more. I'll update tomorrow with a potential way forward, for the group's consideration.

Following up, here's a possible approach to making progress, broken down into phases (@efriedma-quic can correct me if I am misrepresenting any of these):

1. Have what is proposed here as an initial step, with two additions: we issue warnings on unguarded uses of builtins / ASM (similar to what `__builtin_available` / `@available` do), and we clean up non-extern functions that become unreachable as a consequence of predicate expansion (e.g. `foo` can only be called from within this module, and its only call site was in a predicate guarded block that was removed);
2. Add attribute based checking for predicate guarded areas:
     - Functions can be annotated either with the existing `target` attribute or with a new `target_can_invoke` (name up for bike-shedding) attribute;
     - Within a predicate guarded scope, contradictions are an error, e.g. calling a `target("gfx9000")` function, or a `target_can_invoke(builtin_only_on_gfx9000)` function, within a scope guarded by `__builtin_amdgcn_processor_is("gfx8999")`;
     - This should reward users that go through the effort of annotating their functions, making it much harder to write bugs;
     - I'm not entirely sure how to do this well yet (nested guarded regions, where to track the currently active guard etc.), and it probably needs a bit more design, which is why it's a separate phase;
     - It is a prerequisite for any attempt at making these general, rather than target specific.
3. In relation to generalisation, if we go in that direction (i.e. other targets are interested / we think there's merit in hoisting these into generic Clang builtins), we will have to look at whether or not we want a different IR representation (possibly / probably along the lines of what has been discussed here) for cases where a target must run some potentially disruptive optimisations before expansion and cannot just do the expansion right after Clang.
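
To make phase 1 a bit more concrete, here is a rough sketch of the kind of code it targets. This is illustrative only: `PROCESSOR_IS`, `fast_path`, `generic_path` and `compute` are hypothetical names, `"gfx9000"` is the same placeholder processor name used above, and the fallback macro exists purely so the snippet builds with compilers that do not carry this PR.

```cpp
// Assumption: __builtin_amdgcn_processor_is is only available in a Clang that
// includes this PR and targets AMDGPU. Elsewhere we substitute a constant
// false so the sketch still compiles.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_amdgcn_processor_is)
#    define PROCESSOR_IS(name) __builtin_amdgcn_processor_is(name)
#  endif
#endif
#ifndef PROCESSOR_IS
#  define PROCESSOR_IS(name) false
#endif

// Hypothetical helper standing in for code that uses a processor-specific
// builtin. It is non-extern and only called from the guarded block below, so
// if the predicate expands to false it becomes unreachable and, under phase 1,
// would be cleaned up.
static int fast_path(int x) { return x * 2; }

// Hypothetical generic fallback, valid on every processor.
static int generic_path(int x) { return x + x; }

int compute(int x) {
  // Guarded use: no warning expected under phase 1. Calling a
  // processor-specific builtin outside such a guard is what would warn.
  if (PROCESSOR_IS("gfx9000"))
    return fast_path(x);
  return generic_path(x);
}
```

Under phase 2, annotating `fast_path` with `target("gfx9000")` and calling it from a block guarded by a different processor would presumably be diagnosed as an error, but the details of that are still open.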

https://github.com/llvm/llvm-project/pull/134016