[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)
John McCall via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 30 09:25:59 PDT 2025
rjmccall wrote:
> > On a different point: I don't think this builtin is actually semantically different from `__builtin_cpu_is`. As long as we're not treating it as `constexpr`, the fact that it's lowered by the compiler and doesn't need a runtime check is just a happy property of GPU targeting rather than a fundamental difference. You could certainly imagine targets that _do_ simply do this with a runtime switch. And the behavior of allowing additional builtins to be used within the guarded block seems like a nice feature that other targets would probably like to take advantage of.
> > We could allow `__builtin_processor_is` as an alternative name for that builtin if folks feel weird about having "cpu" in the name for a GPU target.
>
> The `processor_is` interface initially did not exist, but rather `__builtin_cpu_is` gained the ability to be statically resolved in the FE in certain cases / generate no run time code. There was strong opposition from some of my colleagues (some of whom are on this thread) claiming that the semantics of `__builtin_cpu_is` mandate the existence of a run time check. The "cpu" bit wasn't really a problem :)
>
> If you / other Clang owners are happy with extending `__builtin_cpu_is`, personally I would prefer that since I believe that it can be beneficial for targets other than ours / GPUs in general. For example, even for x86, there's a difference between e.g. `x86_64-v2` and `znver5`, which could be resolved in the FE and remove the need to do a cpuid check at run time, and then go via a function call rather than direct inline code.
Right, I don't see any semantic reason why `__builtin_cpu_is` or `__builtin_cpu_supports` shouldn't be resolved statically if we have that information on hand. `-mcpu` / `-march` are not usually sufficient for folding `__builtin_cpu_is`, since those options just specify a minimum architecture and the builtin is doing an exact check, but that's an emergent property and shouldn't be taken as an inherent limitation of the builtin.
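
To make the distinction concrete, here is a rough user-level sketch (not what the PR implements, and not how Clang folds anything internally). The helper names `cpu_has_avx2` and `cpu_is_amd` are hypothetical. The first shows a feature query that a minimum-architecture flag can already answer at compile time via the predefined `__AVX2__` macro; the second shows an identity query, for which a minimum architecture gives no such shortcut, so it falls back to the runtime check. The vendor string `"amd"` is used only because it is accepted by a wide range of GCC and Clang versions on x86.

```c
#include <stdbool.h>

/* Hypothetical helper: "does this CPU support AVX2?" */
static bool cpu_has_avx2(void) {
#if defined(__AVX2__)
    /* The -march / -mcpu floor already guarantees AVX2, so the answer
       is known statically and no runtime query is needed. */
    return true;
#elif defined(__x86_64__) || defined(__i386__)
    /* Otherwise ask at run time (x86 only). */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    /* Assumption for this sketch: non-x86 targets simply report false. */
    return false;
#endif
}

/* Hypothetical helper: an identity check rather than a minimum.
   A minimum-architecture flag does not pin down which CPU the code
   actually runs on, so this sketch always queries at run time. */
static bool cpu_is_amd(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    return __builtin_cpu_is("amd") != 0;
#else
    return false;
#endif
}
```

Built with something like `-march=x86-64-v3`, the first helper reduces to a constant; the second does not, which is the "emergent" limitation described above rather than anything inherent to the builtin.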
https://github.com/llvm/llvm-project/pull/134016