[llvm] [AMDGPU] - Generate s_bitreplicate_b64_b32 (PR #69209)
Nicolai Hähnle via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 23 06:21:51 PDT 2023
nhaehnle wrote:
Yes, the intrinsic's semantics should be spelled out in a comment.
My thinking is this: there is no good vector expansion of S_BITREPLICATE, and the whole point of the intrinsic is to generate a faster code sequence when we know it can be used. So we should define the intrinsic such that the argument must be uniform. To be explicit: if the argument is not uniform, the behavior is UB (or maybe the result is poison; I don't really care which, but it shouldn't be fully defined IMO).
This justifies the use of `convergent` (to prevent the kind of transforms you mentioned) and just putting a readfirstlane in the backend if the argument happens to be in a VGPR.
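To make the proposed semantics concrete, here is a rough Python model, not actual backend code. It assumes S_BITREPLICATE_B64_B32 copies each bit of the 32-bit source into two adjacent bits of the 64-bit result, and that readfirstlane returns the value held by the first active lane. All function names and the list-of-lanes representation are hypothetical and exist only for illustration.

```python
MASK32 = 0xFFFFFFFF


def bitreplicate_b64_b32(src: int) -> int:
    """Scalar model (assumed semantics): bit i of src lands in bits 2i and 2i+1."""
    src &= MASK32
    result = 0
    for i in range(32):
        bit = (src >> i) & 1
        result |= bit << (2 * i)
        result |= bit << (2 * i + 1)
    return result


def readfirstlane(lane_values, exec_mask):
    """Return the value of the first active lane (hypothetical wave model)."""
    for lane, value in enumerate(lane_values):
        if (exec_mask >> lane) & 1:
            return value
    # Simplification for this sketch: fall back to lane 0 if no lane is active.
    return lane_values[0]


def lowered_bitreplicate(lane_values, exec_mask):
    """Model of the lowering when the argument sits in a VGPR:
    readfirstlane first, then one scalar bitreplicate for the whole wave."""
    return bitreplicate_b64_b32(readfirstlane(lane_values, exec_mask))


# Uniform argument: every lane agrees, so the single scalar result is right for all.
print(bin(lowered_bitreplicate([0b101] * 4, 0b1111)))

# Non-uniform argument: only the first active lane's value is used, so the other
# lanes see a result unrelated to their own input. This is the case the
# definition would leave as UB or poison.
print(bin(lowered_bitreplicate([0b01, 0b10], 0b11)))
```

In this model the lowering only ever sees the first active lane's value, which is why the result cannot be fully defined for a non-uniform argument.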
https://github.com/llvm/llvm-project/pull/69209