[llvm] [AMDGPU] - Generate s_bitreplicate_b64_b32 (PR #69209)
Jessica Del via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 16 07:41:58 PDT 2023
OutOfCache wrote:
> > Support VGPR arguments by inserting a v_readfirstlane.
>
> Unfortunately this is not correct behaviour. You would need to either generate a waterfall loop, or find some other way to implement the bitreplicate operation using VALU instructions.
>
> The problem is that, even if the input program only uses bitreplicate on uniform arguments, the compiler could theoretically transform input like this:
>
> ```
> result = divergent_condition ? bitreplicate(uniform_input) : bitreplicate(another_uniform_input);
> ```
>
> into this:
>
> ```
> result = bitreplicate(divergent_condition ? uniform_input : another_uniform_input);
> ```
>
> so the backend needs to be prepared to handle divergent arguments.
I see, that makes sense. A waterfall loop seems reasonable in that case. Thanks for pointing that out!
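The Python sketch below illustrates the problem. It is a simplified model, not LLVM's actual lowering. It shows why a single v_readfirstlane is unsound for a divergent operand and how a waterfall loop fixes it.

The model treats a wave as a list of per-lane values plus an exec mask. Several things in it are hypothetical and exist only for illustration:

- the wave size of 4
- the helper names (`bitreplicate`, `readfirstlane`, `lower_with_readfirstlane`, `lower_with_waterfall`, `bitreplicate_valu`)
- the assumption that s_bitreplicate_b64_b32 doubles each bit of its 32-bit source, so that bit i lands in bits 2i and 2i+1

`bitreplicate_valu` sketches the other option raised in the review: a per-lane shift-and-mask bit spread that never needs a uniform operand.

```python
# Illustrative model only; all names and the wave size are hypothetical.
WAVE_SIZE = 4  # real AMDGPU waves have 32 or 64 lanes


def bitreplicate(x: int) -> int:
    """Assumed semantics: bit i of the 32-bit input goes to bits 2i and 2i+1."""
    x &= 0xFFFFFFFF
    result = 0
    for i in range(32):
        if (x >> i) & 1:
            result |= 0b11 << (2 * i)
    return result


def bitreplicate_valu(x: int) -> int:
    """Per-lane alternative: spread bits with shifts and masks, then double them."""
    x &= 0xFFFFFFFF
    x = (x | (x << 16)) & 0x0000FFFF0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0F
    x = (x | (x << 2)) & 0x3333333333333333
    x = (x | (x << 1)) & 0x5555555555555555  # bit i now sits at bit 2i
    return x | (x << 1)                       # copy it into bit 2i+1


def readfirstlane(values, exec_mask):
    """Model of v_readfirstlane: value of the lowest active lane.
    This model assumes at least one lane is active."""
    for lane, active in enumerate(exec_mask):
        if active:
            return values[lane]
    raise ValueError("no active lanes")


def lower_with_readfirstlane(vgpr, exec_mask):
    """Unsound lowering: assumes the operand is uniform across the wave."""
    scalar_result = bitreplicate(readfirstlane(vgpr, exec_mask))
    return [scalar_result if active else None for active in exec_mask]


def lower_with_waterfall(vgpr, exec_mask):
    """Waterfall loop: run the scalar op once per distinct operand value."""
    results = [None] * len(vgpr)
    remaining = list(exec_mask)
    while any(remaining):
        uniform = readfirstlane(vgpr, remaining)  # pick one lane's value
        # Lanes holding the same value (models v_cmp_eq + s_and_saveexec).
        matching = [r and v == uniform for r, v in zip(remaining, vgpr)]
        scalar_result = bitreplicate(uniform)
        for lane, m in enumerate(matching):
            if m:
                results[lane] = scalar_result
        # Retire the lanes that were just handled.
        remaining = [r and not m for r, m in zip(remaining, matching)]
    return results


# The transformed program from the review: a divergent select feeds bitreplicate.
divergent_condition = [True, False, True, False]
uniform_input, another_uniform_input = 0x1, 0x3
vgpr = [uniform_input if c else another_uniform_input for c in divergent_condition]
exec_mask = [True] * WAVE_SIZE

print(lower_with_readfirstlane(vgpr, exec_mask))  # [3, 3, 3, 3]  (lanes 1 and 3 wrong)
print(lower_with_waterfall(vgpr, exec_mask))      # [3, 15, 3, 15]
```

The waterfall version executes the scalar operation once per distinct operand value in the wave. A uniform operand therefore costs a single iteration. A fully divergent operand can cost one iteration per active lane, which is the trade-off against a pure VALU expansion such as `bitreplicate_valu`.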
https://github.com/llvm/llvm-project/pull/69209