[llvm] [AMDGPU] - Generate s_bitreplicate_b64_b32 (PR #69209)

Jessica Del via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 16 07:41:58 PDT 2023


OutOfCache wrote:

> > Support VGPR arguments by inserting a v_readfirstlane.
> 
> Unfortunately this is not correct behaviour. You would need to either generate a waterfall loop, or find some other way to implement the bitreplicate operation using VALU instructions.
> 
> The problem is that, even if the input program only uses bitreplicate on uniform arguments, the compiler could theoretically transform input like this:
> 
> ```
>   result = divergent_condition ? bitreplicate(uniform_input) : bitreplicate(another_uniform_input);
> ```
> 
> into this:
> 
> ```
>   result = bitreplicate(divergent_condition ? uniform_input : another_uniform_input);
> ```
> 
> so the backend needs to be prepared to handle divergent arguments.

I see, that makes sense. A waterfall loop seems reasonable in that case. Thanks for pointing that out!
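
For reference, a minimal host-side sketch of the difference, not the actual backend lowering. It assumes the ISA semantics of `s_bitreplicate_b64_b32` (bit `i` of the source is copied to bits `2i` and `2i+1` of the result) and models a wave as a plain array. `kWaveSize`, `readFirstLaneOnly`, and `waterfall` are hypothetical names used only for illustration, and all lanes are assumed active on entry.

```cpp
#include <array>
#include <cstdint>

// Hypothetical tiny wave for illustration; real waves have 32 or 64 lanes.
constexpr int kWaveSize = 4;

using Wave32 = std::array<uint32_t, kWaveSize>;
using Wave64 = std::array<uint64_t, kWaveSize>;

// Scalar operation: bit i of src is copied to bits 2i and 2i+1 of the result.
uint64_t bitreplicate(uint32_t src) {
  uint64_t result = 0;
  for (int i = 0; i < 32; ++i) {
    uint64_t bit = (src >> i) & 1u;
    result |= (bit << (2 * i)) | (bit << (2 * i + 1));
  }
  return result;
}

// Models "v_readfirstlane + scalar op": every lane receives the result
// computed from the first lane's value. Only correct for uniform input.
Wave64 readFirstLaneOnly(const Wave32 &in) {
  Wave64 out{};
  out.fill(bitreplicate(in[0]));
  return out;
}

// Models a waterfall loop: repeatedly pick the first remaining lane, run the
// scalar op on its value, write the result to every remaining lane holding
// the same value, and retire those lanes until none are left.
Wave64 waterfall(const Wave32 &in) {
  Wave64 out{};
  uint32_t exec = (1u << kWaveSize) - 1; // all lanes active (assumption)
  while (exec != 0) {
    int first = 0;
    while (((exec >> first) & 1u) == 0)
      ++first;
    uint32_t uniform = in[first];
    uint64_t result = bitreplicate(uniform);
    for (int lane = first; lane < kWaveSize; ++lane) {
      if (((exec >> lane) & 1u) && in[lane] == uniform) {
        out[lane] = result;
        exec &= ~(1u << lane);
      }
    }
  }
  return out;
}
```

With a divergent input such as `{1, 5, 1, 5}`, `readFirstLaneOnly` gives every lane the result for `1`, while `waterfall` runs two iterations and gives each lane the result for its own value.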

https://github.com/llvm/llvm-project/pull/69209

