[llvm] [AMDGPU] - Generate s_bitreplicate_b64_b32 (PR #69209)
    Jessica Del via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Mon Oct 16 07:41:58 PDT 2023
    
    
  
OutOfCache wrote:
> > Support VGPR arguments by inserting a v_readfirstlane.
> 
> Unfortunately this is not correct behaviour. You would need to either generate a waterfall loop, or find some other way to implement the bitreplicate operation using VALU instructions.
> 
> The problem is that, even if the input program only uses bitreplicate on uniform arguments, the compiler could theoretically transform input like this:
> 
> ```
>   result = divergent_condition ? bitreplicate(uniform_input) : bitreplicate(another_uniform_input);
> ```
> 
> into this:
> 
> ```
>   result = bitreplicate(divergent_condition ? uniform_input : another_uniform_input);
> ```
> 
> so the backend needs to be prepared to handle divergent arguments.

I see, that makes sense. A waterfall loop seems reasonable in that case. Thanks for pointing that out!
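
For illustration, here is a minimal Python sketch of why a single v_readfirstlane gives wrong results for a divergent argument, and how a waterfall loop avoids that. It models a wave as a list of per-lane values plus an exec mask. All helper names (`bitreplicate`, `readfirstlane`, `lower_with_readfirstlane`, `lower_with_waterfall`) are hypothetical and are not LLVM or ISA APIs; the sketch assumes that bitreplicate duplicates each of the 32 input bits into two adjacent output bits.

```python
def bitreplicate(x):
    """Scalar model (assumed semantics): bit i of the 32-bit input
    becomes bits 2*i and 2*i+1 of the 64-bit result."""
    result = 0
    for i in range(32):
        if (x >> i) & 1:
            result |= 0b11 << (2 * i)
    return result


def readfirstlane(values, mask):
    """Return the value held by the first active lane in the mask."""
    first = next(lane for lane, active in enumerate(mask) if active)
    return values[first]


def lower_with_readfirstlane(values, exec_mask):
    """Incorrect for divergent inputs: every active lane receives the
    result computed from the first active lane's value."""
    uniform = bitreplicate(readfirstlane(values, exec_mask))
    return [uniform if active else None for active in exec_mask]


def lower_with_waterfall(values, exec_mask):
    """Waterfall loop: pick one value at a time, run the scalar operation
    for it, and retire every lane that holds that same value."""
    results = [None] * len(values)
    remaining = list(exec_mask)
    while any(remaining):
        current = readfirstlane(values, remaining)
        scalar_result = bitreplicate(current)
        for lane, active in enumerate(remaining):
            if active and values[lane] == current:
                results[lane] = scalar_result
                remaining[lane] = False
    return results


# The select was hoisted into the argument, so the input is now divergent.
divergent_condition = [True, False, True, False]
uniform_input, another_uniform_input = 0b01, 0b10
argument = [uniform_input if c else another_uniform_input
            for c in divergent_condition]
exec_mask = [True] * 4

wrong = lower_with_readfirstlane(argument, exec_mask)
right = lower_with_waterfall(argument, exec_mask)
```

In this model the readfirstlane lowering hands lane 0's result to all four lanes, while the waterfall loop runs two iterations (one per distinct value) and each lane gets the result for its own argument.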
https://github.com/llvm/llvm-project/pull/69209