[PATCH] D65088: [AMDGPU][RFC] New llvm.amdgcn.ballot intrinsic

Wed Mar 11 02:19:28 PDT 2020

foad added a comment.

In D65088#1916351 <https://reviews.llvm.org/D65088#1916351>, @Flakebi wrote:

> The code generation for test2 is currently not optimal:
>
>   %trunc = trunc i32 %x to i1
>   %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)
>
> generates
>
>   v_and_b32_e32 v0, 1, v0
>   v_cmp_eq_u32_e32 vcc, 1, v0
>   s_and_b64 s[4:5], vcc, exec
>
> where the first compare stems from the truncate.

I'm confused by this. What is the optimal code generation?
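
For reference, the three quoted instructions can be modelled lane by lane. This is a minimal Python sketch under stated assumptions, not LLVM code: `ballot_of_trunc`, `WAVE_SIZE`, and the list-of-lanes representation are hypothetical names chosen for illustration, and a 64-lane wave is assumed to match the `i64` ballot. It only mirrors what each quoted instruction appears to compute; it says nothing about which of them could be dropped.

```python
# Hypothetical model of one wave; none of these names are LLVM or AMDGPU APIs.
WAVE_SIZE = 64  # assumption: wave64, matching the i64 ballot in the quoted IR


def ballot_of_trunc(x_per_lane, exec_mask):
    """Model ballot(trunc i32 %x to i1) using the three quoted instructions."""
    assert len(x_per_lane) == WAVE_SIZE
    vcc = 0
    for lane, x in enumerate(x_per_lane):
        v0 = x & 1              # v_and_b32_e32 v0, 1, v0 (keep the low bit)
        if v0 == 1:             # v_cmp_eq_u32_e32 vcc, 1, v0
            vcc |= 1 << lane
    return vcc & exec_mask      # s_and_b64 s[4:5], vcc, exec


# Each lane holds its own index, so the low bit is set in the odd lanes.
lanes = list(range(WAVE_SIZE))
print(hex(ballot_of_trunc(lanes, (1 << WAVE_SIZE) - 1)))  # 0xaaaaaaaaaaaaaaaa
print(hex(ballot_of_trunc(lanes, 0xFF)))                  # 0xaa
```

In this model the `v_and`/`v_cmp` pair implements the truncate to `i1`, and the final `s_and` restricts the result to the active lanes.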

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D65088