[PATCH] D65088: [AMDGPU][RFC] New llvm.amdgcn.ballot intrinsic

Wed Mar 11 03:47:25 PDT 2020

Flakebi added a comment.

In D65088#1916416 <https://reviews.llvm.org/D65088#1916416>, @foad wrote:

> In D65088#1916351 <https://reviews.llvm.org/D65088#1916351>, @Flakebi wrote:
>
> > The code generation for test2 is currently not optimal:
> >
> >   %trunc = trunc i32 %x to i1
> >   %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)
> >   
> >
> > generates
> >
> >   v_and_b32_e32 v0, 1, v0
> >   v_cmp_eq_u32_e32 vcc, 1, v0
> >   s_and_b64 s[4:5], vcc, exec
> >   
> >
> > where the first compare stems from the truncate.
>
>
> I'm confused by this. What is the optimal code generation?

I think the better code would merge the compare and the `s_and` with `exec`, as you suggested in your comment:

  v_and_b32_e32 v0, 1, v0
  v_cmp_eq_u32_e64 s[4:5], 1, v0
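
To illustrate why the `s_and` should be redundant, here is a toy Python model of a 64-lane wave. It is only a sketch and not LLVM or hardware code: the function names are made up for this example, and it assumes that the vector compare writes 0 into the result bits of inactive lanes. Under that assumption, and-ing the compare result with `exec` changes nothing, so the compare can write its result directly.

```python
# Toy model of a 64-lane wavefront. Hypothetical helper names, not an LLVM API.
WAVE_SIZE = 64


def v_cmp_eq(values, const, exec_mask):
    """Model a vector compare: bit i is set if lane i is active and equal.

    Assumption: result bits of inactive lanes are written as 0.
    """
    mask = 0
    for lane in range(WAVE_SIZE):
        active = (exec_mask >> lane) & 1
        if active and values[lane] == const:
            mask |= 1 << lane
    return mask


def ballot_current(x, exec_mask):
    """Three-instruction sequence that is generated today."""
    v0 = [v & 1 for v in x]            # v_and_b32 v0, 1, v0
    vcc = v_cmp_eq(v0, 1, exec_mask)   # v_cmp_eq_u32 vcc, 1, v0
    return vcc & exec_mask             # s_and_b64 s[4:5], vcc, exec


def ballot_merged(x, exec_mask):
    """Two-instruction sequence with the s_and folded away."""
    v0 = [v & 1 for v in x]            # v_and_b32 v0, 1, v0
    return v_cmp_eq(v0, 1, exec_mask)  # v_cmp_eq_u32 s[4:5], 1, v0
```

For example, with `x = list(range(64))` and only the low eight lanes active (`exec_mask = 0xFF`), both functions return `0xAA`, the odd lanes among the active ones.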

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D65088