[PATCH] D65088: [AMDGPU][RFC] New llvm.amdgcn.ballot intrinsic
Sebastian Neubauer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 11 03:47:25 PDT 2020
Flakebi added a comment.
In D65088#1916416 <https://reviews.llvm.org/D65088#1916416>, @foad wrote:
> In D65088#1916351 <https://reviews.llvm.org/D65088#1916351>, @Flakebi wrote:
>
> > The code generation for test2 is currently not optimal:
> >
> > %trunc = trunc i32 %x to i1
> > %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)
> >
> > generates
> >
> > v_and_b32_e32 v0, 1, v0
> > v_cmp_eq_u32_e32 vcc, 1, v0
> > s_and_b64 s[4:5], vcc, exec
> >
> > where the first compare stems from the truncate.
>
> I'm confused by this. What is the optimal code generation?
Wouldn't the optimal version merge the compare with the `s_and` against `exec`, as you said in your comment? That is:
v_and_b32_e32 v0, 1, v0
v_cmp_eq_u32_e64 s[4:5], 1, v0

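The equivalence can be checked with a small Python model of one wave64. This is a sketch under stated assumptions: it assumes that a VOPC compare writes 0 to the result bits of inactive lanes (so its output is already masked by `exec`), and the helper names (`v_cmp_eq_u32`, `current_sequence`, `proposed_sequence`, `ballot_reference`) are hypothetical, not LLVM or ISA APIs. It compares the current three-instruction sequence, the proposed two-instruction sequence, and the intended semantics of `llvm.amdgcn.ballot.i64(trunc i32 %x to i1)` on random inputs.

```python
import random

WAVE_SIZE = 64
FULL_MASK = (1 << WAVE_SIZE) - 1


def v_cmp_eq_u32(a_lanes, imm, exec_mask):
    """Hypothetical model of V_CMP_EQ_U32: one result bit per lane.

    Assumption: inactive lanes (exec bit 0) always produce a 0 bit.
    """
    result = 0
    for lane, a in enumerate(a_lanes):
        if (exec_mask >> lane) & 1 and a == imm:
            result |= 1 << lane
    return result


def current_sequence(v0, exec_mask):
    v0 = [x & 1 for x in v0]               # v_and_b32_e32 v0, 1, v0 (trunc to i1)
    vcc = v_cmp_eq_u32(v0, 1, exec_mask)   # v_cmp_eq_u32_e32 vcc, 1, v0
    return vcc & exec_mask                 # s_and_b64 s[4:5], vcc, exec


def proposed_sequence(v0, exec_mask):
    v0 = [x & 1 for x in v0]               # v_and_b32_e32 v0, 1, v0
    return v_cmp_eq_u32(v0, 1, exec_mask)  # v_cmp_eq_u32_e64 s[4:5], 1, v0


def ballot_reference(x_lanes, exec_mask):
    """Ballot of (trunc i32 %x to i1): bit set for active lanes whose low bit is 1."""
    return sum(1 << lane for lane, x in enumerate(x_lanes)
               if (exec_mask >> lane) & 1 and (x & 1))


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        x = [rng.getrandbits(32) for _ in range(WAVE_SIZE)]
        exec_mask = rng.getrandbits(WAVE_SIZE)
        cur = current_sequence(x, exec_mask)
        new = proposed_sequence(x, exec_mask)
        assert cur == new == ballot_reference(x, exec_mask)
    print("sequences agree")
```

Under that assumption the extra `s_and_b64` with `exec` is idempotent, which is why folding it away is safe; if the compare did not mask inactive lanes, the `s_and` would still be required.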
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D65088