[llvm] [AMDGPU] - Generate s_bitreplicate_b64_b32 (PR #69209)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 23 07:54:04 PDT 2023
jayfoad wrote:
> > ```
> > %res1 = bitreplicate(%val1)
> > %res2 = bitreplicate(%val2)
> > %res = select %cond, %res1, %res2
> > ```
>
> When I write a test like that, I get the following assembly output:
>
> ```
> s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
> s_bitreplicate_b64_b32 s[0:1], 0x85fe3a92 ; %res1
> v_dual_mov_b32 v1, s0 :: v_dual_and_b32 v0, 1, v0
> v_mov_b32_e32 v2, s1
> s_bitreplicate_b64_b32 s[2:3], 0x3a9285fe ; %res2
> v_cmp_eq_u32_e32 vcc_lo, 1, v0
> v_cndmask_b32_e32 v0, s2, v1, vcc_lo
> v_cndmask_b32_e32 v1, s3, v2, vcc_lo
> s_setpc_b64 s[30:31]
> ```
>
> So, in the end `v1` has the value of `%res`, right? Is that not correct?
Yes, that is fine. But I'm saying that a generic IR optimization would be _allowed_ to transform that IR into this IR (even if bitreplicate is marked as convergent):
```
%val = select %cond, %val1, %val2
%res = bitreplicate(%val)
```
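To illustrate why such a transformation preserves the computed value, here is a minimal Python sketch. It assumes that bitreplicate duplicates each bit of a 32-bit input into two adjacent bits of a 64-bit result (my reading of `s_bitreplicate_b64_b32`), and the helper names `bitreplicate` and `select` are hypothetical stand-ins for the IR operations, not LLVM APIs. The two input constants are the ones from the assembly quoted above.

```python
def bitreplicate(x: int) -> int:
    """Assumed semantics: bit i of the 32-bit input becomes bits 2i and 2i+1."""
    x &= 0xFFFFFFFF
    result = 0
    for i in range(32):
        if (x >> i) & 1:
            result |= 0b11 << (2 * i)
    return result


def select(cond: bool, a: int, b: int) -> int:
    """Stand-in for the IR `select`: pick a when cond is true, else b."""
    return a if cond else b


val1, val2 = 0x85FE3A92, 0x3A9285FE

for cond in (False, True):
    # Original form: replicate both values, then select.
    select_of_results = select(cond, bitreplicate(val1), bitreplicate(val2))
    # Transformed form: select first, then replicate once.
    result_of_select = bitreplicate(select(cond, val1, val2))
    assert select_of_results == result_of_select
```

Both forms agree for either value of the condition, so the rewrite is value-preserving. The difference is that in the second form the operand of bitreplicate depends on `%cond`, so if `%cond` is divergent, the operand is divergent too.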
https://github.com/llvm/llvm-project/pull/69209