[llvm] [AMDGPU] Make v2bf16 BUILD_VECTOR legal (PR #92022)
Stanislav Mekhanoshin via llvm-commits
llvm-commits at lists.llvm.org
Mon May 13 13:48:40 PDT 2024
================
@@ -55,15 +55,17 @@ define amdgpu_ps float @v_test_cvt_v2f32_v2bf16_s(<2 x float> inreg %src) {
; GCN-NEXT: s_add_i32 s5, s2, 0x7fff
; GCN-NEXT: v_cmp_u_f32_e64 s[2:3], s1, s1
; GCN-NEXT: s_and_b64 s[2:3], s[2:3], exec
-; GCN-NEXT: s_cselect_b32 s2, s4, s5
+; GCN-NEXT: s_cselect_b32 s1, s4, s5
----------------
rampitec wrote:
This is what the DAG looked like before selection, prior to this change:
```
t45: i32 = srl t44, Constant:i32<16>
t46: i16 = truncate t45
t34: i32 = srl t33, Constant:i32<16>
t35: i16 = truncate t34
t19: v2i16 = BUILD_VECTOR t46, t35
```
Now that the v2bf16 BUILD_VECTOR is legal, it becomes:
```
t41: i32 = srl t40, Constant:i32<16>
t42: i16 = truncate t41
t43: bf16 = bitcast t42
t30: i32 = srl t29, Constant:i32<16>
t31: i16 = truncate t30
t32: bf16 = bitcast t31
t16: v2bf16 = BUILD_VECTOR t43, t32
```
These two bitcasts to bf16 prevent this pattern from matching:
```
def : GCNPat <
  (v2i16 (UniformBinFrag<build_vector> (i16 (trunc (srl_oneuse SReg_32:$src0, (i32 16)))),
                                       (i16 (trunc (srl_oneuse SReg_32:$src1, (i32 16)))))),
  (S_PACK_HH_B32_B16 SReg_32:$src0, SReg_32:$src1)
>;
```
I am not sure it is worth creating more patterns here, as the main idea is to have it working and producing some legal code.
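To illustrate why the missed pattern affects only code quality and not correctness, here is a small Python model of the bit-level operations involved. It is a sketch under stated assumptions, not the backend's implementation: the function names are hypothetical, `S_PACK_HH_B32_B16` is assumed to place `src0[31:16]` in the low half and `src1[31:16]` in the high half of the result, and vector element 0 is assumed to occupy bits [15:0]. Since a bitcast from i16 to bf16 leaves the bits unchanged, both DAG forms compute the same 32-bit value.

```python
import struct

MASK32 = 0xFFFFFFFF


def s_pack_hh_b32_b16(src0: int, src1: int) -> int:
    """Assumed semantics of S_PACK_HH_B32_B16:
    result[15:0] = src0[31:16], result[31:16] = src1[31:16]."""
    return ((src0 & MASK32) >> 16) | (src1 & 0xFFFF0000)


def build_vector_of_high_halves(src0: int, src1: int) -> int:
    """Model of BUILD_VECTOR (trunc (srl src0, 16)), (trunc (srl src1, 16)).
    The i16 -> bf16 bitcasts are no-ops on the bits, so they are not modeled.
    Element 0 is assumed to live in bits [15:0], element 1 in bits [31:16]."""
    elt0 = ((src0 & MASK32) >> 16) & 0xFFFF
    elt1 = ((src1 & MASK32) >> 16) & 0xFFFF
    return elt0 | (elt1 << 16)


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


if __name__ == "__main__":
    a, b = f32_bits(1.0), f32_bits(2.0)
    packed = s_pack_hh_b32_b16(a, b)
    # Both formulations agree on the packed result.
    assert packed == build_vector_of_high_halves(a, b)
    print(hex(packed))
```

Under these assumptions, the shift, truncate, and build-vector sequence and the single pack instruction produce identical results; without the extra pattern the compiler simply emits a longer, still legal, sequence.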
https://github.com/llvm/llvm-project/pull/92022