[llvm] [AMDGPU] Make v2bf16 BUILD_VECTOR legal (PR #92022)

Stanislav Mekhanoshin via llvm-commits llvm-commits at lists.llvm.org
Mon May 13 13:48:40 PDT 2024


================
@@ -55,15 +55,17 @@ define amdgpu_ps float @v_test_cvt_v2f32_v2bf16_s(<2 x float> inreg %src) {
 ; GCN-NEXT:    s_add_i32 s5, s2, 0x7fff
 ; GCN-NEXT:    v_cmp_u_f32_e64 s[2:3], s1, s1
 ; GCN-NEXT:    s_and_b64 s[2:3], s[2:3], exec
-; GCN-NEXT:    s_cselect_b32 s2, s4, s5
+; GCN-NEXT:    s_cselect_b32 s1, s4, s5
----------------
rampitec wrote:

Before this change, the DAG going into selection was:
```
          t45: i32 = srl t44, Constant:i32<16>
        t46: i16 = truncate t45
          t34: i32 = srl t33, Constant:i32<16>
        t35: i16 = truncate t34
      t19: v2i16 = BUILD_VECTOR t46, t35
```
Now that the v2bf16 BUILD_VECTOR is legal, it is:
```
            t41: i32 = srl t40, Constant:i32<16>
          t42: i16 = truncate t41
        t43: bf16 = bitcast t42
            t30: i32 = srl t29, Constant:i32<16>
          t31: i16 = truncate t30
        t32: bf16 = bitcast t31
      t16: v2bf16 = BUILD_VECTOR t43, t32
```
These two bitcasts to bf16 prevent this pattern from matching:
```
def : GCNPat <
  (v2i16 (UniformBinFrag<build_vector> (i16 (trunc (srl_oneuse SReg_32:$src0, (i32 16)))),
                       (i16 (trunc (srl_oneuse SReg_32:$src1, (i32 16)))))),
  (S_PACK_HH_B32_B16 SReg_32:$src0, SReg_32:$src1)
>;
```
I am not sure it is worth creating more patterns here, as the main idea is to have it working and producing legal code.
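
For reference, here is a small bit-level sketch of why the two DAGs compute the same value, so the missed pattern costs instruction count rather than correctness. It assumes `S_PACK_HH_B32_B16` places the high 16 bits of `$src0` in the low half of the result and the high 16 bits of `$src1` in the high half, and that the `i16` to `bf16` bitcast leaves the bits unchanged. The Python function names are hypothetical and exist only for this illustration.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def build_vector_hi_halves(src0: int, src1: int) -> int:
    """Model BUILD_VECTOR (trunc (srl src0, 16)), (trunc (srl src1, 16)).

    Element 0 lands in the low 16 bits and element 1 in the high 16 bits.
    A bitcast from i16 to bf16 does not change the bits, so the v2i16 and
    v2bf16 forms of the DAG produce the same 32-bit value.
    """
    elt0 = ((src0 & MASK32) >> 16) & MASK16  # trunc(srl src0, 16)
    elt1 = ((src1 & MASK32) >> 16) & MASK16  # trunc(srl src1, 16)
    return elt0 | (elt1 << 16)


def s_pack_hh_b32_b16(src0: int, src1: int) -> int:
    """Assumed semantics of S_PACK_HH_B32_B16: {src1[31:16], src0[31:16]}."""
    return ((src0 & MASK32) >> 16) | (src1 & 0xFFFF0000)


if __name__ == "__main__":
    a, b = 0x12345678, 0x9ABCDEF0
    print(hex(build_vector_hi_halves(a, b)))
    print(hex(s_pack_hh_b32_b16(a, b)))
```

Under these assumptions both functions return the same packed value for any pair of 32-bit inputs. Without the pattern match, the shifts and the pack are selected separately, which is what the changed check lines reflect.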

https://github.com/llvm/llvm-project/pull/92022

