[llvm] [AMDGPU][True16][GlobalISel] Fix v2*16 build_vector patterns (PR #151496)

Thu Jul 31 08:11:45 PDT 2025

================
@@ -3543,15 +3543,29 @@ def : GCNPat <
   (vecTy (UniformBinFrag<build_vector> (Ty undef), (Ty SReg_32:$src1))),
   (S_LSHL_B32 SReg_32:$src1, (i32 16))
 >;
-}
 
 def : GCNPat <
   (vecTy (DivergentBinFrag<build_vector> (Ty undef), (Ty VGPR_32:$src1))),
   (vecTy (V_LSHLREV_B32_e64 (i32 16), VGPR_32:$src1))
 >;
-} // End foreach Ty = ...
 }
 
+let True16Predicate = UseRealTrue16Insts in
+def : GCNPat <
+  (vecTy (DivergentBinFrag<build_vector> (Ty undef), (Ty VGPR_32:$src1))),
+  (REG_SEQUENCE VGPR_32, (Ty (IMPLICIT_DEF)), lo16, (Ty VGPR_32:$src1), hi16)
----------------
mbrkusanin wrote:

I tried this before, but it was giving me an error because it tries to create an EXTRACT_SUBREG from a 16-bit type to a 16-bit type.

Here is a short example:

```
define amdgpu_ps <2 x half> @test(half %arg) {
  %a = fadd half %arg, 4.0
  %b = insertelement <2 x half> poison, half %a, i64 1
  ret <2 x half> %b
}
```

Following SDAG:
```
      t4: f16 = fadd # D:1 t2, ConstantFP:f16<APFloat(17408)>
    t14: v2f16 = BUILD_VECTOR # D:1 undef:f16, t4
```
is transformed into:
```
      t4: f16 = V_ADD_F16_t16_e64 nofpexcept # D:1 TargetConstant:i32<0>, t2, TargetConstant:i32<0>, t3, TargetConstant:i1<0>, TargetConstant:i32<0>, TargetConstant:i32<0>
    t20: f16 = EXTRACT_SUBREG # D:1 t4, TargetConstant:i32<2>
  t14: v2f16 = REG_SEQUENCE # D:1 TargetConstant:i32<33>, IMPLICIT_DEF:f16, TargetConstant:i32<2>, t20, TargetConstant:i32<1>
```
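
As a rough illustration (not taken from the patch), the two divergent patterns in the hunk above can be modelled in Python. This is a sketch that assumes a 32-bit VGPR is just an integer whose upper 16 bits are the `hi16` half. The helper names are hypothetical and exist only for this example.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def build_vector_via_shift(src1: int) -> int:
    """Model of the V_LSHLREV_B32 pattern: shift the 16-bit value into the high half.

    The low half ends up as zero, which is one valid choice for an undef element.
    """
    return (src1 << 16) & MASK32


def build_vector_via_reg_sequence(lo16: int, hi16: int) -> int:
    """Model of the REG_SEQUENCE pattern: compose a 32-bit value from two halves.

    lo16 stands in for IMPLICIT_DEF, so callers may pass any value for it.
    """
    return ((hi16 & MASK16) << 16) | (lo16 & MASK16)


def hi_half(value: int) -> int:
    """Return the upper 16 bits of a modelled 32-bit register."""
    return (value >> 16) & MASK16


# 0x4400 is the f16 bit pattern for 4.0 (17408 in the DAG dump above).
src = 0x4400
shifted = build_vector_via_shift(src)
sequenced = build_vector_via_reg_sequence(lo16=0xBEEF, hi16=src)
```

In this model both forms agree on the defined (high) element and differ only in the undef low element, so the open question is how the operand types are constrained during selection, not what value is produced.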

VGPR_32 can hold either 32-bit or 16-bit values, basically any of the following types:
`list<ValueType> RegTypes = [i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6, i16, f16, bf16];`

Should `(Ty VGPR_32:$src1)` restrict it to Ty, which in this case is one of: i16, f16, bf16?
That is what it looks like it is doing to me.


> we've seen this cause functional errors in the later pass.

Do you have an example of this?

https://github.com/llvm/llvm-project/pull/151496