[PATCH] D124734: [AMDGPU] Fix scalar_to_vector for v8i16/v8f16

Mon May 2 11:36:20 PDT 2022

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:2710
+  (v8i16 (scalar_to_vector i16:$src0)),
+  (INSERT_SUBREG (IMPLICIT_DEF), $src0, sub0)
+>;
----------------
hsmhsm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > I don’t think these should be legal. We don’t naturally have 8 x 16 operations. A lowering that splits the vector would avoid introducing the wider registers and may combine better.
> > We actually do have instructions with these operands:
> > ```
> > v_smfmac_f32_16x16x32_f16
> > v_smfmac_f32_32x32x16_f16
> > v_smfmac_f32_16x16x32_bf16
> > v_smfmac_f32_32x32x16_bf16
> > ```
> And even if we decide it is better to handle this by splitting the vector, we can just materialize scalar_to_vector as build_vector, since build_vector already has custom lowering for v8i16/v8f16 that splits these types.
> 
> I experimented with this: I see better ISel output in this case, and the final ISA also looks good, with one shift and one pack operation eliminated. I will update the patch with this change. Let's take a look at it and discuss.
Please do. Offhand the patch looks good to me.
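
For readers following the thread, the suggestion above (express scalar_to_vector as a build_vector and let the existing split handle it) can be illustrated with a standalone toy model. This is not LLVM code: `Lane`, `V8`, `V4`, `scalarToBuildVector`, `half`, and `isAllUndef` are hypothetical names invented for this sketch, and the observation that the upper half ends up fully undef is only an assumption about why the split form might select better.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Toy model only: a lane is either a defined 16-bit value or undef (nullopt).
using Lane = std::optional<std::uint16_t>;
using V8 = std::array<Lane, 8>;
using V4 = std::array<Lane, 4>;

// scalar_to_vector defines lane 0 and leaves every other lane undef.
// Written as a build_vector, that is (src, undef, undef, ..., undef).
V8 scalarToBuildVector(std::uint16_t src) {
  V8 v{};      // value-initialized optionals are empty, i.e. undef
  v[0] = src;  // only the low lane carries a defined value
  return v;
}

// Split the 8-lane vector into 4-lane halves, mimicking a lowering that
// splits the wide type instead of treating it as a single legal vector.
V4 half(const V8 &v, std::size_t index) {
  assert(index < 2 && "an 8-lane vector has exactly two 4-lane halves");
  V4 h{};
  for (std::size_t i = 0; i < 4; ++i)
    h[i] = v[index * 4 + i];
  return h;
}

// True when no lane of the half carries a defined value.
bool isAllUndef(const V4 &h) {
  for (const Lane &lane : h)
    if (lane.has_value())
      return false;
  return true;
}
```

In this model, `half(scalarToBuildVector(x), 1)` is entirely undef, so only the low half needs any instructions to materialize.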

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124734/new/

https://reviews.llvm.org/D124734