[PATCH] D124734: [AMDGPU] Fix scalar_to_vector for v8i16/v8f16

Mon May 2 11:27:54 PDT 2022

hsmhsm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:2710
+  (v8i16 (scalar_to_vector i16:$src0)),
+  (INSERT_SUBREG (IMPLICIT_DEF), $src0, sub0)
+>;
----------------
rampitec wrote:
> arsenm wrote:
> > I don’t think these should be legal. We don’t naturally have 8 x 16 operations. A lowering that splits the vector would avoid introducing the wider registers and may combine better.
> We actually do have these operations:
> ```
> v_smfmac_f32_16x16x32_f16
> v_smfmac_f32_32x32x16_f16
> v_smfmac_f32_16x16x32_bf16
> v_smfmac_f32_32x32x16_bf16
> ```
And even if we decide it is better handled by splitting the vector, we can simply materialize scalar_to_vector as build_vector, since build_vector already has custom lowering for v8i16/v8f16 that splits these types.
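A toy model of why this rewrite is safe and cheap. It assumes that scalar_to_vector places the scalar in lane 0 and leaves the other lanes undefined, so it equals a build_vector whose trailing operands are undef. If the v8i16 build_vector is then split into two 4-element halves, the high half is entirely undef and needs no shifts or packs. The helper names below (`scalar_to_vector`, `build_vector`, `split_half`, `needs_materialization`) are hypothetical Python stand-ins, not LLVM APIs.

```python
from typing import List, Optional, Tuple

# Hypothetical model: None stands in for an undef lane.
Lane = Optional[int]
UNDEF: Lane = None


def scalar_to_vector(src: int, num_elts: int) -> List[Lane]:
    """Lane 0 holds src; every other lane is undefined."""
    return [src] + [UNDEF] * (num_elts - 1)


def build_vector(*elts: Lane) -> List[Lane]:
    """Vector built from explicit per-lane operands."""
    return list(elts)


def split_half(vec: List[Lane]) -> Tuple[List[Lane], List[Lane]]:
    """Split a vector into low and high halves, as a splitting lowering would."""
    mid = len(vec) // 2
    return vec[:mid], vec[mid:]


def needs_materialization(vec: List[Lane]) -> bool:
    """A half with only undef lanes requires no instructions to produce."""
    return any(e is not UNDEF for e in vec)


v8 = scalar_to_vector(0x1234, 8)
lo, hi = split_half(v8)
print(lo, needs_materialization(hi))
```

In this model only the low half carries a defined value, which is consistent with the observation that fewer shift/pack operations are needed once the vector is split.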

I experimented with this and see better ISel output in this case; the final ISA also looks good, since one shift and one pack operation are eliminated. I will update the patch with this change so we can review and discuss it.

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D124734