[PATCH] D124734: [AMDGPU] Fix scalar_to_vector for v8i16/v8f16
Mahesha S via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 2 11:27:54 PDT 2022
hsmhsm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:2710
+ (v8i16 (scalar_to_vector i16:$src0)),
+ (INSERT_SUBREG (IMPLICIT_DEF), $src0, sub0)
+>;
----------------
rampitec wrote:
> arsenm wrote:
> > I don’t think these should be legal. We don’t naturally have 8 X 16 operations. A lowering that splits the vector would avoid introducing the wider registers and may combine better
> We actually do have these operands:
> ```
> v_smfmac_f32_16x16x32_f16
> v_smfmac_f32_32x32x16_f16
> v_smfmac_f32_16x16x32_bf16
> v_smfmac_f32_32x32x16_bf16
> ```
Also, even if we decide it is better to handle this by splitting the vector, we can simply materialize scalar_to_vector as build_vector, since build_vector already has custom lowering for v8i16/v8f16 that splits these types.
I experimented with this: the ISel output is better in this case, and the final ISA also looks good, with one shift and one pack operation eliminated. I will update the patch with this change. Let's take a look at it and discuss.
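To illustrate the idea outside of LLVM, here is a rough standalone C++ sketch. It is only a toy model, not the actual SelectionDAG lowering: `Lane`, `scalarToVector`, `splitVector` and `countDefinedPieces` are hypothetical names, and the 2-lane piece size is an assumption chosen to mimic packed 16-bit pairs. It treats scalar_to_vector as a build_vector whose lanes other than lane 0 are undefined, splits that vector, and counts how many pieces still carry a defined value.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only: a 16-bit lane, where std::nullopt stands for an undefined lane.
using Lane = std::optional<std::uint16_t>;
using Vec = std::vector<Lane>;

// scalar_to_vector seen as a build_vector: lane 0 holds the scalar,
// every other lane is left undefined.
Vec scalarToVector(std::uint16_t src, std::size_t numLanes) {
  Vec v(numLanes); // all lanes start out undefined
  if (numLanes > 0)
    v[0] = src;
  return v;
}

// Split a vector into consecutive pieces of pieceLanes lanes each.
// Trailing lanes that do not fill a whole piece are dropped.
std::vector<Vec> splitVector(const Vec &v, std::size_t pieceLanes) {
  std::vector<Vec> pieces;
  if (pieceLanes == 0)
    return pieces;
  for (std::size_t i = 0; i + pieceLanes <= v.size(); i += pieceLanes) {
    auto first = v.begin() + static_cast<std::ptrdiff_t>(i);
    auto last = first + static_cast<std::ptrdiff_t>(pieceLanes);
    pieces.emplace_back(first, last);
  }
  return pieces;
}

// Count the pieces that contain at least one defined lane. In this model,
// a piece made only of undefined lanes needs no work at all.
std::size_t countDefinedPieces(const std::vector<Vec> &pieces) {
  std::size_t count = 0;
  for (const Vec &piece : pieces) {
    for (const Lane &lane : piece) {
      if (lane.has_value()) {
        ++count;
        break;
      }
    }
  }
  return count;
}
```

In this model, an 8-lane scalar_to_vector split into 2-lane pieces leaves a single piece with a defined lane. That is only the intuition for why the split form may combine better; it does not claim to explain the exact instructions eliminated in the real ISA.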
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124734/new/
https://reviews.llvm.org/D124734