[PATCH] D136922: [AMDGPU][GISel] Widen s16 SHUFFLE_VECTOR where there are no scalar pack insts

Wed Nov 2 16:56:40 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h:728-732
+      // Note: if we ignore a COPY, we must update DefReg to the copied register
+      // so the loop below can stop correctly.
+      Optional<DefinitionAndSourceRegister> DefSrcReg =
+          getDefSrcRegIgnoringCopies(DefReg, MRI);
+      DefReg = DefSrcReg->Reg;
----------------
D137273 is a standalone fix for this 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:1555
+  if (!ST.hasScalarPackInsts())
+    ShuffleVector.minScalarOrElt(0, S32);
+  ShuffleVector.lower();
----------------
Pierre-vh wrote:
> arsenm wrote:
> > I thought you were moving towards not using shuffle_vector for packed cases too.
> > 
> > Plus the packed cases still need handling for non-16 bit elements 
> I'm not sure I understand. 
> 
> With D135145, the goal is to use shuffle_vector to replace insert_vector_elt in most cases (even packed ones) in a global combine.
> The reason why this patch exists is because D135145 alone would cause a lot of regressions in the codebase for the no-pack-insts case due to the differences in how insert_vector_elt & shuffle_vector are lowered (shuffle implies a lot more bit manipulation stuff). If we widen the shuffle vectors before lowering them for those targets, codegen seems much better.
> 
The only cases we have that look anything like a legal vector_shuffle is for 2 element masks (which is what you removed in c93104073c8a864113a42cde0cceb3e9c21bbf8d). That roughly corresponds with op_sel* modifiers for VOP3P instructions. This case is so narrow that it can also be handled with some shifts and casts. The downside of the shuffle is more general bit combines are less likely to fire on it.

We do not want shuffle vector for "most cases" of insert_vector_elt. The most common case for incoming IR is a sequence of insertelements which should fold to a build_vector. We also would not want an individual insertelement to turn into a shuffle. If we want shuffles to reach the selector, the only cases would be where it would be more convenient to match VO3P modifiers. In that case, we would also want wider shuffles to be split into 2 element pieces (cases where the shuffles remain isolated to two neighboring elements is a narrow narrow case where this could happen)

 At the moment I don't see any reason to produce shuffles differently than what DAGCombiner does today 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136922/new/

https://reviews.llvm.org/D136922