[llvm] [AMDGPU] Vectorize i8 Shuffles (PR #95840)

Fri Aug 23 08:54:02 PDT 2024

================
@@ -1134,14 +1153,15 @@ InstructionCost GCNTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
 
   Kind = improveShuffleKindFromMask(Kind, Mask, VT, Index, SubTp);
 
-  // Larger vector widths may require additional instructions, but are
-  // typically cheaper than scalarized versions.
-  unsigned NumVectorElts = cast<FixedVectorType>(VT)->getNumElements();
+  unsigned ScalarSize = DL.getTypeSizeInBits(VT->getElementType());
----------------
jrbyrnes wrote:

Separate the operation / type cost changes into different PRs? That's actually what I've tried to do -- it's a PR stacking problem.

The relationship is mapped in the description, but basically this PR is meant for the changes to the shuffle cost, and https://github.com/llvm/llvm-project/pull/91016 is meant for the changes required to enable SLP vectorization for i8s. The dependency is needed one way or the other to add the lit changes, and I decided to make https://github.com/llvm/llvm-project/pull/91016 the base, as there is another PR in flight which depends on it as well.

In short, this PR is meant to address the shuffle cost changes and will be landed atomically with https://github.com/llvm/llvm-project/pull/91016 (if at all).

https://github.com/llvm/llvm-project/pull/95840