[llvm] [AMDGPU] Vectorize i8 Shuffles (PR #95840)

Mon Jul 1 16:24:03 PDT 2024

================
@@ -306,6 +306,18 @@ bool GCNTTIImpl::hasBranchDivergence(const Function *F) const {
   return !F || !ST->isSingleLaneExecution(*F);
 }
 
+unsigned GCNTTIImpl::getNumberOfParts(Type *Tp) const {
+  if (FixedVectorType *VTy = dyn_cast<FixedVectorType>(Tp)) {
+    if (DL.getTypeSizeInBits(VTy->getElementType()) == 8) {
+      unsigned ElCount = VTy->getElementCount().getFixedValue();
+      return ElCount / 4;
+    }
+  }
+
+  std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Tp);
+  return LT.first.isValid() ? *LT.first.getValue() : 0;
----------------
jrbyrnes wrote:

We are overriding the default (which just returns the legalization cost) because for certain instructions (e.g. v_perm) a v4i8 corresponds to 1 part. This enables SLP to consider i8 vectorization; for most instructions the cost of vectorization is still the legalization cost.

Re generic: AFAICT the users are just vectorizers, which is what we want.
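
To make the part-count arithmetic concrete, here is a minimal standalone sketch of the i8 branch in the hunk above. It does not use any LLVM types. The function names are hypothetical, and the assumption that four 8-bit lanes pack into one 32-bit register is taken from the "v4i8 corresponds to 1 part" remark. The rounded-up variant is not part of the patch; it is included only to show how the two formulas differ for element counts that are not a multiple of 4.

```cpp
#include <cassert>

// Hypothetical stand-in for the i8 branch of getNumberOfParts in the hunk:
// plain integer division, so four i8 lanes count as one part.
constexpr unsigned numPartsForI8Vector(unsigned ElCount) {
  return ElCount / 4;
}

// Hypothetical alternative (NOT in the patch): round up so that a partially
// filled 32-bit register still counts as one part.
constexpr unsigned numPartsForI8VectorRoundedUp(unsigned ElCount) {
  return (ElCount + 3) / 4;
}

// Small self-check showing where the two formulas agree and where they differ.
inline void checkPartCounts() {
  assert(numPartsForI8Vector(4) == 1);   // v4i8  -> 1 part
  assert(numPartsForI8Vector(8) == 2);   // v8i8  -> 2 parts
  assert(numPartsForI8Vector(16) == 4);  // v16i8 -> 4 parts
  assert(numPartsForI8Vector(2) == 0);   // v2i8  -> 0 with truncating division
  assert(numPartsForI8VectorRoundedUp(2) == 1);
  assert(numPartsForI8VectorRoundedUp(6) == 2);
}
```

With truncating division, vectors of fewer than four i8 elements yield 0 parts. The comment above does not say whether that result is intended for such types.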

https://github.com/llvm/llvm-project/pull/95840