[llvm] [AMDGPU] Vectorize i8 Shuffles (PR #105850)

Fri Oct 18 16:26:34 PDT 2024

================
@@ -306,6 +306,23 @@ bool GCNTTIImpl::hasBranchDivergence(const Function *F) const {
   return !F || !ST->isSingleLaneExecution(*F);
 }
 
+unsigned GCNTTIImpl::getNumberOfParts(Type *Tp) {
+  // For certain 8 bit ops, we can pack a v4i8 into a single part
+  // (e.g. v4i8 shufflevectors -> v_perm v4i8, v4i8). Thus, we
----------------
jrbyrnes wrote:

I've added https://github.com/llvm/llvm-project/pull/113002 to vectorize i8s with a more refined cost model, rather than relying on shuffle vectorization to capture the use case we are interested in.

https://github.com/llvm/llvm-project/pull/105850