[llvm] [AMDGPU] Vectorize i8 Shuffles (PR #105850)

Thu Oct 10 12:27:25 PDT 2024

================
@@ -306,6 +306,23 @@ bool GCNTTIImpl::hasBranchDivergence(const Function *F) const {
   return !F || !ST->isSingleLaneExecution(*F);
 }
 
+unsigned GCNTTIImpl::getNumberOfParts(Type *Tp) {
+  // For certain 8 bit ops, we can pack a v4i8 into a single part
+  // (e.g. v4i8 shufflevectors -> v_perm v4i8, v4i8). Thus, we
----------------
arsenm wrote:

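The comment says "for certain 8 bit ops", but the change is much broader than that: getNumberOfParts takes only a type, so it affects every operation on that type. Packing might help for only a few data-flow ops, yet this makes the costs of all other operations too optimistic. We should avoid vectorizing non-dataflow operations.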
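
To illustrate the concern, here is a minimal standalone sketch of a type-only query next to an opcode-aware one. It is not LLVM code: the `Op` enum and the functions `isDataFlowOp`, `packedParts`, and `partsFor` are hypothetical names invented for this example, and the "one part per i8 element" cost for non-dataflow ops is an assumption of this toy model, not a measured AMDGPU cost.

```cpp
#include <cassert>

// Hypothetical opcode set for illustration; not LLVM's instruction opcodes.
enum class Op { Shuffle, InsertElement, ExtractElement, Add, Mul };

// Data-flow ops only move bytes between lanes; they do no arithmetic.
constexpr bool isDataFlowOp(Op O) {
  return O == Op::Shuffle || O == Op::InsertElement ||
         O == Op::ExtractElement;
}

// Type-only query: number of 32-bit registers if the vector is fully packed.
// This mirrors the shape of a query that sees the type but not the opcode.
constexpr unsigned packedParts(unsigned EltBits, unsigned NumElts) {
  return (EltBits * NumElts + 31) / 32;
}

// Opcode-aware query: only data-flow ops get the packed count for i8 vectors.
// Assumption of this model: other i8 ops are handled one element at a time.
constexpr unsigned partsFor(Op O, unsigned EltBits, unsigned NumElts) {
  if (EltBits == 8 && !isDataFlowOp(O))
    return NumElts;
  return packedParts(EltBits, NumElts);
}

// Small self-check showing where the two queries agree and where they differ.
inline void checkExamples() {
  assert(packedParts(8, 4) == 1);            // type-only: v4i8 is always 1 part
  assert(partsFor(Op::Shuffle, 8, 4) == 1);  // shuffle keeps the packed count
  assert(partsFor(Op::Add, 8, 4) == 4);      // add is not treated as packed
}
```

In this model the type-only query reports one part for any v4i8 operation, while the opcode-aware query limits that to shuffles and element inserts/extracts, which is the narrower behavior the comment asks for.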

https://github.com/llvm/llvm-project/pull/105850