[llvm] [AMDGPU] Vectorize i8 Shuffles (PR #105850)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 10 12:27:25 PDT 2024
================
@@ -306,6 +306,23 @@ bool GCNTTIImpl::hasBranchDivergence(const Function *F) const {
return !F || !ST->isSingleLaneExecution(*F);
}
+unsigned GCNTTIImpl::getNumberOfParts(Type *Tp) {
+ // For certain 8 bit ops, we can pack a v4i8 into a single part
+ // (e.g. v4i8 shufflevectors -> v_perm v4i8, v4i8). Thus, we
----------------
arsenm wrote:
The comment says "for certain 8 bit ops", but this change is much broader. Packing only helps for a few data flow ops, yet this override makes the costs of other operations too optimistic. We should avoid vectorizing non-dataflow operations.
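To illustrate the concern, here is a minimal toy sketch, not LLVM's actual cost model or API. Every name in it (`Op`, `VecTy`, `scalarizedParts`, `packedParts`, `costBroad`, `costNarrow`) is hypothetical, and the baseline of one part per i8 element is an assumption made only for the example. It contrasts a type-wide parts override with one restricted to data-movement ops:

```cpp
// Toy model only: these types and functions are hypothetical and are
// not part of LLVM's TargetTransformInfo interface.
enum class Op { Shuffle, Add };

struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// Assumed baseline for the sketch: each element is costed as its own part.
constexpr unsigned scalarizedParts(VecTy T) { return T.NumElts; }

// Packed view: elements share 32-bit registers, so v4i8 is a single part.
constexpr unsigned packedParts(VecTy T) {
  return (T.NumElts * T.EltBits + 31) / 32;
}

// Broad override: every op on an i8 vector sees the packed part count,
// including arithmetic that gains nothing from packing.
constexpr unsigned costBroad(Op, VecTy T) {
  return T.EltBits == 8 ? packedParts(T) : scalarizedParts(T);
}

// Narrow override: only data-movement ops (here, shuffles) see the
// packed part count; all other ops keep the baseline.
constexpr unsigned costNarrow(Op O, VecTy T) {
  return (O == Op::Shuffle && T.EltBits == 8) ? packedParts(T)
                                              : scalarizedParts(T);
}

static_assert(costBroad(Op::Add, VecTy{4, 8}) == 1,
              "broad override also makes a v4i8 add look like one part");
static_assert(costNarrow(Op::Add, VecTy{4, 8}) == 4,
              "narrow override leaves a v4i8 add at the baseline");
static_assert(costNarrow(Op::Shuffle, VecTy{4, 8}) == 1,
              "narrow override still packs a v4i8 shuffle");
```

In this sketch the broad variant reports a v4i8 add as cheap as a v4i8 shuffle, which is the kind of over-optimistic cost described above. The narrow variant keeps the benefit for the shuffle only.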
https://github.com/llvm/llvm-project/pull/105850