[llvm] [AMDGPU] Vectorize i8 Shuffles (PR #105850)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 16 12:26:27 PDT 2024
================
@@ -363,11 +363,67 @@ bb:
ret <4 x i16> %ins.3
}
+define <4 x i8> @uadd_sat_v4i8(<4 x i8> %arg0, <4 x i8> %arg1, ptr addrspace(1) %dst) {
+; GCN-LABEL: @uadd_sat_v4i8(
+; GCN-NEXT: bb:
+; GCN-NEXT: [[TMP0:%.*]] = call <4 x i8> @llvm.uadd.sat.v4i8(<4 x i8> [[ARG0:%.*]], <4 x i8> [[ARG1:%.*]])
----------------
jrbyrnes wrote:
This is due to the calling convention; it is not a regression in lowering @llvm.uadd.sat.v4i8 versus 4 x @llvm.uadd.sat.i8.
The calling convention scalarizes i8 vectors, similar to how we pass them across basic blocks. The SLP cost model should account for the extracts needed for the vectorized version.
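
For readers following along, here is a rough sketch of the two forms being compared. It is illustrative only and not copied from the PR's tests: the function names (`@uadd_sat_scalar_sketch`, `@uadd_sat_vector_sketch`) and value names are made up, and the `%dst` argument from the test is left out. The scalarization done by the calling convention happens during lowering, so it does not appear in the IR itself.

```llvm
declare i8 @llvm.uadd.sat.i8(i8, i8)
declare <4 x i8> @llvm.uadd.sat.v4i8(<4 x i8>, <4 x i8>)

; Scalar form (hypothetical): four i8 calls, with an extract per input lane
; and an insert per result lane.
define <4 x i8> @uadd_sat_scalar_sketch(<4 x i8> %arg0, <4 x i8> %arg1) {
bb:
  %a0 = extractelement <4 x i8> %arg0, i64 0
  %a1 = extractelement <4 x i8> %arg0, i64 1
  %a2 = extractelement <4 x i8> %arg0, i64 2
  %a3 = extractelement <4 x i8> %arg0, i64 3
  %b0 = extractelement <4 x i8> %arg1, i64 0
  %b1 = extractelement <4 x i8> %arg1, i64 1
  %b2 = extractelement <4 x i8> %arg1, i64 2
  %b3 = extractelement <4 x i8> %arg1, i64 3
  %r0 = call i8 @llvm.uadd.sat.i8(i8 %a0, i8 %b0)
  %r1 = call i8 @llvm.uadd.sat.i8(i8 %a1, i8 %b1)
  %r2 = call i8 @llvm.uadd.sat.i8(i8 %a2, i8 %b2)
  %r3 = call i8 @llvm.uadd.sat.i8(i8 %a3, i8 %b3)
  %ins.0 = insertelement <4 x i8> poison, i8 %r0, i64 0
  %ins.1 = insertelement <4 x i8> %ins.0, i8 %r1, i64 1
  %ins.2 = insertelement <4 x i8> %ins.1, i8 %r2, i64 2
  %ins.3 = insertelement <4 x i8> %ins.2, i8 %r3, i64 3
  ret <4 x i8> %ins.3
}

; Vectorized form (hypothetical): a single <4 x i8> call. Because the
; arguments and return value are function-boundary values, the calling
; convention still splits them into scalars during lowering.
define <4 x i8> @uadd_sat_vector_sketch(<4 x i8> %arg0, <4 x i8> %arg1) {
bb:
  %r = call <4 x i8> @llvm.uadd.sat.v4i8(<4 x i8> %arg0, <4 x i8> %arg1)
  ret <4 x i8> %r
}
```

The vectorized form looks strictly cheaper at the IR level, which is why the extracts implied by the scalarized boundary need to show up in the cost model rather than in the IR.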
https://github.com/llvm/llvm-project/pull/105850