[llvm] [SLP] Vectorize non-power-of-2 ops. (PR #77790)
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 19 11:52:34 PST 2024
================
@@ -668,21 +702,32 @@ entry:
}
define void @add1fn(ptr noalias %dst, ptr noalias %src) {
-; CHECK-LABEL: @add1fn(
-; CHECK-NEXT: entry:
-; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, ptr [[SRC:%.*]], i64 1
-; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr [[SRC]], align 4
-; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, ptr [[DST:%.*]], i64 1
-; CHECK-NEXT: store float [[TMP0]], ptr [[DST]], align 4
-; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, ptr [[SRC]], i64 3
-; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, ptr [[DST]], i64 3
-; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[INCDEC_PTR]], align 4
-; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP2]], <float 1.000000e+00, float 2.000000e+00>
-; CHECK-NEXT: store <2 x float> [[TMP3]], ptr [[INCDEC_PTR1]], align 4
-; CHECK-NEXT: [[TMP5:%.*]] = load float, ptr [[INCDEC_PTR5]], align 4
-; CHECK-NEXT: [[ADD9:%.*]] = fadd float [[TMP5]], 3.000000e+00
-; CHECK-NEXT: store float [[ADD9]], ptr [[INCDEC_PTR7]], align 4
-; CHECK-NEXT: ret void
+; NON-POW2-LABEL: @add1fn(
+; NON-POW2-NEXT: entry:
+; NON-POW2-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, ptr [[SRC:%.*]], i64 1
+; NON-POW2-NEXT: [[TMP0:%.*]] = load float, ptr [[SRC]], align 4
+; NON-POW2-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, ptr [[DST:%.*]], i64 1
+; NON-POW2-NEXT: store float [[TMP0]], ptr [[DST]], align 4
+; NON-POW2-NEXT: [[TMP1:%.*]] = load <3 x float>, ptr [[INCDEC_PTR]], align 4
+; NON-POW2-NEXT: [[TMP2:%.*]] = fadd <3 x float> [[TMP1]], <float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>
+; NON-POW2-NEXT: store <3 x float> [[TMP2]], ptr [[INCDEC_PTR1]], align 4
----------------
fhahn wrote:
(Moved comment here https://github.com/llvm/llvm-project/commit/0ddbdd3052e694eedef6bccce3b17f92dff7add7#r137302312)
> Is this supported on all platforms? Or you rely on the check that this will end up with the scalarized version in the end?
I checked on AArch64 and X86 and both support OK lowering of such operations, although in a number of cases improvements can be made, e.g. #78632 and #78637 for loads and stores of <3 x i8> on AArch64.
This effectively relies on the cost-models to return accurate costs for non-power-of-2 vector operations. If a vector op of a non-power-of-2 type will get scalarizied in the backend, the cost model should return a high cost.
https://github.com/llvm/llvm-project/pull/77790
More information about the llvm-commits
mailing list