[PATCH] D59710: [SLP] remove lower limit for forming reduction patterns
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 1 09:34:08 PDT 2019
ABataev added inline comments.
================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll:43
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
-; THRESHOLD-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
-; THRESHOLD-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
-; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
-; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
-; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]]
-; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
-; THRESHOLD-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]]
-; THRESHOLD-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
-; THRESHOLD-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
-; THRESHOLD-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
-; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
-; THRESHOLD-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]]
-; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
-; THRESHOLD-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]]
+; THRESHOLD-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
+; THRESHOLD-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
----------------
RKSimon wrote:
> ABataev wrote:
> > What about this one? This also looks like a regression
> Sanjay and I have checked with godbolt/llvm-mca and this looks like a definite win (checked on bdver2, haswell and btver2). Top is scalar, middle is trunk and bottom is patched IR:
>
> bdver2: https://godbolt.org/z/jwCPgI
> haswell: https://godbolt.org/z/R-h8o_
>
>
But that does not mean the patch is correct. It means that our cost calculation is, once again, not quite accurate, and that the previous implementation was not quite optimal. The number of vectorised operations is reduced, which means the patch introduces regressions in the vectorization result, and in some cases it will produce significantly worse code.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59710/new/
https://reviews.llvm.org/D59710