[PATCH] D105432: [Analysis] Add simple cost model for strict (in-order) reductions
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 20 08:04:22 PDT 2021
RKSimon added inline comments.
================
Comment at: llvm/test/Analysis/CostModel/X86/reduce-fadd.ll:14
; SSE2-LABEL: 'reduce_f64'
-; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V1 = call double @llvm.vector.reduce.fadd.v1f64(double %arg, <1 x double> undef)
-; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2 = call double @llvm.vector.reduce.fadd.v2f64(double %arg, <2 x double> undef)
-; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V4 = call double @llvm.vector.reduce.fadd.v4f64(double %arg, <4 x double> undef)
-; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V8 = call double @llvm.vector.reduce.fadd.v8f64(double %arg, <8 x double> undef)
-; SSE2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V16 = call double @llvm.vector.reduce.fadd.v16f64(double %arg, <16 x double> undef)
+; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V1 = call double @llvm.vector.reduce.fadd.v1f64(double %arg, <1 x double> undef)
+; SSE2-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V2 = call double @llvm.vector.reduce.fadd.v2f64(double %arg, <2 x double> undef)
----------------
This looks too high for what is just a single f64 fadd (an SSE floating-point extract from element 0 is free). Could it be a problem in the x86 scalarization overhead?
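
To make the expectation concrete, here is a minimal sketch of how an in-order reduction might be costed if the extract from element 0 is treated as free. The names ToyCosts and strictReductionCost and all unit costs are hypothetical, chosen only for illustration. They are not the values in the X86 cost tables, and the sketch treats the vector as a single register, ignoring type legalization.

```cpp
#include <cassert>

// Hypothetical unit costs, for illustration only (not the X86 TTI values).
struct ToyCosts {
  unsigned FAdd = 1;         // one scalar fadd
  unsigned ExtractLane0 = 0; // element 0 is already the low scalar of the register
  unsigned ExtractOther = 1; // any other lane needs an extract/shuffle
};

// Cost of a strict (in-order) reduction:
//   acc = ((start + v[0]) + v[1]) + ... + v[NumLanes - 1]
// Each lane contributes one extract plus one scalar fadd.
unsigned strictReductionCost(unsigned NumLanes, const ToyCosts &C = ToyCosts()) {
  assert(NumLanes > 0 && "reduction needs at least one lane");
  unsigned Cost = 0;
  for (unsigned Lane = 0; Lane < NumLanes; ++Lane) {
    Cost += (Lane == 0) ? C.ExtractLane0 : C.ExtractOther;
    Cost += C.FAdd;
  }
  return Cost;
}
```

Under these assumed unit costs the <1 x double> case comes out as a single fadd, which is the intuition behind the comment above. Any fixed per-element insert/extract overhead added on top of that would show up as the kind of inflation seen in the new CHECK lines.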
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D105432/new/
https://reviews.llvm.org/D105432