[PATCH] D105432: [Analysis] Add simple cost model for strict (in-order) reductions

Tue Jul 20 08:04:22 PDT 2021

RKSimon added inline comments.

================
Comment at: llvm/test/Analysis/CostModel/X86/reduce-fadd.ll:14
 ; SSE2-LABEL: 'reduce_f64'
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call double @llvm.vector.reduce.fadd.v1f64(double %arg, <1 x double> undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2 = call double @llvm.vector.reduce.fadd.v2f64(double %arg, <2 x double> undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call double @llvm.vector.reduce.fadd.v4f64(double %arg, <4 x double> undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8 = call double @llvm.vector.reduce.fadd.v8f64(double %arg, <8 x double> undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16 = call double @llvm.vector.reduce.fadd.v16f64(double %arg, <16 x double> undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V1 = call double @llvm.vector.reduce.fadd.v1f64(double %arg, <1 x double> undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %V2 = call double @llvm.vector.reduce.fadd.v2f64(double %arg, <2 x double> undef)
----------------
This looks too high for what is just a single f64 fadd (SSE floating point extract from 0 is free) - it might be a problem in x86 scalarization overhead ?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105432/new/

https://reviews.llvm.org/D105432