[PATCH] D38611: Fix X86 regression on linpack

Thu Nov 2 14:42:09 PDT 2017

evstupac added a comment.

Hi Quentin,

Thanks for taking a look.
I'll address your comments.

Evgeny

================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:2473
+    unsigned C1Insns = C1.Insns + (C1.FoldedAddress >> 3);
+    unsigned C2Insns = C2.Insns + (C2.FoldedAddress >> 3);
+    return std::tie(C1Insns, C1.NumRegs, C1.AddRecCost,
----------------
qcolombet wrote:
> Could you add a comment explaining the `>> 3`?
> 
> It does not make sense to count folded addresses as additional instructions on its own and the `>> 3` makes it even more cryptic why we do that :).
Sure.
Every folded address is still executed by the CPU as micro-ops, so it consumes a resource either way. The question is how costly it is. X86 instructions with a folded address can be issued to fewer ports than instructions without one, and this limitation leads to regressions in some cases.
The heuristic here tries to address this. When a solution contains too many folded addresses, we count them as additional instruction(s). Ideally this would be a more thorough analysis that takes the other loop instructions into account, but that is harder to implement here.

">> 3" (one extra instruction per 8 folded addresses) is a kind of average bound across x86 CPUs. It could differ between, for example, -march=slm and -march=core-avx2; however, testing showed that ">> 3" is the best average.

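To illustrate the adjusted comparison described above, here is a minimal standalone sketch. The `Cost` struct, `effectiveInsns`, and `isLessCost` are hypothetical names made up for this example; only the fields visible in the quoted diff are modeled, so this is a simplified stand-in and not the code in the patch.

```cpp
#include <tuple>

// Hypothetical, simplified cost record: only the fields that appear in the
// quoted diff are modeled here.
struct Cost {
  unsigned Insns;
  unsigned NumRegs;
  unsigned AddRecCost;
  unsigned FoldedAddress;
};

// Effective instruction count: every 8 folded addresses are charged as one
// additional instruction (integer shift, so 0..7 folded addresses add nothing).
unsigned effectiveInsns(const Cost &C) {
  return C.Insns + (C.FoldedAddress >> 3);
}

// Returns true if C1 is strictly cheaper than C2 under a lexicographic
// ordering on (effective instructions, registers, AddRec cost).
bool isLessCost(const Cost &C1, const Cost &C2) {
  unsigned C1Insns = effectiveInsns(C1);
  unsigned C2Insns = effectiveInsns(C2);
  return std::tie(C1Insns, C1.NumRegs, C1.AddRecCost) <
         std::tie(C2Insns, C2.NumRegs, C2.AddRecCost);
}
```

With this ordering, two candidates that differ only in folded addresses compare equal until one of them crosses a multiple of 8: `{4, 2, 1, 7}` counts as 4 instructions, while `{4, 2, 1, 8}` counts as 5 and therefore loses.
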
Repository:
  rL LLVM

https://reviews.llvm.org/D38611