[PATCH] D28307: Add Instruction number to LSR cost model (PR23384) part 1 of 3

Thu Jan 26 19:14:17 PST 2017

evstupac added a comment.

Hi,

Thanks for taking a look.

> Also a related question, what is your plan to move with that heuristic?

As we deal in https://reviews.llvm.org/D27695, there will be next part moving Solution cost comparison to a target part.
For x86 I want to move "Insns" to number 1 priority. I've got good performance and code size results on my benchmarks (only linpack got 3% regression, which transforms into gain for some CPUs).
Anyway the results differs for x86 CPUs. So maybe we'll end up with different cost models.

I understand that the change is very sensitive - so an option gives testing opportunity.

> Generally speaking, I think the number of instructions is a good heuristic only for code size and it is not even a given. Indeed, for performance we care about critical path, IPL, this kind of thing. What am I saying is having more instructions is not necessarily a bad thing. The other thing is I believe we can end up with cases with less instructions, but more registers and potentially more spill. This is possible to happen because the heuristic that account for "spill" only trigger when the NumRegs gets high enough. However, NumRegs may be low enough and spill may already happen.

Currently LSR take in account only Number of Registers. It is very important for 32 bit mode, but less important for 64 bit mode and float/vector loops.
While testing I have not seen cases where Insns count is less, but NumRegs are much bigger. Insns correlate with NumRegs (because Insns depend AddRec and NumBaseAdds).  
Usually the case is that we have 1 more register (or same number) but less instructions. Putting Insns as first priority does not add much spill/fills. However it would be good to get such statistics. I'll gather this for x86.

Thanks,
Evgeny

================
Comment at: lib/Transforms/Scalar/LoopStrengthReduce.cpp:1195
+  unsigned TTIRegNum = TTI.getNumberOfRegisters(false) - 1;
+  if (NumRegs > TTIRegNum) {
+    // Cost already exceeded TTIRegNum, then only newly added register can add
----------------
qcolombet wrote:
> With this kind of check we are (sort of) sure we are going to spill to materialize this formulae, but that does not mean the other don't.
> Bare in mind that NumRegs is only what's use in the formulae, not what is live.
> 
> Your comment is right, just wanted to make sure we are on the same page. 
Yes. I understand this.
Instruction number and Register number have correlations.
When we move cost calculation to target part we can give Insns and RegNum some weights instead of priority.

================
Comment at: lib/Transforms/Scalar/LoopStrengthReduce.cpp:1198
+    // new instructions.
+    if (NumRegs > TTIRegNum)
+      Insns += (NumRegs - PrevNumRegs);
----------------
qcolombet wrote:
> I'm confused, this was tested in the condition just before, i.e., it's always going to be true and the else statement is always going to be false.
> 
> Am I missing something?
This is a misprint.
It should be PrevNumRegs here like it was in inital patch D27695.
Thanks for catching this.

================
Comment at: test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll:1
+; RUN: llc < %s -O2 -march=x86-64 -lsr-insns-cost -asm-verbose=0 | FileCheck %s
+
----------------
qcolombet wrote:
> I would like to see opt -loop-reduce checks as well.
It is harder to see profit in IR as we are getting even more instructions (assuming target will combine them to complicated address or remove test on 0).
The will be harder to understand the test goal, but it is not a big deal to add such.

Repository:
  rL LLVM

https://reviews.llvm.org/D28307