[PATCH] Improve the cost evaluation of LSR

Fri May 1 00:11:01 PDT 2015

Hi atrick, sanjoy, hfinkel,

1. Motivation driven by testcases:

Existing LSR evaluates the cost of LSRUse Formula and LSR Solution by assigning weights to some indexes from high to low: NumRegs > AddRecCost > NumIVMuls > ... This doesn't work well for the testcase mentioned in https://llvm.org/bugs/show_bug.cgi?id=23384. For the testcase, on x86, AddRecCost should be valued more important than NumRegs because it will be translated to increased insn numbers directly, especially when the register pressure in the loop is low.

This problem may not be apparent on other architecture. X86 supports complex addressing mode so several load/store insns can share the same AddRec by using complex addressing mode. For those architectures not supporting complex addressing mode, they need extra insns for loads/stores to share the same AddRec, so saving AddRecCosts may not be translated to saving of insns.

However for architectures with pre/postinc addressing mode, we even want to increase AddRec. Here is another testcase:
int a[1000], b[1000], N, j;

void foo() {
  long i;
  for (i = 0; i < N; i+=1) {
    a[i+12] = 15;
    b[i-2] = 15;
  }
}

For AArch64, llvm trunk generates code:.LBB0_2:                                // %for.body
        lsl     x13, x9, #2
        add     x9, x9, #1              // =1
        add      x14, x10, x13
        add      x13, x12, x13
        str     w11, [x14, #48]
        stur    w11, [x13, #-8]
        cmp      x9, x8
        b.lt    .LBB0_2

With the patch attached, it generates more AddRec but utilizes post-increment addressing mode to reduce the insns needed.
.LBB0_2:                                // %for.body
        str     w12, [x10], #4
        str     w12, [x11], #4
        add     x9, x9, #1              // =1
        cmp      x9, x8
        b.lt    .LBB0_2

>From the two testcases above, I think the existing cost evaluation which uses NumRegs as the major index may not be optimal, because instruction number is the most important factor showing the cost. Understanding all the cases in which extra instruction will be introduced and including those cases in the LSR cost model is the key to get the best performance.

2. Major changes in the patch:
   1. The weights of new indexes from high to low:  InstNumCost + SpillRegsCost > InstCmplxCost > SetupCost. InstNumCost is the number of extra instructions needed. SpillRegsCost is computed from regs used by Cost::UpdateSpillRegsCost (A simple version is used and it will be improved in the future). InstCmplxCost shows the complexity of instructions in the loop, which affects the laency of instruction decoding and available insn execution ports. SetupCost is the cost of instructions inserted in loop preheader.   2. InstNumCost is updated for some cases according to benchmark analysis on X86 architecture. It is also updated to handle the second testcase above to utilize pre/post-increment addressing mode on AArch64 or ARM.
      * For AddRec, InstNumCost will be increased by 1 unless the stride can be folded into load/store insns by using pre/post-increment addressing mode.
      * For ICmpZero, the loop can be a countdown loop only when the formula of ICmpZero only contains one register -- in this case, the conditional compare or test instruction can be omitted. In other cases, InstNumCost will be increased by 1.
      * Original NumBaseAdds/NumIVMuls/ImmCost are included into InstNumCost.

3. Request for suggestions.
   I did performance testing and tuning for the patch using google benchmarks on x86, and saw some perf improvement (0.2% geomean on sandybridge and 0.4% on westmere). But I didn't do any testing and tuning for other architectures so it can be worse than the original LSR on other architectures. Any suggestions to improve it on other architectures if the brief idea here is ok. 

Thanks,
Wei.

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D9429

Files:
  include/llvm/Analysis/TargetTransformInfo.h
  include/llvm/Analysis/TargetTransformInfoImpl.h
  include/llvm/CodeGen/BasicTTIImpl.h
  include/llvm/Target/TargetLowering.h
  lib/Analysis/TargetTransformInfo.cpp
  lib/Target/AArch64/AArch64ISelLowering.cpp
  lib/Target/AArch64/AArch64ISelLowering.h
  lib/Target/ARM/ARMISelLowering.cpp
  lib/Target/ARM/ARMISelLowering.h
  lib/Transforms/Scalar/LoopStrengthReduce.cpp

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D9429.24794.patch
Type: text/x-patch
Size: 24453 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150501/9525ae89/attachment.bin>