[llvm-bugs] [Bug 28876] New: improve hot path code through inter procedural alias analysis in cold path

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Aug 5 15:38:37 PDT 2016


https://llvm.org/bugs/show_bug.cgi?id=28876

            Bug ID: 28876
           Summary: improve hot path code through inter procedural alias
                    analysis in cold path
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Interprocedural Analyses
          Assignee: unassignedbugs at nondot.org
          Reporter: carrot at google.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

The source code is

#include <vector>

void foo(int iters, int size) {
  std::vector<int> fv;
  for (int i = 0; i < iters; ++i) {
    fv.clear();
    for (int j = 0; j < size; ++j) {
      fv.push_back(j);
    }
  }
}

When compiled into powerpc instructions with options
  -m64 -O2 -mvsx -mcpu=power8 -fno-exceptions

LLVM generates following code for the loops:


.LBB0_2:                                # %for.cond.cleanup3.for.body_crit_edge
                                        #   in Loop: Header=BB0_3 Depth=1
        ld 4, 112(31)
.LBB0_3:                                # %for.body
                                        # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_6 Depth 2
        std 4, 120(31)
        stw 26, 108(31)
        bc 4, 9, .LBB0_10
# BB#4:                                 # %for.body4.preheader
                                        #   in Loop: Header=BB0_3 Depth=1
        li 3, 0
        b .LBB0_6
        .p2align        4
.LBB0_5:                                #
%_ZNSt6vectorIiSaIiEE9push_backERKi.exit.for.body4_crit_edge
                                        #   in Loop: Header=BB0_6 Depth=2
        ld 4, 120(31)
.LBB0_6:                                # %for.body4
                                        #   Parent Loop BB0_3 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        ld 5, 128(31)
        cmpld    4, 5
        beq      0, .LBB0_8
# BB#7:                                 # %if.then.i
                                        #   in Loop: Header=BB0_6 Depth=2
        addi 5, 4, 4                    // update pointer of available space
        stw 3, 0(4)                     // push_back j
        std 5, 120(31)
        b .LBB0_9
        .p2align        4
.LBB0_8:                                # %if.else.i
                                        #   in Loop: Header=BB0_6 Depth=2
        mr 3, 28
        mr 5, 27
        bl
_ZNSt6vectorIiSaIiEE13_M_insert_auxEN9__gnu_cxx17__normal_iteratorIPiS1_EERKi
        nop
.LBB0_9:                                #
%_ZNSt6vectorIiSaIiEE9push_backERKi.exit
                                        #   in Loop: Header=BB0_6 Depth=2
        lwz 3, 108(31)
        addi 3, 3, 1                    // j++
        stw 3, 108(31)
        cmpw     3, 30
        blt      0, .LBB0_5
.LBB0_10:                               # %for.cond.cleanup3
                                        #   in Loop: Header=BB0_3 Depth=1
        addi 25, 25, 1
        cmplw    25, 29
        bne      0, .LBB0_2


GCC generates following code for the hot path of the loops, there are more code
in the cold path not shown here:

.L2:
        ble 3,.L4
        mr 31,28
        li 23,0
        .p2align 4,,15
.L14:
        cmpld 7,31,10
        beq 7,.L5
        cmpdi 7,31,0
        beq 7,.L6
        stw 23,0(31)           // push_back j
.L6:
        addi 31,31,4           // update the pointer
.L7:
        addi 9,23,1            // j++
        cmpw 7,9,29
        extsw 23,9
        bne 7,.L14
.L4:
        addi 8,24,1
        cmpw 7,8,27
        extsw 24,8
        bne 7,.L2


Both compilers inlined function push_back, but gcc inlined all sub functions
called by push_back, so it can do more complete alias analysis, and figure out
variable j does not alias with fields of fv, so all variables are allocated to
registers.

On the other hand, llvm didn't inline the function _M_insert_aux, it is
reasonable since it is in cold path only. But it needs pointers of both j and
fv, so it may cause j and fv to be alias with each other. Then it impacts the
accesses of j and fv in the hot path, and generates many memory access
instructions in the hot path. It slows down this function by 5.9x on power8.

One possible improvement is through inter procedural alias analysis, we can
figure out that j is not aliased with fields of fv in function _M_insert_aux
even without inlining it, so we can allocate them in registers in the hot path,
only store and load them around the function call to satisfy the parameter
requirement of _M_insert_aux.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20160805/2bb91319/attachment.html>


More information about the llvm-bugs mailing list