<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - improve hot path code through inter procedural alias analysis in cold path"
   href="https://llvm.org/bugs/show_bug.cgi?id=28876">28876</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>improve hot path code through inter procedural alias analysis in cold path
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Interprocedural Analyses
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>carrot@google.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>The source code is

#include <vector>

void foo(int iters, int size) {
  std::vector<int> fv;
  for (int i = 0; i < iters; ++i) {
    fv.clear();
    for (int j = 0; j < size; ++j) {
      fv.push_back(j);
    }
  }
}

When compiled into powerpc instructions with options
  -m64 -O2 -mvsx -mcpu=power8 -fno-exceptions

LLVM generates following code for the loops:


.LBB0_2:                                # %for.cond.cleanup3.for.body_crit_edge
                                        #   in Loop: Header=BB0_3 Depth=1
        ld 4, 112(31)
.LBB0_3:                                # %for.body
                                        # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_6 Depth 2
        std 4, 120(31)
        stw 26, 108(31)
        bc 4, 9, .LBB0_10
# BB#4:                                 # %for.body4.preheader
                                        #   in Loop: Header=BB0_3 Depth=1
        li 3, 0
        b .LBB0_6
        .p2align        4
.LBB0_5:                                #
%_ZNSt6vectorIiSaIiEE9push_backERKi.exit.for.body4_crit_edge
                                        #   in Loop: Header=BB0_6 Depth=2
        ld 4, 120(31)
.LBB0_6:                                # %for.body4
                                        #   Parent Loop BB0_3 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        ld 5, 128(31)
        cmpld    4, 5
        beq      0, .LBB0_8
# BB#7:                                 # %if.then.i
                                        #   in Loop: Header=BB0_6 Depth=2
        addi 5, 4, 4                    // update pointer of available space
        stw 3, 0(4)                     // push_back j
        std 5, 120(31)
        b .LBB0_9
        .p2align        4
.LBB0_8:                                # %if.else.i
                                        #   in Loop: Header=BB0_6 Depth=2
        mr 3, 28
        mr 5, 27
        bl
_ZNSt6vectorIiSaIiEE13_M_insert_auxEN9__gnu_cxx17__normal_iteratorIPiS1_EERKi
        nop
.LBB0_9:                                #
%_ZNSt6vectorIiSaIiEE9push_backERKi.exit
                                        #   in Loop: Header=BB0_6 Depth=2
        lwz 3, 108(31)
        addi 3, 3, 1                    // j++
        stw 3, 108(31)
        cmpw     3, 30
        blt      0, .LBB0_5
.LBB0_10:                               # %for.cond.cleanup3
                                        #   in Loop: Header=BB0_3 Depth=1
        addi 25, 25, 1
        cmplw    25, 29
        bne      0, .LBB0_2


GCC generates following code for the hot path of the loops, there are more code
in the cold path not shown here:

.L2:
        ble 3,.L4
        mr 31,28
        li 23,0
        .p2align 4,,15
.L14:
        cmpld 7,31,10
        beq 7,.L5
        cmpdi 7,31,0
        beq 7,.L6
        stw 23,0(31)           // push_back j
.L6:
        addi 31,31,4           // update the pointer
.L7:
        addi 9,23,1            // j++
        cmpw 7,9,29
        extsw 23,9
        bne 7,.L14
.L4:
        addi 8,24,1
        cmpw 7,8,27
        extsw 24,8
        bne 7,.L2


Both compilers inlined function push_back, but gcc inlined all sub functions
called by push_back, so it can do more complete alias analysis, and figure out
variable j does not alias with fields of fv, so all variables are allocated to
registers.

On the other hand, llvm didn't inline the function _M_insert_aux, it is
reasonable since it is in cold path only. But it needs pointers of both j and
fv, so it may cause j and fv to be alias with each other. Then it impacts the
accesses of j and fv in the hot path, and generates many memory access
instructions in the hot path. It slows down this function by 5.9x on power8.

One possible improvement is through inter procedural alias analysis, we can
figure out that j is not aliased with fields of fv in function _M_insert_aux
even without inlining it, so we can allocate them in registers in the hot path,
only store and load them around the function call to satisfy the parameter
requirement of _M_insert_aux.</pre>
        </div>
      </p>
      <hr>
      <span>You are receiving this mail because:</span>
      
      <ul>
          <li>You are on the CC list for the bug.</li>
      </ul>
    </body>
</html>