<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - [ppc] bad code layout causes slower than gcc in 403.gcc"

   href="https://llvm.org/bugs/show_bug.cgi?id=25782">25782</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[ppc] bad code layout causes slower than gcc in 403.gcc

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: PowerPC

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>carrot@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

<pre>The LLVM-generated 403.gcc binary is 1.7% slower than the GCC-generated one on
POWER8; for the input data g23.i, the LLVM build is 8.68% slower.

According to perf, htab_traverse accounts for 10.03% of the run time in the
GCC-generated code but 16.48% in the LLVM-generated code.
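The C sketch below roughly matches the loop in both listings that follow, and may make them easier to read. It is a hypothetical reconstruction, not the 403.gcc source. The names SLOT_EMPTY, SLOT_DELETED, trav_fn and traverse_slots are made up. It assumes that empty slots hold 0 and deleted slots hold 1, as the cmpldi r9,1 (GCC) and cmpldi r3,2 (LLVM) comparisons suggest. The loop test is written with != instead of a less-than compare. That is equivalent here because slot advances one element at a time from entries to limit.

```c
/* Hypothetical reconstruction of the htab_traverse loop (NOT the verbatim
   403.gcc source). Assumes empty slots hold 0 and deleted slots hold 1. */

#define SLOT_EMPTY   ((void *) 0)   /* hypothetical name */
#define SLOT_DELETED ((void *) 1)   /* hypothetical name */

/* Callback returns 0 to stop the traversal, nonzero to continue. */
typedef int (*trav_fn)(void **slot, void *info);

/* Visit every live slot in entries[0..size-1]; size must be at least 1. */
void traverse_slots(void **entries, unsigned long size, trav_fn callback,
                    void *info)
{
  void **slot = entries;
  void **limit = entries + size;

  do
    {
      void *x = *slot;               /* ld / ldu: load the slot */

      /* Skip empty and deleted slots; in a do-while, continue jumps
         straight to the loop test below. */
      if (x == SLOT_EMPTY || x == SLOT_DELETED)
        continue;

      /* Indirect call through CTR (mtctr / bctrl). */
      if (!callback(slot, info))
        break;
    }
  while (++slot != limit);           /* addi / cmpld / conditional branch */
}
```

The perf percentages indicate that almost all samples fall on the skip path. The slot load, the empty/deleted test and the latch therefore form the hot loop, and the callback block is cold.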

GCC generates the following code for the loop body of htab_traverse:

   99.87 :        1032c050:   addi    r31,r31,8                         //  HOT
    0.00 :        1032c054:   cmpld   cr7,r30,r31                       //  HOT
    0.00 :        1032c058:   ble     cr7,1032c08c &lt;htab_traverse+0x7c&gt; //  HOT
    0.01 :        1032c05c:   ld      r9,0(r31)                         //  HOT
    0.03 :        1032c060:   cmpldi  cr7,r9,1                          //  HOT
    0.00 :        1032c064:   ble     cr7,1032c050 &lt;htab_traverse+0x40&gt; //  HOT
    0.06 :        1032c068:   mtctr   r29
    0.00 :        1032c06c:   mr      r3,r31
    0.00 :        1032c070:   std     r2,24(r1)
    0.00 :        1032c074:   mr      r4,r28
    0.00 :        1032c078:   mr      r12,r29
    0.00 :        1032c07c:   bctrl
    0.01 :        1032c080:   ld      r2,24(r1)
    0.00 :        1032c084:   cmpdi   cr7,r3,0
    0.00 :        1032c088:   bne     cr7,1032c050 &lt;htab_traverse+0x40&gt;

LLVM generates the following corresponding code:

   66.56 :        10306b20:   ldu     r3,8(r26)                        //  HOT
    0.00 :        10306b24:   cmpldi  r3,2                             //  HOT
    0.00 :        10306b28:   blt     10306b50 &lt;htab_traverse+0x80&gt;    //  HOT
    0.03 :        10306b2c:   mtctr   r28
    0.00 :        10306b30:   mr      r3,r30
    0.00 :        10306b34:   mr      r4,r29
    0.00 :        10306b38:   mr      r12,r28
    0.00 :        10306b3c:   std     r2,24(r1)
    0.01 :        10306b40:   bctrl
    0.01 :        10306b44:   ld      r2,24(r1)
    0.00 :        10306b48:   cmplwi  r3,0
    0.00 :        10306b4c:   beq     10306b5c &lt;htab_traverse+0x8c&gt;
   33.38 :        10306b50:   addi    r30,r30,8                       //  HOT
    0.00 :        10306b54:   cmpld   r30,r27                         //  HOT
    0.00 :        10306b58:   blt     10306b20 &lt;htab_traverse+0x50&gt;   //  HOT

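Both compilers generate similar instructions, but with different code layouts.
In GCC's code all hot basic blocks (BBs) are placed together. As a result, the hot
path executes one taken branch per iteration (the ble back to 1032c050).

In LLVM's code the hot BBs are split around the cold call block. The hot path
therefore executes two taken branches per iteration (the blt to 10306b50, then
the blt back to 10306b20), and the extra taken branch slows the loop down.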
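The toy C model below makes the layout difference concrete. It is only an illustration, not LLVM's block-placement algorithm. Given a block order, it counts the taken branches on one trip around the hot path. A transfer from block cur to block next falls through only when next is placed immediately after cur; any other transfer counts as a taken branch. The labels are hypothetical names for the blocks in the two listings:

- HDR: slot load and empty/deleted test
- CALL: the cold indirect call
- LATCH: increment, bound test and back branch

```c
/* Hypothetical labels for the blocks of the htab_traverse loop. */
enum block { HDR, CALL, LATCH };

/* Position of block b in the layout order, or -1 if it is not there. */
static int layout_pos(const enum block *layout, int n, enum block b)
{
  for (int i = 0; i != n; i++)
    if (layout[i] == b)
      return i;
  return -1;
}

/* Taken branches on one trip around the cyclic hot path
   path[0] -> path[1] -> ... -> path[path_len - 1] -> path[0].
   Every block on the path must appear in the layout. */
int taken_branches_per_iter(const enum block *layout, int n,
                            const enum block *path, int path_len)
{
  int taken = 0;
  for (int i = 0; i != path_len; i++)
    {
      enum block cur = path[i];
      enum block next = path[(i + 1) % path_len];
      /* Fall-through only if next sits right after cur in the layout. */
      if (layout_pos(layout, n, next) != layout_pos(layout, n, cur) + 1)
        taken++;
    }
  return taken;
}
```

Consider the hot path HDR, LATCH. The GCC order {LATCH, HDR, CALL} gives 1 taken branch, and the LLVM order {HDR, CALL, LATCH} gives 2. In this model, the order {HDR, LATCH, CALL} also gives 1. That order only moves the cold call block out from between the two hot blocks.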

So this is a code layout problem.</pre>

        </div>

      </p>


    </body>

</html>