<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Suboptimal optimisation of inlined function loop for AArch64 O3 with LTO"
   href="https://bugs.llvm.org/show_bug.cgi?id=45554">45554</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Suboptimal optimisation of inlined function loop for AArch64 O3 with LTO
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>david.spickett@linaro.org
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=23363" name="attach_23363" title="Preprocessed source">attachment 23363</a> <a href="attachment.cgi?id=23363&action=edit" title="Preprocessed source">[details]</a></span>
Preprocessed source

After <a href="https://reviews.llvm.org/D76792">https://reviews.llvm.org/D76792</a>, the SPEC benchmark "xalancbmk" showed
regressions at -O2/-O3 with LTO enabled.

This has been narrowed down to a loop in XalanDOMStringCache::release, where
extra instructions are inserted in the loop body that would normally be placed
at the exit points of the function.

For example, before we had:
244         ldur   x10, [x20, #-8]
               cmp    x10, x1 
             ↓ b.eq   e8
<...>
         e8:   sub    x20, x20, #0x8
         ec:   cmp    x20, x8 

After:
   247         mov    x10, x20
    46         ldr    x11, [x10, #8]!
               cmp    x11, x1 
             ↓ b.eq   dc
<...>
         dc:   mov    x20, x10
               cmp    x20, x8 
             ↓ b.ne   f8      

Note that the "after" code uses writeback to update x10, and resets it if the
branch is not taken. This adds instructions to the loop body, whereas before we
would only write to x20 if the branch was taken.

I will attach the perf output for before and after, along with the preprocessed
source file. Compile with:
./clang++ -O3 -flto --target=aarch64-linux-gnu /tmp/perfstuff/XalanDOMStringCache.ii

You'll need a sysroot to do so, which is why I'm trying to make a reduced
example. I'm having trouble getting that set up, though. I think forcing
std::find to be inlined might be enough.</pre>
        </div>
      </p>


    </body>
</html>