<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Suboptimal optimisation of inlined function loop for AArch64 O3 with LTO"
href="https://bugs.llvm.org/show_bug.cgi?id=45554">45554</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Suboptimal optimisation of inlined function loop for AArch64 O3 with LTO
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>david.spickett@linaro.org
</td>
</tr>
<tr>
<th>CC</th>
<td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=23363" name="attach_23363" title="Preprocessed source">attachment 23363</a> <a href="attachment.cgi?id=23363&action=edit" title="Preprocessed source">[details]</a></span>
Preprocessed source
After <a href="https://reviews.llvm.org/D76792">https://reviews.llvm.org/D76792</a>, the SPEC benchmark "xalancbmk" showed
regressions at -O2/-O3 with LTO enabled.
This has been narrowed down to a loop in XalanDOMStringCache::release, where
extra instructions are inserted in the loop body that would normally be placed
at the exit points of the function.
For example, before we had:
244 ldur x10, [x20, #-8]
cmp x10, x1
↓ b.eq e8
&lt;...&gt;
e8: sub x20, x20, #0x8
ec: cmp x20, x8
After:
247 mov x10, x20
46 ldr x11, [x10, #8]!
cmp x11, x1
↓ b.eq dc
&lt;...&gt;
dc: mov x20, x10
cmp x20, x8
↓ b.ne f8
Note that the "after" code uses writeback to update x10, and resets it if the
branch is not taken. This adds instructions to the loop body, whereas before we
would only write to x20 if the branch was taken.
I will attach the perf output for before and after, along with the preprocessed
source file. Compile with:
./clang++ -O3 -flto --target=aarch64-linux-gnu /tmp/perfstuff/XalanDOMStringCache.ii
You'll need a sysroot to do so, which is why I'm trying to make a reduced
example. I'm having trouble getting that set up, though. I think forcing
std::find to be inlined might be enough.</pre>
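<pre>As a starting point for a reduced example, the loop shape described above
might be sketched as follows. This is a hypothetical stand-in, not the Xalan
source, and it is not confirmed to reproduce the regression. The names Str and
StringCache, and the choice to search from the back of the list (a guess based
on the [x20, #-8] load in the "before" code), are assumptions.

```cpp
// Hypothetical reduced example: NOT the Xalan source, and not confirmed to
// reproduce the regression. All names below are illustrative assumptions.
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

struct Str { int id; };  // stand-in for XalanDOMString

class StringCache {
public:
    void acquire(Str* s) { busy_.push_back(s); }

    // Linear search over a vector of pointers. The position of the match is
    // only needed after the search loop exits, so any iterator bookkeeping
    // would ideally sit on the loop exit paths rather than in the loop body.
    bool release(Str& s) {
        // Assumption: search from the back of the busy list.
        auto rit = std::find(busy_.rbegin(), busy_.rend(), &s);
        if (rit == busy_.rend())
            return false;
        available_.push_back(*rit);
        // std::next(rit).base() is the forward iterator to the same element.
        busy_.erase(std::next(rit).base());
        return true;
    }

    std::size_t busyCount() const { return busy_.size(); }
    std::size_t availableCount() const { return available_.size(); }

private:
    std::vector<Str*> busy_;
    std::vector<Str*> available_;
};
```

Whether std::find is inlined into release() here depends on the optimisation
level and the standard library in use, so the generated loop would need to be
checked before relying on this as a reproducer.</pre>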
</div>
</p>
</body>
</html>