<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - LIBGAV1 perf regression trunk vs. Clang 9"
href="https://bugs.llvm.org/show_bug.cgi?id=44539">44539</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>LIBGAV1 perf regression trunk vs. Clang 9
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>andrea.dibiagio@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
</td>
</tr></table>
<p>
<div>
<pre>This is related to <a class="bz_bug_link
bz_status_NEW "
title="NEW - Perf regressions on many benchmarks with AMD Threadripper"
href="show_bug.cgi?id=44411">bug 44411</a>.
There is a significant performance regression in the LIBGAV1 benchmark.
---
Compiler flags: -O3 -march=znver1
Numbers are FPS (frames per second); higher is better.
```
single thread -- 2000 frames
========
                        | GCC 7.4 | CLANG 9.x | CLANG master
chimera_8b_1080p.ivf    |   22.77 |     21.86 |        18.71
chimera_10b_1080p.ivf   |   11.31 |     12.68 |        11.67
summer_nature_1080p.ivf |   21.10 |     21.02 |        18.29
summer_nature_4K.ivf    |    4.74 |      4.57 |         3.94

multi threaded (8) -- no frame limit
========
                        | GCC 7.4 | CLANG 9.x | CLANG master
chimera_8b_1080p.ivf    |   43.51 |     42.76 |        34.80
chimera_10b_1080p.ivf   |   16.18 |     18.98 |        17.12
summer_nature_1080p.ivf |   64.22 |     63.66 |        53.89
summer_nature_4K.ivf    |   17.57 |     17.17 |        14.70

multi threaded (16) -- no frame limit
========
                        | GCC 7.4 | CLANG 9.x | CLANG master
chimera_8b_1080p.ivf    |   43.40 |     43.05 |        38.73
chimera_10b_1080p.ivf   |   16.54 |     19.68 |        18.67
summer_nature_1080p.ivf |   62.72 |     62.20 |        54.96
summer_nature_4K.ivf    |   19.31 |     19.11 |        17.13
```
The single-threaded execution is up to ~14% slower on master than on Clang 9.x
(for example, chimera_8b_1080p.ivf drops from 21.86 to 18.71 fps).
Later I will post a full description of the underlying issue that caused this
perf regression.
tl;dr: the performance degradation in libgav1 is caused by poor decisions made
by the "x86 cmov converter" pass. In particular, a number of CMOVs in a hot loop
are now sub-optimally expanded into if-then blocks. Clang 9 did not expand those
CMOVs, which was the correct decision.
If we disable that pass, we fully recover the lost performance. For
example, decoding "chimera_8b_1080p.ivf" with a single thread gives us an
average of 22.14 fps.
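For readers unfamiliar with the pattern, here is a minimal sketch of the kind of
select that can end up as a CMOV inside a loop. It is NOT taken from libgav1;
the function name and the loop are hypothetical and only illustrate the shape.
The command in the comment assumes the hidden LLVM option -x86-cmov-converter
is available in your build.
```cpp
// Hypothetical example (not libgav1 code): a loop-carried, data-dependent
// select. On x86 a compiler may lower the ternary below to a CMOV; the
// "x86 cmov converter" pass may then decide to turn it back into a branch.
//
// To compare codegen with the pass disabled (assumed option spelling):
//   clang++ -O3 -march=znver1 -mllvm -x86-cmov-converter=false -S example.cpp
int IndexOfMax(const int* v, int n) {
  int best = 0;
  for (int i = 1; i != n; ++i) {
    // The condition depends on loaded data, so a branch here can mispredict
    // often on unpredictable input, while a CMOV has a fixed cost.
    best = (v[i] > v[best]) ? i : best;
  }
  return best;
}
```
Whether a given compiler version emits a CMOV or a branch for this loop depends
on the target and flags, so inspect the generated assembly rather than assuming.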
As I wrote, I plan to post all my findings in a follow-up comment.
NOTE: this is unlikely to be AMD-specific. For example, I can reproduce the
poor CMOV expansions when generating code for Skylake.</pre>
</div>
</p>
</body>
</html>