<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - h264 (SPEC) : 8% percent perf regression with -march=haswell compared to plain -O3"
   href="https://bugs.llvm.org/show_bug.cgi?id=43578">43578</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>h264 (SPEC) : 8% percent perf regression with -march=haswell compared to plain -O3
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>david.bolvansky@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>I measured a significant performance drop for the SPEC h264 benchmark with
-march=haswell (about 8% slower than plain -O3):

417 seconds -O3 -march=haswell
418 seconds -O3 -mprefer-vector-width=128 -march=haswell
385 seconds -O3

I didn't dig very deep; I only checked the known hotspot function/loop of this
benchmark. I reduced the benchmark so anybody can look at this hotspot on godbolt:
<a href="https://godbolt.org/z/i8vvYN">https://godbolt.org/z/i8vvYN</a>

(If we improve this reduced test case, the benchmark should improve too.)

As far as I can see, the reason for this slowdown is aggressive vectorization: with
-march=haswell we probably hurt loop performance through the use of
vpextrd, vextracti128 and vpmovzxwd (the SLP vectorizer somehow decided that
vectorization is profitable when it clearly is not; a cost model issue?).

GCC and ICC do not vectorize this loop with -march=haswell.</pre>
        </div>
      </p>


    </body>
</html>