<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - [x86] scalar FP code runs ~15% slower on Haswell when compiled with -mavx"

   href="https://bugs.llvm.org/show_bug.cgi?id=36180">36180</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[x86] scalar FP code runs ~15% slower on Haswell when compiled with -mavx

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>spatel+llvm@rotateright.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=19783" name="attach_19783" title="himeno.c source file">attachment 19783</a> <a href="attachment.cgi?id=19783&action=edit" title="himeno.c source file">[details]</a></span>

himeno.c source file

I have a Haswell perf mystery that I can't explain. The himeno program (see

attachment) is an FP and memory benchmark that plows through large

multi-dimensional arrays doing 32-bit fadd/fsub/fmul.

To eliminate potentially questionable transforms and variation from the

vectorizers, build it as scalar-ops only like this:

$ ./clang -O2 himeno.c -fno-vectorize -fno-slp-vectorize -o himeno_novec_sse

$ ./clang -O2 himeno.c -fno-vectorize -fno-slp-vectorize -mavx -o himeno_novec_avx

And I'm testing on a 4.0GHz Haswell iMac running macOS 10.13.3:

$ ./himeno_novec_sse

mimax = 257 mjmax = 129 mkmax = 129

imax = 256 jmax = 128 kmax =128

cpu : 13.244777 sec.

Loop executed for 500 times

Gosa : 9.897132e-04 

MFLOPS measured : 5175.818966

Score based on MMX Pentium 200MHz : 160.391043

$ ./himeno_novec_avx

mimax = 257 mjmax = 129 mkmax = 129

imax = 256 jmax = 128 kmax =128

cpu : 15.533612 sec.

Loop executed for 500 times

Gosa : 9.897132e-04 

MFLOPS measured : 4413.176279

Score based on MMX Pentium 200MHz : 136.757864
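
When comparing many runs like the two above, it can help to pull the numbers
out of the report automatically. The following is only a sketch: the helper
name parse_himeno is hypothetical, and the patterns assume the report format
shown above.

```python
import re

def parse_himeno(text):
    """Return (seconds, mflops) taken from one himeno report."""
    seconds = re.search(r"cpu\s*:\s*([0-9.]+)\s*sec", text)
    mflops = re.search(r"MFLOPS measured\s*:\s*([0-9.]+)", text)
    if seconds is None or mflops is None:
        raise ValueError("not a himeno report")
    return float(seconds.group(1)), float(mflops.group(1))

# Lines copied from the himeno_novec_sse run above.
sample = """cpu : 13.244777 sec.
Loop executed for 500 times
Gosa : 9.897132e-04
MFLOPS measured : 5175.818966
"""
print(parse_himeno(sample))
```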

There's an unfortunate amount of noise (~5%) in the perf on this system with

this benchmark, but these results are reproducible. I'm consistently seeing

~15% better perf with the non-AVX build.
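
For reference, the size of the gap can be worked out from the two runs quoted
above. This is just arithmetic on those two samples (it does not account for
the run-to-run noise), and the helper name percent_change is made up for
illustration.

```python
# Figures copied from the two runs reported above.
sse_seconds, avx_seconds = 13.244777, 15.533612
sse_mflops, avx_mflops = 5175.818966, 4413.176279

def percent_change(baseline, value):
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0

avx_extra_time = percent_change(sse_seconds, avx_seconds)  # AVX run time vs. SSE
avx_mflops_drop = percent_change(sse_mflops, avx_mflops)   # AVX MFLOPS vs. SSE

print(f"AVX run time relative to SSE: {avx_extra_time:+.1f}%")
print(f"AVX MFLOPS relative to SSE:   {avx_mflops_drop:+.1f}%")
```

With these two samples the AVX build takes roughly 17% longer, which is the
same thing as roughly 15% lower MFLOPS.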

The inner-loop asm of the two builds is virtually identical in terms of

operations. The SSE code just has a few extra instructions to copy values,

because its two-operand ops are destructive (they overwrite a source register),

but the loads, stores, and math are the same.

An IACA analysis of these loops says they should have virtually the same

throughput on HSW:

Block Throughput: 20.89 Cycles       Throughput Bottleneck: Backend

Loop Count:  22

Port Binding In Cycles Per Iteration:

--------------------------------------------------------------------------------------------------

|  Port  |   0   -  DV   |   1   |   2   -  D    |   3   -  D    |   4   |   5   |   6   |   7   |

--------------------------------------------------------------------------------------------------

| Cycles | 13.0     0.0  | 21.0  | 12.0    12.0  | 12.0    11.0  |  1.0  |  2.0  |  2.0  |  0.0  |</pre>
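<pre>
One rough way to read the port table above: the busiest port gives an
approximate floor on cycles per iteration, so it should sit close to the
reported block throughput. The sketch below only restates the quoted numbers;
the variable names are illustrative and the DV/D sub-columns are left out.

```python
# Per-port cycles copied from the "Cycles" row of the IACA table above.
port_cycles = {0: 13.0, 1: 21.0, 2: 12.0, 3: 12.0, 4: 1.0, 5: 2.0, 6: 2.0, 7: 0.0}
reported_block_throughput = 20.89  # "Block Throughput" line of the same report

# The port with the most cycles per iteration is the likely limiter.
busiest_port = max(port_cycles, key=port_cycles.get)
busiest_cycles = port_cycles[busiest_port]

print(f"busiest port: {busiest_port} ({busiest_cycles} cycles per iteration)")
print(f"reported block throughput: {reported_block_throughput} cycles")
```

Here port 1 is the busiest at 21.0 cycles, which lines up with the 20.89-cycle
block throughput and the "Backend" bottleneck that IACA reports.
</pre>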

        </div>

      </p>


    </body>

</html>