<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - AVX512 Regression: KNL backend does not use KNL processor features, SKX works"
   href="https://llvm.org/bugs/show_bug.cgi?id=31817">31817</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>AVX512 Regression: KNL backend does not use KNL processor features, SKX works
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>wenzel.jakob@epfl.ch
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>I've noticed a serious performance and code generation regression on Clang's
KNL (= Knights Landing) target.

With the latest trunk version of Clang and LLVM, the compiler no longer makes use
of any of the new AVX512 architectural features when working with non-512-bit
registers (XMM and YMM). This includes making full use of all 32 registers
(instead of 16) and using extensions such as built-in broadcasting of constants.
Losing half of the registers in particular is a serious performance regression
-- this wasn't the case with older versions of Clang.

Interestingly, the SKX == skylake-avx512 target remains unaffected.

Here is an example for the broadcasting of constants. Consider the following
snippet:

    #include &lt;immintrin.h&gt;

    __m256 mul_constant(__m256 x) {
        return _mm256_mul_ps(x, _mm256_set1_ps(1234.f));
    }

Compiling this for the "SKX" target produces the following code:

    $ clang-5.0 -march=skylake-avx512 -O3 -S -fomit-frame-pointer test.c -o test.o

Assembly (cleaned up):

    _mul_constant: 
            vmulps  LCPI0_0(%rip){1to8}, %ymm0, %ymm0    <----- good!
            retq

On the other hand, compiling for "KNL" yields

    $ clang-5.0 -march=knl -O3 -S -fomit-frame-pointer test.c -o test.o

Assembly (cleaned up):

    _mul_constant:
            vbroadcastss    LCPI0_0(%rip), %ymm1        <----- ?!?
            vmulps  %ymm1, %ymm0, %ymm0                 <----- code should be identical
            retq</pre>
        </div>
      </p>
    </body>
</html>