<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - AVX512 Regression: KNL backend does not use KNL processor features, SKX works"

   href="https://llvm.org/bugs/show_bug.cgi?id=31817">31817</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>AVX512 Regression: KNL backend does not use KNL processor features, SKX works

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>wenzel.jakob@epfl.ch

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>I've noticed a serious performance and code generation regression on Clang's

KNL (= Knights Landing) target.

With the latest trunk version of Clang and LLVM, the compiler no longer makes use

of any of the new AVX512 architectural features when working with non-512-bit

registers (XMM and YMM). These features include access to all 32 registers

(instead of 16) and extensions such as built-in broadcasting of constants.

Losing half of the registers in particular is a serious performance regression

-- this wasn't the case with older versions of Clang.

Interestingly, the SKX == skylake-avx512 target remains unaffected.

Here is an example for the broadcasting of constants. Consider the following

snippet:

    #include &lt;immintrin.h&gt;

    __m256 mul_constant(__m256 x) {

        return _mm256_mul_ps(x, _mm256_set1_ps(1234.f));

    }

Compiling this for the "SKX" target produces the following code:

    $ clang-5.0 -march=skylake-avx512 -O3 -S -fomit-frame-pointer test.c -o test.o

Assembly (cleaned up):

    _mul_constant: 

            vmulps  LCPI0_0(%rip){1to8}, %ymm0, %ymm0    <----- good!

            retq

On the other hand, compiling for "KNL" yields:

    $ clang-5.0 -march=knl -O3 -S -fomit-frame-pointer test.c -o test.o

Assembly (cleaned up):

    _mul_constant:

            vbroadcastss    LCPI0_0(%rip), %ymm1        <----- ?!?

            vmulps  %ymm1, %ymm0, %ymm0                 <----- code should be identical

            retq</pre>

        </div>

      </p>


    </body>

</html>