<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - AVX2 - aggressive broadcast generation instead of memory operands"

   href="https://bugs.llvm.org/show_bug.cgi?id=32564">32564</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>AVX2 - aggressive broadcast generation instead of memory operands

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>clang

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>LLVM Codegen

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedclangbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>regis.portalez@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=18247" name="attach_18247" title="open zip to see code reproducer (clang vs gcc and icc)">attachment 18247</a> <a href="attachment.cgi?id=18247&action=edit" title="open zip to see code reproducer (clang vs gcc and icc)">[details]</a></span>

open zip to see code reproducer (clang vs gcc and icc)

Following resolution of <a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED FIXED - constant vector value splatted at runtime with broadcast instruction instead of loaded (only core-avx2 target?)"

   href="show_bug.cgi?id=20054">bug #20054</a>

(<a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED FIXED - constant vector value splatted at runtime with broadcast instruction instead of loaded (only core-avx2 target?)"

   href="show_bug.cgi?id=20054">https://bugs.llvm.org/show_bug.cgi?id=20054</a>),

LLVM codegen (x86, AVX2) now always generates broadcast instructions for splat

values instead of using memory operands.

See this reproducer:

#include &lt;immintrin.h&gt;

__m256d mulconst(__m256d x) {

        const __m256d a = { 15.0, 15.0, 15.0, 15.0 };

        return _mm256_mul_pd(x, a);

}

With [ -O3 -g -S -mavx2 -mavx -mfma ], clang generates:

.LCPI0_0:

        .quad   4624633867356078080     # double 15

mulconst(double __vector(4)):                       # @mulconst(double __vector(4))

        vbroadcastsd    ymm1, qword ptr [rip + .LCPI0_0]

        vmulpd  ymm0, ymm0, ymm1

        ret

This is legitimate when optimizing for code size, but not for speed.

Indeed:

   vbroadcastsd is an additional instruction,

   its result occupies an extra register (which can cause further spilling),

   it prevents any use of memory operands, even with inline assembly.

See the attached larger reproducer to spot unnecessary spills (and to compare the

assembly generated by gcc 6.2 and clang 4.0).</pre>
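        <pre>A side note on the listing above: the .quad value in the constant pool should be
the raw IEEE-754 bit pattern of the double 15.0, i.e. a single 8-byte scalar rather
than a full 32-byte vector constant. Here is a minimal sketch to check this. It
assumes a 64-bit IEEE-754 double and a 64-bit unsigned long long, and double_bits is
a hypothetical helper name that is not part of the attached reproducer.

```c
/* Assumption: double and unsigned long long have the same size (8 bytes). */
_Static_assert(sizeof(double) == sizeof(unsigned long long),
               "this sketch assumes a 64-bit double");

/* Hypothetical helper: return the raw bit pattern of a double.
   Reading a different union member than the one last written is
   permitted type punning in C (C99 and later), but not in C++. */
unsigned long long double_bits(double d)
{
    union {
        double d;
        unsigned long long u;
    } pun;

    pun.d = d;     /* store the value as a double */
    return pun.u;  /* reinterpret the same bytes as an integer */
}
```

Under these assumptions, double_bits(15.0) equals 4624633867356078080
(0x402E000000000000), which is the value stored at .LCPI0_0.</pre>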

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>