<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - AVX512: Update __vectorcall calling conventions"
   href="https://llvm.org/bugs/show_bug.cgi?id=28963">28963</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>AVX512: Update __vectorcall calling conventions
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>wenzel.jakob@epfl.ch
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
<pre>Clang supports the __vectorcall calling convention, which passes SIMD vector
arguments to functions directly in registers. This works well for SSE and AVX
vectors, but the calling convention does not appear to have been updated for
the new 512-bit AVX512 vectors.

Consider the following simple passthrough functions:

#include &lt;immintrin.h&gt;  // defines __m256 and __m512

struct Wrapper256 { __m256 x; };
struct Wrapper512 { __m512 x; };

// Pass each wrapper through unchanged.
Wrapper256 __vectorcall add1(Wrapper256 x) { return x; }
Wrapper512 __vectorcall add2(Wrapper512 x) { return x; }

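Using Clang trunk (clang -O3 -mavx512f -fomit-frame-pointer test.cpp -S), this
compiles to the assembly below. The 256-bit version is passed and returned in
a register, while the 512-bit version is read from the stack and returned
through memory via the pointer in %rdi: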

__Z4add110Wrapper256:                   ## @_Z4add110Wrapper256
    retq

__Z4add210Wrapper512:                   ## @_Z4add210Wrapper512
    pushq    %rbp
    movq    %rsp, %rbp
    andq    $-64, %rsp
    subq    $64, %rsp
    vmovaps    16(%rbp), %ymm0
    vmovaps    48(%rbp), %ymm1
    vmovaps    %ymm1, 32(%rdi)
    vmovaps    %ymm0, (%rdi)
    movq    %rdi, %rax
    movq    %rbp, %rsp
    popq    %rbp
    retq</pre>
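        <p>The size difference behind the two listings above can be sketched as
        follows. This is only an illustration, not part of the original report:
        it assumes a GCC- or Clang-compatible compiler, and the typedefs
        vec256 and vec512 and the helper ymm_moves are hypothetical stand-ins
        (for __m256, __m512, and the copy sequence) so that the snippet
        compiles without immintrin.h or -mavx512f.</p>
        <pre>
```cpp
// Hypothetical stand-ins for __m256 / __m512 built on the GCC/Clang vector
// extension; they are not the types from immintrin.h.
typedef float vec256 __attribute__((vector_size(32)));
typedef float vec512 __attribute__((vector_size(64)));

struct Wrapper256 { vec256 x; };
struct Wrapper512 { vec512 x; };

// Width in bytes of one YMM register.
constexpr unsigned kYmmBytes = 32;

// Number of YMM-sized moves needed to copy a value of the given size.
constexpr unsigned ymm_moves(unsigned bytes) {
    return (bytes + kYmmBytes - 1) / kYmmBytes;
}

// A Wrapper256 fits in a single YMM register.
static_assert(sizeof(Wrapper256) == 32, "one YMM register");
// A Wrapper512 needs one ZMM register, or two YMM-sized halves.
static_assert(sizeof(Wrapper512) == 64, "one ZMM register or two YMM halves");
// Two halves correspond to the two vmovaps loads and two stores in the listing.
static_assert(ymm_moves(sizeof(Wrapper512)) == 2, "two YMM-sized moves");
```
</pre>
        <p>If the static assertions compile, the 512-bit wrapper is exactly one
        ZMM register wide, so a single register could presumably hold it instead
        of the stack copy shown in the second listing.</p>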
        </div>
      </p>
    </body>
</html>