<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - regcall calling convention mismatch between Clang and Intel C++ compiler (AVX512)"
href="https://bugs.llvm.org/show_bug.cgi?id=37760">37760</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>regcall calling convention mismatch between Clang and Intel C++ compiler (AVX512)
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>wenzel.jakob@epfl.ch
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Dear LLVM/Clang developers,
the 'regcall' calling convention currently does not match the behavior of the
Intel compiler when working with values that use AVX512-style ZMM registers.
This is problematic for two reasons:
1. 'regcall' functions compiled with Clang and ICPC are not ABI-compatible.
2. The 'regcall' calling convention is used to improve performance in
vectorized code involving function calls. It currently seems impossible to
obtain this benefit when using Clang with AVX512.
How to reproduce:
-----------------
Consider the following simple snippet, which has 256-bit and 512-bit versions
of a function that just forwards its arguments.
/// ---------------------
#include <immintrin.h>
struct Vector2_256 { __m256 x[2]; };
struct Vector2_512 { __m512 x[2]; };
__attribute__((regcall)) void f_256(Vector2_256 x);
__attribute__((regcall)) void f_512(Vector2_512 x);
__attribute__((regcall)) void call_f_256(Vector2_256 x) { return f_256(x); }
__attribute__((regcall)) void call_f_512(Vector2_512 x) { return f_512(x); }
/// ---------------------
With Clang trunk, this compiles to
$ clang++ test.cpp -march=skx -S -o - -O3 -fomit-frame-pointer
# (with minor cleanups)
__Z22__regcall3__call_f_25611Vector2_256:
jmp __Z17__regcall3__f_25611Vector2_256
__Z22__regcall3__call_f_51211Vector2_512:
pushq %rbp
movq %rsp, %rbp
pushq %rsp
andq $-64, %rsp
subq $192, %rsp
vmovaps 16(%rbp), %zmm0
vmovaps 80(%rbp), %zmm1
vmovaps %zmm1, 64(%rsp)
vmovaps %zmm0, (%rsp)
vzeroupper
callq __Z17__regcall3__f_51211Vector2_512
leaq -8(%rbp), %rsp
popq %rsp
popq %rbp
retq
In other words, the 256-bit version is correct, while the 512-bit version
reads its arguments from the stack and passes them on via the stack (which it
shouldn't do).
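As a rough sanity check on reading the listing above, the two stack loads at
16(%rbp) and 80(%rbp) are 64 bytes apart, which is what one would expect if
each one fetches one 512-bit member of Vector2_512. The sketch below is not
part of the original test case: the typedefs v8f/v16f and the helper
second_member_offset() are hypothetical names, and the GCC/Clang vector_size
extension is assumed as a stand-in for __m256/__m512 so that the check does
not depend on immintrin.h.

```cpp
#include <cstddef>

// Hypothetical stand-ins for __m256 / __m512, written with the GCC/Clang
// vector_size extension (an assumption, not the types from the report).
typedef float v8f  __attribute__((vector_size(32)));  // 8 floats, 256 bits
typedef float v16f __attribute__((vector_size(64)));  // 16 floats, 512 bits

struct Vector2_256 { v8f  x[2]; };
struct Vector2_512 { v16f x[2]; };

// Each struct is exactly two vectors wide, with no padding.
static_assert(sizeof(Vector2_256) == 2 * 32, "two YMM-sized members");
static_assert(sizeof(Vector2_512) == 2 * 64, "two ZMM-sized members");

// Byte offset of x[1] within Vector2_512 (hypothetical helper).
constexpr std::size_t second_member_offset() {
    return offsetof(Vector2_512, x) + sizeof(v16f);
}

// 80(%rbp) - 16(%rbp) in the Clang listing is the same 64-byte stride.
static_assert(second_member_offset() == 80 - 16,
              "stride between the two vmovaps loads");
```

If this reading is right, the 128 bytes that Clang copies to (%rsp) and
64(%rsp) are just the two 64-byte members of the argument.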
On ICPC, I get
_Z22__regcall2__call_f_25611Vector2_256:
jmp _Z17__regcall2__f_25611Vector2_256
_Z22__regcall2__call_f_51211Vector2_512:
jmp _Z17__regcall2__f_51211Vector2_512</pre>
</div>
</p>
</body>
</html>