[llvm-bugs] [Bug 37760] New: regcall calling convention mismatch between Clang and Intel C++ compiler (AVX512)
via llvm-bugs
llvm-bugs at lists.llvm.org
Sat Jun 9 19:04:07 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=37760
Bug ID: 37760
Summary: regcall calling convention mismatch between Clang and
Intel C++ compiler (AVX512)
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: wenzel.jakob at epfl.ch
CC: llvm-bugs at lists.llvm.org
Dear LLVM/Clang developers,
the 'regcall' calling convention currently does not match the behavior of the
Intel compiler when working with values that use AVX512-style ZMM registers.
This is problematic for two reasons:
1. 'regcall' functions compiled with Clang and ICPC are not ABI-compatible.
2. The 'regcall' calling convention is used to improve performance in
vectorized code involving function calls. It currently seems impossible to
obtain this benefit when using Clang & AVX512.
How to reproduce:
-----------------
Consider the following snippet, which defines 256-bit and 512-bit versions of a
function that just forwards its arguments.
/// ---------------------
#include <immintrin.h>
struct Vector2_256 { __m256 x[2]; };
struct Vector2_512 { __m512 x[2]; };
__attribute__((regcall)) void f_256(Vector2_256 x);
__attribute__((regcall)) void f_512(Vector2_512 x);
__attribute__((regcall)) void call_f_256(Vector2_256 x) { return f_256(x); }
__attribute__((regcall)) void call_f_512(Vector2_512 x) { return f_512(x); }
/// ---------------------
With Clang trunk, this compiles to
$ clang++ test.cpp -march=skx -S -o - -O3 -fomit-frame-pointer
# (with minor cleanups)
__Z22__regcall3__call_f_25611Vector2_256:
jmp __Z17__regcall3__f_25611Vector2_256
__Z22__regcall3__call_f_51211Vector2_512:
pushq %rbp
movq %rsp, %rbp
pushq %rsp
andq $-64, %rsp
subq $192, %rsp
vmovaps 16(%rbp), %zmm0
vmovaps 80(%rbp), %zmm1
vmovaps %zmm1, 64(%rsp)
vmovaps %zmm0, (%rsp)
vzeroupper
callq __Z17__regcall3__f_51211Vector2_512
leaq -8(%rbp), %rsp
popq %rsp
popq %rbp
retq
In other words, the 256-bit version is correct (a plain tail call with the
argument left in registers), while the 512-bit version loads its argument from
the caller's stack and copies it into its own frame before the call, which it
should not do.
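As a sanity check on the offsets in the Clang listing, here is a minimal sketch of the layout arithmetic. It uses hypothetical stand-in types (Zmm and Vec2, not __m512 and Vector2_512) so that it builds without AVX512 flags, and element_offset is a helper made up for illustration. It only confirms that the struct is 128 bytes with 64-byte alignment, which is consistent with the two 64-byte loads/stores and the andq $-64 realignment; it says nothing about how either compiler assigns registers.

```cpp
#include <cstddef>

// Hypothetical stand-in for __m512: 16 floats with 64-byte alignment.
struct alignas(64) Zmm { float lane[16]; };

// Same shape as Vector2_512 in the snippet above (assumed equivalent layout).
struct Vec2 { Zmm x[2]; };

// Byte offset of element i within the struct.
constexpr std::size_t element_offset(std::size_t i) { return i * sizeof(Zmm); }

static_assert(sizeof(Zmm) == 64, "one ZMM-sized value is 64 bytes");
static_assert(sizeof(Vec2) == 128, "two ZMM-sized values occupy 128 bytes");
static_assert(alignof(Vec2) == 64, "consistent with the andq $-64, %rsp realignment");
static_assert(element_offset(1) - element_offset(0) == 80 - 16,
              "consistent with the loads from 16(%rbp) and 80(%rbp)");
```

Both halves would fit in two ZMM registers, which is what the 256-bit case already does with YMM registers.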
On ICPC, I get
_Z22__regcall2__call_f_25611Vector2_256:
jmp _Z17__regcall2__f_25611Vector2_256
_Z22__regcall2__call_f_51211Vector2_512:
jmp _Z17__regcall2__f_51211Vector2_512