<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Clobbered XMM registers are not preserved around Intel-style inline assembly blocks in MS-ABI functions"
href="https://bugs.llvm.org/show_bug.cgi?id=51222">51222</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Clobbered XMM registers are not preserved around Intel-style inline assembly blocks in MS-ABI functions
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>12.0
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>skoulik@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>The issue was first observed with clang 10.0 bundled with MS Visual Studio 2019
on Windows, and was later confirmed with clang 7.0.1 on Linux (CentOS 7.7) and
with clang 12.0 bundled with Xcode 12.2 on Mac OS.
Here is a minimal reproducible example:
void test(void)
{
    __asm
    {
        VPXOR YMM6, YMM6, YMM6
    }
}
When compiled on Windows with
clang-cl /O2 /FA -c test.cpp
it produces the following assembly (meta-information omitted for clarity):
#APP
vpxor ymm6, ymm6, ymm6
#NO_APP
ret
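As you can see, XMM6 is not preserved even though it is clobbered by the vpxor
instruction. XMM6 is a callee-saved (nonvolatile) register in the Microsoft x64
calling convention, so the function must save and restore it.
If I pass the -mavx2 flag to the compiler, however,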
clang-cl /O2 -mavx2 /FA -c test.cpp
the generated assembly becomes:
sub rsp, 24
vmovaps xmmword ptr [rsp], xmm6 # 16-byte Spill
#APP
vpxor ymm6, ymm6, ymm6
#NO_APP
vmovaps xmm6, xmmword ptr [rsp] # 16-byte Reload
add rsp, 24
vzeroupper
ret
XMM6 is now preserved.
The same issue is present on Linux and Mac OS. There, however, ms_abi must be
specified explicitly:
void __attribute__((ms_abi)) test(void)
{
    __asm
    {
        VPXOR YMM6, YMM6, YMM6
    }
}
Compiling on Linux with
clang -O2 -fasm-blocks -S test.cpp
produces
#APP
vpxor %ymm6, %ymm6, %ymm6
#NO_APP
retq
Compiling with
clang -O2 -mavx2 -fasm-blocks -S test.cpp
produces
subq $24, %rsp
vmovaps %xmm6, (%rsp) # 16-byte Spill
#APP
vpxor %ymm6, %ymm6, %ymm6
#NO_APP
vmovaps (%rsp), %xmm6 # 16-byte Reload
addq $24, %rsp
vzeroupper
retq
Compiling on Mac OS with
/System/Volumes/Data/Applications/Xcode_12.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
-O2 -fasm-blocks -S test.cpp
produces
pushq %rbp
movq %rsp, %rbp
## InlineAsm Start
vpxor %ymm6, %ymm6, %ymm6
## InlineAsm End
popq %rbp
retq
Compiling with
/System/Volumes/Data/Applications/Xcode_12.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
-O2 -mavx2 -fasm-blocks -S test.cpp
produces
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
vmovaps %xmm6, -16(%rbp) ## 16-byte Spill
## InlineAsm Start
vpxor %ymm6, %ymm6, %ymm6
## InlineAsm End
vmovaps -16(%rbp), %xmm6 ## 16-byte Reload
addq $16, %rsp
popq %rbp
vzeroupper
retq
Additional comments and observations:
- The issue only happens with Intel-style assembly blocks. Using gcc-style
inline assembly and explicitly listing the registers in the clobber list
produces correct code.
- The real-world code is, of course, much more involved and contains
cpuid-based branches for AVX2 and non-AVX2 platforms. That means we must
compile without the -mavx2 switch to support both.</pre>
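<pre>The two observations above can be combined into a rough sketch: the same
VPXOR written as gcc-style inline assembly with XMM6 in the clobber list,
called only after a runtime AVX2 check so the file can be built without
-mavx2. This is an illustration under assumptions, not the real-world code:
the function names are hypothetical, and __builtin_cpu_supports (a GCC/Clang
builtin) stands in for the cpuid-based branches mentioned in the report.

```cpp
// Sketch only; the function names below are hypothetical.
#if defined(__x86_64__)

// gcc-style equivalent of test() from the report. Naming xmm6 in the
// clobber list tells the compiler the register is modified, so it can
// save and restore it as ms_abi requires.
__attribute__((ms_abi, noinline)) void clear_ymm6_avx2(void)
{
    __asm__ volatile("vpxor %%ymm6, %%ymm6, %%ymm6" : : : "xmm6");
}

// Returns 1 if the AVX2 path ran, 0 if the CPU does not report AVX2.
int run_avx2_path_if_supported(void)
{
    // Assumption: this builtin replaces the report's cpuid-based branch.
    if (!__builtin_cpu_supports("avx2"))
        return 0;
    clear_ymm6_avx2();
    return 1;
}

#else

// Other targets: nothing to demonstrate, so report that no AVX2 path ran.
int run_avx2_path_if_supported(void) { return 0; }

#endif
```

Compiling this with -O2 -S and without -mavx2 should show whether the spill
and reload of xmm6 appear around the asm statement, as the report says they
do for gcc-style assembly.</pre>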
</div>
</p>
</body>
</html>