[llvm-bugs] [Bug 51222] New: Clobbered XMM registers are not preserved around Intel-style inline assembly blocks in MS-ABI functions

Mon Jul 26 18:23:42 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=51222

            Bug ID: 51222
           Summary: Clobbered XMM registers are not preserved around
                    Intel-style inline assembly blocks in MS-ABI functions
           Product: new-bugs
           Version: 12.0
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: skoulik at gmail.com
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

The issue was first observed with clang 10.0 bundled with MS Visual Studio 2019
on Windows, and later confirmed with clang 7.0.1 on Linux (CentOS 7.7) and with
clang 12.0 bundled with Xcode 12.2 on Mac OS.

Here is a minimal reproducible example:
    void test(void)
    {
        __asm
        {
            VPXOR YMM6, YMM6, YMM6
        }
    }

When compiled on Windows with
    clang-cl /O2 /FA -c test.cpp

it produces the following assembly (metadata omitted for clarity):
        #APP
        vpxor   ymm6, ymm6, ymm6
        #NO_APP
        ret

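As you can see, XMM6 is not preserved even though it is clobbered by the vpxor
instruction. The Microsoft x64 calling convention treats XMM6 through XMM15 as
callee-saved (nonvolatile), so the function must save and restore XMM6 (the
low 128 bits of YMM6).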

If, however, I pass the -mavx2 flag to the compiler
    clang-cl /O2 -mavx2 /FA -c test.cpp

the generated assembly becomes
        sub     rsp, 24
        vmovaps xmmword ptr [rsp], xmm6 # 16-byte Spill
        #APP
        vpxor   ymm6, ymm6, ymm6
        #NO_APP
        vmovaps xmm6, xmmword ptr [rsp] # 16-byte Reload
        add     rsp, 24
        vzeroupper
        ret

XMM6 is now preserved.

The same issue is present on Linux and Mac OS. There, however, ms_abi must be
stated explicitly:
    void __attribute__((ms_abi)) test(void)
    {
        __asm
        {
            VPXOR YMM6, YMM6, YMM6
        }
    }

Compiling on Linux with
    clang -O2 -fasm-blocks -S test.cpp

produces
        #APP
        vpxor   %ymm6, %ymm6, %ymm6
        #NO_APP
        retq

Compiling with
    clang -O2 -mavx2 -fasm-blocks -S test.cpp

produces
        subq    $24, %rsp
        vmovaps %xmm6, (%rsp)           # 16-byte Spill
        #APP
        vpxor   %ymm6, %ymm6, %ymm6
        #NO_APP
        vmovaps (%rsp), %xmm6           # 16-byte Reload
        addq    $24, %rsp
        vzeroupper
        retq

Compiling on Mac OS with
    /System/Volumes/Data/Applications/Xcode_12.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -O2 -fasm-blocks -S test.cpp

produces
        pushq   %rbp
        movq    %rsp, %rbp
        ## InlineAsm Start
        vpxor   %ymm6, %ymm6, %ymm6
        ## InlineAsm End
        popq    %rbp
        retq

Compiling with
    /System/Volumes/Data/Applications/Xcode_12.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -O2 -mavx2 -fasm-blocks -S test.cpp

produces
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $16, %rsp
        vmovaps %xmm6, -16(%rbp)        ## 16-byte Spill
        ## InlineAsm Start
        vpxor   %ymm6, %ymm6, %ymm6
        ## InlineAsm End
        vmovaps -16(%rbp), %xmm6        ## 16-byte Reload
        addq    $16, %rsp
        popq    %rbp
        vzeroupper
        retq

Additional comments and observations:
    - The issue only occurs with Intel-style assembly blocks. Using GCC-style
      inline assembly and explicitly listing the registers in the clobber list
      produces correct code.
    - The real-world code is, of course, much more involved and contains
      CPUID-based branches for AVX2 and non-AVX2 platforms. This means we must
      compile without the -mavx2 switch in order to support both.
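A rough sketch of the GCC-style workaround from the first bullet, combined with the runtime dispatch from the second, might look like the following. It assumes GCC or Clang targeting x86-64; `clear_ymm6_if_avx2` is a hypothetical name, and `__builtin_cpu_supports("avx2")` stands in for the hand-written CPUID checks of the real code. The explicit "xmm6" clobber tells the compiler that the asm modifies XMM6, and because ms_abi makes XMM6 callee-saved, the compiler should then save and restore it even though the file is built without -mavx2.

```cpp
// Sketch only: assumes GCC or Clang targeting x86-64 (GCC-style inline asm).
// clear_ymm6_if_avx2 is a hypothetical name used for illustration.

// ms_abi selects the Microsoft x64 convention, under which XMM6-XMM15 are
// callee-saved. On Windows x64 targets this is already the default.
__attribute__((ms_abi)) int clear_ymm6_if_avx2(void)
{
    // Runtime check standing in for the real code's CPUID-based branches,
    // so this file can be compiled without -mavx2.
    if (__builtin_cpu_supports("avx2")) {
        // vzeroupper clears the upper YMM halves before returning to code
        // compiled without AVX; this assumes the file is built without -mavx,
        // so the compiler keeps no values in those upper halves.
        __asm__ volatile("vpxor %%ymm6, %%ymm6, %%ymm6\n\t"
                         "vzeroupper"
                         :            // no outputs
                         :            // no inputs
                         : "xmm6");   // XMM6 is modified; since it is
                                      // callee-saved here, the compiler
                                      // saves and restores it.
        return 1;  // AVX2 path taken
    }
    return 0;      // non-AVX2 fallback: nothing to do in this sketch
}
```

Called from ordinary code, the function returns 1 when the AVX2 branch ran and 0 otherwise. The -S output should show XMM6 being spilled and reloaded, similar to the -mavx2 listings above, without enabling -mavx2 for the whole translation unit.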
