[llvm-bugs] [Bug 48033] New: [X86] Poor codegen with STMXCSR/LDMXCSR combo.

via llvm-bugs llvm-bugs at lists.llvm.org
Sat Oct 31 05:23:18 PDT 2020


https://bugs.llvm.org/show_bug.cgi?id=48033

            Bug ID: 48033
           Summary: [X86] Poor codegen with STMXCSR/LDMXCSR combo.
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: andrea.dibiagio at gmail.com
                CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
                    llvm-dev at redking.me.uk, pengfei.wang at intel.com,
                    spatel+llvm at rotateright.com

This is a spin-off of bug 48024.

```
#include <xmmintrin.h>

void MXCSR_Crash()
{
        const unsigned int PreviousMXCSR = _mm_getcsr();
        _mm_setcsr(PreviousMXCSR & ~0x6000);
}
```
(https://gcc.godbolt.org/z/66cvvq)


Currently generates this:

        stmxcsr -4(%rsp)
        movl    $-24577, %eax   # imm = 0xFFFF9FFF
        andl    -4(%rsp), %eax
        movl    %eax, -8(%rsp)
        ldmxcsr -8(%rsp)
        retq
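
For reference, here is a small sketch of where the -24577 immediate comes
from. It is illustration only, not part of the reproducer: the names
MXCSR_RC_MASK, clear_rounding_control and and_immediate are made up for this
example. 0x6000 covers the rounding-control field of MXCSR (bits 13 and 14),
so clearing it selects round-to-nearest and leaves every other bit untouched.

```c
#include <stdint.h>

/* Hypothetical name: MXCSR rounding-control (RC) field, bits 13 and 14. */
#define MXCSR_RC_MASK 0x6000u

/* Clear RC (00 = round to nearest); all other MXCSR bits are preserved. */
uint32_t clear_rounding_control(uint32_t mxcsr) {
    return mxcsr & ~MXCSR_RC_MASK;
}

/* The AND mask, viewed as the signed 32-bit immediate seen in the assembly. */
int32_t and_immediate(void) {
    return (int32_t)~MXCSR_RC_MASK;
}
```

With a 32-bit unsigned int, ~0x6000 is 0xFFFF9FFF, which reads as -24577 when
interpreted as a signed 32-bit value.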


This codegen is suboptimal. It is as if the compiler tried very hard to keep
the stack slot holding the original value of MXCSR alive until the end of the
function.

As a result, an extra stack slot (rsp - 8) has to be used for the new value of
MXCSR. If the original slot were reused instead, the compiler could emit the MR
variant of AND (read-modify-write) and avoid the use of an extra MOV.

GCC gets this right: the entire sequence is three instructions plus the RET.

        stmxcsr -4(%rsp)
        andl    $-24577, -4(%rsp)
        ldmxcsr -4(%rsp)
        retq



I wonder if this poor codegen has to do with the fact that STMXCSR is defined
as having "unmodeled side-effects". Could that somehow prevent the compiler
from commuting the original AND and using an RMW variant instead?
Alternatively, StackSlotColoring may not be doing a good job of merging the two
stack slots. This is just me speculating on what the issue might be in the code
generator.

--

On the plus side, the compiler is smart about taking advantage of the red zone
in this case. Part of me wasn't expecting to see negative offsets used with RSP.
In this particular case, it makes perfect sense: it avoids having to emit an
extra SUB (of RSP) at the beginning, plus an extra ADD at the end.
