[llvm-bugs] [Bug 41844] New: __attribute__((interrupt)) Handlers Dangerously Out of Spec on x86-64

via llvm-bugs llvm-bugs at lists.llvm.org
Sat May 11 12:47:46 PDT 2019


https://bugs.llvm.org/show_bug.cgi?id=41844

            Bug ID: 41844
           Summary: __attribute__((interrupt)) Handlers Dangerously Out of
                    Spec on x86-64
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: contact at knnspeed.com
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

Sorry, it's a long one; this issue is actually what gives rise to Bug 26413.

When compiling an x86-64 interrupt handler marked with __attribute__((interrupt)),
Clang/LLVM generates some potentially very dangerous assembly that can cause
random failures or incorrect results in programs that use x86-64 ISA extensions
like SSE/AVX.

In the x86-64 ISA, a special instruction called XSAVE exists to save the state
of the AVX/SSE registers, and SSE has a similar one called FXSAVE. For
restoring that state there are XRSTOR for AVX/SSE and FXRSTOR for SSE. That is
the correct way to save and restore ISA-extension register state--not moving
the AVX/SSE registers onto the stack one by one, which is what Clang/LLVM
currently does (and has done since 3.9.0, according to the assembly output on
gcc.godbolt.org).
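
For reference, here is a minimal sketch, for illustration only (this is not the
handler code from my project, just the general XSAVE/XRSTOR pattern), of saving
and restoring extended state into a dedicated, 64-byte-aligned buffer instead
of the stack. The 4096-byte size and the 0x7 feature mask (x87 | SSE | AVX) are
assumptions; real code should size the area from CPUID leaf 0x0D and make sure
CR4.OSXSAVE is set:

// Start of XSAVE/XRSTOR sketch (illustration only)

#include <stdint.h>

// The save area must be 64-byte aligned, and its XSAVE header (bytes 512-575)
// must be zero before first use; a static array guarantees the zeroing and the
// requested alignment. A single static area is of course not safe for nested
// interrupts; this only shows the instruction usage.
static uint8_t xsave_area[4096] __attribute__((aligned(64)));

static inline void save_extended_state(void)
{
  // EDX:EAX holds the requested-feature bitmap; 0x7 = x87 | SSE | AVX state.
  asm volatile("xsave (%0)"
               :
               : "r" (xsave_area), "a" (0x7), "d" (0x0)
               : "memory");
}

static inline void restore_extended_state(void)
{
  asm volatile("xrstor (%0)"
               :
               : "r" (xsave_area), "a" (0x7), "d" (0x0)
               : "memory");
}

// End of XSAVE/XRSTOR sketch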

You can see the behavior by compiling the following code on gcc.godbolt.org,
using any version of Clang since 3.9.0 with any of the -mavx, -mavx2, -msse,
etc. flags:

// start of code

#include <stdint.h>

static void some_function(uint64_t * pointer)
{
  pointer++;
}

__attribute__((interrupt)) void handler(uint64_t * some_frame)
{
 some_function(some_frame);
}

// end of code

Here's where the major problems lie (there are actually a few):

1) Most significantly, it looks like certain CPUs trash YMM2/XMM2 when it is
saved with an AVX move instruction during an interrupt. You can see this in
YMM2.PNG (attached; see Note 1 for a correction to the data shown in the
image), which shows a divide-by-zero interrupt whose handler uses vmovdqu to
move the YMM registers onto the stack.

This is the C code I used to trigger the interrupt; note that there is nothing
in between the asm statements and the forced division error. The divide-by-zero
interrupt handler just pushes the general registers on top of the interrupt
frame, subtracts from the stack pointer to make room for the AVX registers, and
uses vmovdqu to move the AVX registers into that stack memory area:

// Start of code to trigger interrupt in YMM2.PNG
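// (Excerpt: the surrounding file includes <immintrin.h>, and cs is an integer
//  defined earlier that is small enough for cs >> 10 to be zero.)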

  __m256i_u whaty = _mm256_set1_epi32(0x17);
  asm volatile("vmovdqu %[what], %%ymm1" : : [what] "m" (whaty) :);
  asm volatile("vmovdqu %[what], %%ymm2" : : [what] "m" (whaty) :); // Odd
behavior with YMM2. The rest are fine.
  asm volatile("vmovdqu %[what], %%ymm3" : : [what] "m" (whaty) :); 
  asm volatile("vmovdqu %[what], %%ymm4" : : [what] "m" (whaty) :); 
  asm volatile("vmovdqu %[what], %%ymm5" : : [what] "m" (whaty) :);
  asm volatile("vmovdqu %[what], %%ymm6" : : [what] "m" (whaty) :);
  asm volatile("vmovdqu %[what], %%ymm7" : : [what] "m" (whaty) :);

  volatile uint64_t c = cs / (cs >> 10); // cs is just a value that will guarantee a divide-by-zero error

// End of code to trigger interrupt in YMM2.PNG

YMM2_XSAVE.PNG (also attached) shows correct behavior; it is the same readout,
but using XSAVE to store the AVX registers to a memory area not on the stack.
The data is different because I was testing something else at the time, so the
C code used to trigger the interrupt changed to the following (basically only
the values being stored are different):

// Start of code to trigger interrupt in YMM2_XSAVE.PNG

  __m256i_u whaty = _mm256_set1_epi32(0x17181920);
  __m256i_u what2 = _mm256_set1_epi64x(0x1718192011223344);
  __m256i_u what3 = _mm256_set1_epi32(0x18);
  __m256i_u what9 = _mm256_set1_epi32(0x180019);

  asm volatile("vmovdqu %[what], %%ymm1" : : [what] "m" (whaty) :);
  asm volatile("vmovdqu %[what], %%ymm2" : : [what] "m" (what2) :); 
  asm volatile("vmovdqu %[what], %%ymm3" : : [what] "m" (what3) :); 
  asm volatile("vmovdqu %[what], %%ymm4" : : [what] "m" (whaty) :); 
  asm volatile("vmovdqu %[what], %%ymm5" : : [what] "m" (whaty) :);
  asm volatile("vmovdqu %[what], %%ymm6" : : [what] "m" (whaty) :);
  asm volatile("vmovdqu %[what], %%ymm7" : : [what] "m" (whaty) :);

  asm volatile("vmovdqu %[what], %%ymm15" : : [what] "m" (what9) :);

  volatile __m256i output = _mm256_bsrli_epi128(what2, 1); // To verify the quadword order is correct

  volatile uint64_t c = cs / (cs >> 10); // cs is just a value that will guarantee a divide-by-zero error

// End of code to trigger interrupt in YMM2_XSAVE.PNG

As you can see from the differences, the CPU (an i7-7700HQ) is doing something
weird with YMM2 in the first case. I have checked the code involved and found
nothing in the program's output assembly that modifies YMM2 in a way that would
cause this behavior--in fact, YMM2 is not touched at any point between the
move-to-stack and the print. Even if it turns out to be a deep semantic bug in
the program (not likely, see Note 2), the stack is not a safe place for the AVX
registers to reside, particularly when XSAVE exists for this very purpose. It
should be evident that restoring the registers from such a malformed state
could have catastrophic effects on programs that use AVX.

2) Clang/LLVM emits movaps/vmovaps to move the registers onto the stack. These
instructions require aligned memory operands (16 bytes for XMM, 32 for YMM),
but the handler has no mechanism to differentiate between interrupts and
exceptions that push an error code, whose entry frames are offset from each
other by 8 bytes. The alignment trick with "and $-32, %rsp" therefore leaves a
variable amount of dead space on the stack, which breaks any attempt to read
the saved registers through a struct, since the layout can't be guaranteed (see
the illustration below). This is essentially the crux of Bug 26413.
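
To make the 8-byte offset concrete, here is an illustration (hypothetical
struct names; this is not anything the compiler actually defines) of the two
CPU-pushed frames a handler can be entered with:

// Start of frame-layout illustration

#include <stdint.h>

// Frame the CPU pushes for a plain interrupt, lowest address first (this is
// where the handler's stack pointer points on entry).
struct interrupt_frame {
  uint64_t rip, cs, rflags, rsp, ss;
};

// Frame for an exception that also pushes an error code: everything the
// handler saves afterwards sits 8 bytes lower than in the plain-interrupt
// case, so after "and $-32, %rsp" the amount of padding between the saved
// GPRs and the saved vector registers differs between the two cases, and no
// single struct can describe the result.
struct exception_frame {
  uint64_t error_code;
  uint64_t rip, cs, rflags, rsp, ss;
};

// End of frame-layout illustration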

3) This could be avoided if Clang/LLVM honored the -mgeneral-regs-only flag,
but it doesn't--at least, not when I add it to the compile line (I'm using
gcc.godbolt.org to check), where it appears to be ignored. That means this
could really bite people hard. The split that flag is supposed to enable is
sketched below.
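
For context, this is the split -mgeneral-regs-only is supposed to enable,
assuming it were honored (file names are made up, for illustration):

  # Interrupt-handler translation unit: no vector/x87 register use allowed
  clang -O2 -mgeneral-regs-only -c isr.c
  # The rest of the program keeps its AVX code
  clang -O2 -mavx2 -c main.c
  clang isr.o main.o -o program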

Seeing how this has been around since 3.9.0 and I'm apparently the first one to
report this, I suppose it's not "release blocking" in severity, but hopefully
I'm not the only one disconcerted by this.

Quick notes about the images:
Note 1: I had the quadword order backwards in YMM2.PNG, so the "6" is actually
in the most significant quadword, and what looks like an address is actually in
the least significant quadword. The "address" is pointing to somewhere in
EfiLoaderData, which is where the executing program resides (more reason to
believe it's an address). The quadword order is corrected in YMM2_XSAVE.PNG.

Note 2: It's not likely to be a semantic bug, since everything works fine with
XSAVE, and the same thing happens to XMM2 with movdqu in exactly the same way
when using only SSE with AVX enabled: an "address" and a random "6" appear in
the least and most significant quadwords of XMM2 instead of YMM2. More reason
to believe the CPU is doing something weird here.
