[llvm-dev] Possible bug in x86 frame lowering with SSE instructions?

Mon Oct 26 23:21:00 PDT 2020

Hi Jonathan,

It seems the trunk code solves this problem. https://godbolt.org/z/Y1Wdbj
I took a look at the x86 ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/tree/hjl/x86/1.1#
It says "In other words, the value (%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point."
So if the OS follows the ABI, ESP should always be 0xXXXXXXXC on entry to a function, and it becomes 0xXXXXXXX8 after "push ebp", which happens to be 8-byte aligned.
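
A minimal C sketch of that arithmetic, assuming the entry rule quoted above holds. The function names and the sample stack address used when checking it are hypothetical and not taken from any real process.

```c
#include <stdint.h>

/* Returns 1 if entry_esp satisfies the quoted i386 psABI rule:
   (esp + 4) is a multiple of 16 at function entry. */
int entry_is_abi_conforming(uint32_t entry_esp) {
    return (entry_esp + 4) % 16 == 0;
}

/* ESP after `push ebp`: the push stores 4 bytes, so ESP drops by 4. */
uint32_t esp_after_push_ebp(uint32_t entry_esp) {
    return entry_esp - 4;
}
```

For an entry ESP ending in 0xC, the value after the push ends in 0x8, so it is 8-byte aligned but not 16-byte aligned.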

Thanks
Pengfei

-----Original Message-----
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jonathan Smith via llvm-dev
Sent: Tuesday, October 27, 2020 6:51 AM
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] Possible bug in x86 frame lowering with SSE instructions?

Hello, everyone.

I'm looking for some insight into a bug I encountered while testing some custom IR passes on Solaris (x86) and Linux. I don't know if it's a bug with the x86 backend or the way the frame is set up by Solaris -- or if I'm simply doing something I shouldn't be doing. The bug manifests even if I don't run any of my passes, so I'm certain those aren't the issue.

Given the following test C code:

    int main(int argc, char **argv) {
      int x[10] = {1,2,3};
      return 0;
    }

I compile it to IR with the following arguments:

  clang --target=i386-sun-solaris -S -emit-llvm -Xclang -disable-O0-optnone -x c -c array-test.c -o array-test.ll

This yields the following IR:

    target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
    target triple = "i386-sun-solaris"

    ; Function Attrs: noinline nounwind
    define dso_local i32 @main(i32 %0, i8** %1) #0 {
      %3 = alloca i32, align 4
      %4 = alloca i32, align 4
      %5 = alloca i8**, align 4
      %6 = alloca [10 x i32], align 4
      store i32 0, i32* %3, align 4
      store i32 %0, i32* %4, align 4
      store i8** %1, i8*** %5, align 4
      %7 = bitcast [10 x i32]* %6 to i8*
      call void @llvm.memset.p0i8.i32(i8* align 4 %7, i8 0, i32 40, i1 false)
      %8 = bitcast i8* %7 to [10 x i32]*
      %9 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 0
      store i32 1, i32* %9, align 4
      %10 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 1
      store i32 2, i32* %10, align 4
      %11 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 2
      store i32 3, i32* %11, align 4
      ret i32 0
    }

    ; Function Attrs: argmemonly nounwind willreturn writeonly
    declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #1

    attributes #0 = { noinline nounwind
      "correctly-rounded-divide-sqrt-fp-math"="false"
      "disable-tail-calls"="false" "frame-pointer"="all"
      "less-precise-fpmad"="false" "min-legal-vector-width"="0"
      "no-infs-fp-math"="false" "no-jump-tables"="false"
      "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false"
      "no-trapping-math"="true" "stack-protector-buffer-size"="8"
      "target-cpu"="pentium4"
      "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87"
      "unsafe-fp-math"="false" "use-soft-float"="false" }
    attributes #1 = { argmemonly nounwind willreturn writeonly }

Normally, I would run custom passes at this point via opt. But the error I'm getting occurs with or without this step.

Without changing anything else, I run this IR through llc with the following arguments:

    llc --x86-asm-syntax=intel --filetype=asm array-test.ll -o=array-test.s

This results in the following assembly:

            .text
            .intel_syntax noprefix
            .file   "/home/user/code/array-test.ll"
            .globl  main                            # -- Begin function main
            .p2align        4, 0x90
            .type   main,@function
    main:                                   # @main
    # %bb.0:
            push    ebp
            mov     ebp, esp
            sub     esp, 56
            mov     dword ptr [ebp - 4], 0
            xorps   xmm0, xmm0
            movaps  xmmword ptr [ebp - 56], xmm0
            movaps  xmmword ptr [ebp - 40], xmm0
            mov     dword ptr [ebp - 20], 0
            mov     dword ptr [ebp - 24], 0
            mov     dword ptr [ebp - 56], 1
            mov     dword ptr [ebp - 52], 2
            mov     dword ptr [ebp - 48], 3
            xor     eax, eax
            add     esp, 56
            pop     ebp
            ret
    .Lfunc_end0:
            .size   main, .Lfunc_end0-main
                                            # -- End function
            .ident  "clang version 12.0.0 (https://github.com/llvm/llvm-project.git 62dbbcf6d7c67b02fd540a5a1e55c494bf88adea)"
            .section        ".note.GNU-stack","",@progbits

Other than the target being i386-sun-solaris, this is the exact same code that is generated if I target i386-pc-linux-gnu.

If I run this on Linux (Ubuntu 18.04 in this case), there are no problems. If I run this on Solaris, however, a segfault occurs on the first `movaps` instruction. I believe the issue is that the stack is 4-byte aligned on Solaris whereas it's 8-byte aligned on Linux, so the 56- and 40-byte offsets for the array stores just happen to work on Linux -- while they end up being 8 bytes off on Solaris.
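
To make the suspected failure concrete, here is a small C sketch that checks whether the two `movaps` operands from the listing above, `[ebp - 56]` and `[ebp - 40]`, fall on 16-byte boundaries for a given ESP at entry to `main`. It only models the address arithmetic. The function name is hypothetical, and what ESP actually holds at entry on Solaris is an assumption that this thread does not establish.

```c
#include <stdint.h>

/* Models the prologue `push ebp; mov ebp, esp` and reports whether both
   movaps operands from the listing are 16-byte aligned.
   movaps faults when its memory operand is not 16-byte aligned. */
int movaps_operands_aligned(uint32_t entry_esp) {
    uint32_t ebp = entry_esp - 4;          /* push ebp; mov ebp, esp */
    uint32_t first = ebp - 56;             /* movaps [ebp - 56], xmm0 */
    uint32_t second = ebp - 40;            /* movaps [ebp - 40], xmm0 */
    return (first % 16 == 0) && (second % 16 == 0);
}
```

With an entry ESP ending in 0xC, both operands are aligned. With an entry ESP that is only 4-byte aligned, for example one ending in 0x4 or 0x8, they are not, which would be consistent with a fault on the first `movaps`.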

Running llc with --stackrealign fixes the problem:

    main:                                   # @main
    # %bb.0:
            push    ebp
            mov     ebp, esp
            and     esp, -16
            sub     esp, 64
            mov     dword ptr [esp + 12], 0
            xorps   xmm0, xmm0
            movaps  xmmword ptr [esp + 16], xmm0
            movaps  xmmword ptr [esp + 32], xmm0
            mov     dword ptr [esp + 52], 0
            mov     dword ptr [esp + 48], 0
            mov     dword ptr [esp + 16], 1
            mov     dword ptr [esp + 20], 2
            mov     dword ptr [esp + 24], 3
            xor     eax, eax
            mov     esp, ebp
            pop     ebp
            ret
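
The effect of the extra `and esp, -16` in this prologue can be sketched in C as follows. This is an illustration of the address arithmetic only, with hypothetical function names and sample addresses. It does not model how LLVM decides to emit the realignment.

```c
#include <stdint.h>

/* `and esp, -16`: clear the low four bits, rounding ESP down to a
   multiple of 16 regardless of its value at entry. */
uint32_t realign_down_16(uint32_t esp) {
    return esp & ~(uint32_t)15;
}

/* Address of the first movaps operand, [esp + 16], after the realigned
   prologue: push ebp; and esp, -16; sub esp, 64. */
uint32_t realigned_movaps_address(uint32_t entry_esp) {
    uint32_t esp = entry_esp - 4;   /* push ebp */
    esp = realign_down_16(esp);     /* and esp, -16 */
    esp -= 64;                      /* sub esp, 64 */
    return esp + 16;                /* operand of the first movaps */
}
```

Because 64 and 16 are both multiples of 16, the operand address stays 16-byte aligned for any entry ESP, which matches the observation that --stackrealign avoids the fault.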

Running clang with -fomit-frame-pointer also fixes the problem, but I have no idea why. Adding --stack-alignment=16 does *not* fix the problem. If I explicitly add the -O0 flag to llc, the `X86TargetLowering::getOptimalMemOpType()` function doesn't lower the array stores to `movaps`:

    main:                                   # @main
    # %bb.0:
            push    ebp
            mov     ebp, esp
            push    esi
            sub     esp, 68
            mov     eax, dword ptr [ebp + 12]
            mov     ecx, dword ptr [ebp + 8]
            xor     edx, edx
            mov     dword ptr [ebp - 8], 0
            lea     esi, [ebp - 48]
            mov     dword ptr [esp], esi
            mov     dword ptr [esp + 4], 0
            mov     dword ptr [esp + 8], 40
            mov     dword ptr [ebp - 52], eax       # 4-byte Spill
            mov     dword ptr [ebp - 56], ecx       # 4-byte Spill
            mov     dword ptr [ebp - 60], edx       # 4-byte Spill
            call    memset
            mov     dword ptr [ebp - 48], 1
            mov     dword ptr [ebp - 44], 2
            mov     dword ptr [ebp - 40], 3
            mov     eax, dword ptr [ebp - 60]       # 4-byte Reload
            add     esp, 68
            pop     esi
            pop     ebp
            ret

I've spent the better part of ten hours trying to debug the X86 backend code (and I am, admittedly, not the best at knowing where to look). I determined the `X86FrameLowering::emitPrologue()` function will *only* emit the proper offset adjustment if `X86RegisterInfo::needsStackRealignment()` returns `true`, and the only thing that seems to force it to return `true` is if --stackrealign is used (which sets the "stackrealign" function attribute on `main`).

I don't know if this is truly a bug in the X86 backend (an assumption about the ABI on Linux vs. Solaris? Maybe? I'm truly guessing...) or if this is a result of me using -disable-O0-optnone in Clang without -O0 in llc.

Any insight would be helpful, and thanks for reading my rather verbose message.