[llvm-dev] Possible bug in x86 frame lowering with SSE instructions?

Jonathan Smith via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 27 02:52:21 PDT 2020


Interesting. Thank you.

I'm still curious to know what commit fixed this problem, although it
sounds like it's also a problem with how Solaris is implementing the
ABI.

I suppose it's time for me to go hunting through commits.

On Tue, Oct 27, 2020 at 2:21 AM Wang, Pengfei <pengfei.wang at intel.com> wrote:
>
> Hi Jonathan,
>
> It seems the trunk code solves this problem. https://godbolt.org/z/Y1Wdbj
> I took a look at the x86 ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/tree/hjl/x86/1.1#
> It says "In other words, the value (%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point."
> So if the OS follows the ABI, ESP's value should always be 0xXXXXXXXC when control enters a function, and it becomes 0xXXXXXXX8 after "push ebp", which happens to be 8-byte aligned.
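>
> As a quick sanity check of that arithmetic, here is a minimal sketch (the entry value 0xFFFFCFFC is made up, chosen only to satisfy the rule quoted above):
>
>     #include <assert.h>
>     #include <stdint.h>
>
>     int main(void) {
>       /* Hypothetical ABI-compliant ESP at the function entry point. */
>       uint32_t entry_esp = 0xFFFFCFFCu;
>       assert((entry_esp + 4) % 16 == 0);  /* the psABI rule quoted above   */
>       uint32_t ebp = entry_esp - 4;       /* EBP's value after "push ebp"  */
>       assert(ebp % 16 == 8);              /* EBP is 8-byte aligned, not 16 */
>       return 0;
>     }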
>
> Thanks
> Pengfei
>
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jonathan Smith via llvm-dev
> Sent: Tuesday, October 27, 2020 6:51 AM
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] Possible bug in x86 frame lowering with SSE instructions?
>
> Hello, everyone.
>
> I'm looking for some insight into a bug I encountered while testing some custom IR passes on Solaris (x86) and Linux. I don't know if it's a bug with the x86 backend or the way the frame is set up by Solaris
> -- or if I'm simply doing something I shouldn't be doing. The bug manifests even if I don't run any of my passes, so I'm certain those aren't the issue.
>
> Given the following test C code:
>
>     int main(int argc, char **argv) {
>       int x[10] = {1,2,3};
>       return 0;
>     }
>
> I compile it to IR with the following arguments:
>
>   clang --target=i386-sun-solaris -S -emit-llvm -Xclang -disable-O0-optnone -x c -c array-test.c -o array-test.ll
>
> This yields the following IR:
>
>     target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
>     target triple = "i386-sun-solaris"
>
>     ; Function Attrs: noinline nounwind
>     define dso_local i32 @main(i32 %0, i8** %1) #0 {
>       %3 = alloca i32, align 4
>       %4 = alloca i32, align 4
>       %5 = alloca i8**, align 4
>       %6 = alloca [10 x i32], align 4
>       store i32 0, i32* %3, align 4
>       store i32 %0, i32* %4, align 4
>       store i8** %1, i8*** %5, align 4
>       %7 = bitcast [10 x i32]* %6 to i8*
>       call void @llvm.memset.p0i8.i32(i8* align 4 %7, i8 0, i32 40, i1 false)
>       %8 = bitcast i8* %7 to [10 x i32]*
>       %9 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 0
>       store i32 1, i32* %9, align 4
>       %10 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 1
>       store i32 2, i32* %10, align 4
>       %11 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 2
>       store i32 3, i32* %11, align 4
>       ret i32 0
>     }
>
>     ; Function Attrs: argmemonly nounwind willreturn writeonly
>     declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #1
>
>     attributes #0 = { noinline nounwind
> "correctly-rounded-divide-sqrt-fp-math"="false"
> "disable-tail-calls"="false" "frame-pointer"="all"
> "less-precise-fpmad"="false" "min-legal-vector-width"="0"
> "no-infs-fp-math"="false" "no-jump-tables"="false"
> "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false"
> "no-trapping-math"="true" "stack-protector-buffer-size"="8"
> "target-cpu"="pentium4"
> "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87"
> "unsafe-fp-math"="false" "use-soft-float"="false" }
>     attributes #1 = { argmemonly nounwind willreturn writeonly }
>
> Normally, I would run custom passes at this point via opt. But the error I'm getting occurs with or without this step.
>
> Without changing anything else, I run this IR through llc with the following arguments:
>
>     llc --x86-asm-syntax=intel --filetype=asm array-test.ll -o=array-test.s
>
> This results in the following assembly:
>
>             .text
>             .intel_syntax noprefix
>             .file   "/home/user/code/array-test.ll"
>             .globl  main                            # -- Begin function main
>             .p2align        4, 0x90
>             .type   main,@function
>     main:                                   # @main
>     # %bb.0:
>             push    ebp
>             mov     ebp, esp
>             sub     esp, 56
>             mov     dword ptr [ebp - 4], 0
>             xorps   xmm0, xmm0
>             movaps  xmmword ptr [ebp - 56], xmm0
>             movaps  xmmword ptr [ebp - 40], xmm0
>             mov     dword ptr [ebp - 20], 0
>             mov     dword ptr [ebp - 24], 0
>             mov     dword ptr [ebp - 56], 1
>             mov     dword ptr [ebp - 52], 2
>             mov     dword ptr [ebp - 48], 3
>             xor     eax, eax
>             add     esp, 56
>             pop     ebp
>             ret
>     .Lfunc_end0:
>             .size   main, .Lfunc_end0-main
>                                             # -- End function
>             .ident  "clang version 12.0.0 (https://github.com/llvm/llvm-project.git 62dbbcf6d7c67b02fd540a5a1e55c494bf88adea)"
>             .section        ".note.GNU-stack","",@progbits
>
> Other than the target triple being i386-sun-solaris, this is the exact same code that gets generated if I target i386-pc-linux-gnu instead.
>
> If I run this on Linux (Ubuntu 18.04 in this case), there are no problems. If I run this on Solaris, however, a segfault occurs on the first `movaps` instruction. I believe the issue is that the stack is 4-byte aligned on Solaris whereas it's 8-byte aligned on Linux, so the 56- and 40-byte offsets for the array stores just happen to work on Linux, while they end up being 8 bytes off on Solaris.
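>
> To make that arithmetic concrete, here is a minimal sketch (both entry ESP values are hypothetical, chosen only to illustrate the two cases; I haven't captured the real values in a debugger):
>
>     #include <stdio.h>
>     #include <stdint.h>
>
>     /* Print the alignment of the two movaps targets, [ebp - 56] and
>        [ebp - 40], for a given ESP value at the function entry point. */
>     static void show(const char *label, uint32_t entry_esp) {
>       uint32_t ebp = entry_esp - 4;  /* EBP's value after "push ebp" */
>       printf("%s: [ebp-56] %% 16 = %u, [ebp-40] %% 16 = %u\n", label,
>              (unsigned)((ebp - 56) % 16), (unsigned)((ebp - 40) % 16));
>     }
>
>     int main(void) {
>       show("16-byte entry (ABI-compliant)", 0xFFFFCFFCu); /* 0 and 0: movaps is fine */
>       show("4-byte entry (Solaris?)      ", 0xFFFFCFF4u); /* 8 and 8: movaps faults  */
>       return 0;
>     }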
>
> Running llc with --stackrealign fixes the problem:
>
>     main:                                   # @main
>     # %bb.0:
>             push    ebp
>             mov     ebp, esp
>             and     esp, -16
>             sub     esp, 64
>             mov     dword ptr [esp + 12], 0
>             xorps   xmm0, xmm0
>             movaps  xmmword ptr [esp + 16], xmm0
>             movaps  xmmword ptr [esp + 32], xmm0
>             mov     dword ptr [esp + 52], 0
>             mov     dword ptr [esp + 48], 0
>             mov     dword ptr [esp + 16], 1
>             mov     dword ptr [esp + 20], 2
>             mov     dword ptr [esp + 24], 3
>             xor     eax, eax
>             mov     esp, ebp
>             pop     ebp
>             ret
>
> Running clang with -fomit-frame-pointer also fixes the problem, but I have no idea why. Adding --stack-alignment=16 does *not* fix the problem. If I explicitly add the -O0 flag to llc, the `X86TargetLowering::getOptimalMemOpType()` function doesn't lower the array stores to `movaps`:
>
>     main:                                   # @main
>     # %bb.0:
>             push    ebp
>             mov     ebp, esp
>             push    esi
>             sub     esp, 68
>             mov     eax, dword ptr [ebp + 12]
>             mov     ecx, dword ptr [ebp + 8]
>             xor     edx, edx
>             mov     dword ptr [ebp - 8], 0
>             lea     esi, [ebp - 48]
>             mov     dword ptr [esp], esi
>             mov     dword ptr [esp + 4], 0
>             mov     dword ptr [esp + 8], 40
>             mov     dword ptr [ebp - 52], eax       # 4-byte Spill
>             mov     dword ptr [ebp - 56], ecx       # 4-byte Spill
>             mov     dword ptr [ebp - 60], edx       # 4-byte Spill
>             call    memset
>             mov     dword ptr [ebp - 48], 1
>             mov     dword ptr [ebp - 44], 2
>             mov     dword ptr [ebp - 40], 3
>             mov     eax, dword ptr [ebp - 60]       # 4-byte Reload
>             add     esp, 68
>             pop     esi
>             pop     ebp
>             ret
>
> I've spent the better part of ten hours trying to debug the X86 backend code (and I am, admittedly, not the best at knowing where to look). I determined that the `X86FrameLowering::emitPrologue()` function will *only* emit the proper offset adjustment if `X86RegisterInfo::needsStackRealignment()` returns `true`, and the only thing that seems to force it to return `true` is passing --stackrealign (which sets the "stackrealign" function attribute on `main`).
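>
> For what it's worth, the same attribute can apparently be set from the C side rather than via llc (this is an assumption on my part, not something I've verified on Solaris): I believe clang maps __attribute__((force_align_arg_pointer)) to that same "stackrealign" function attribute, and -mstackrealign applies it to every function in the translation unit:
>
>     /* Assumed workaround, not verified on Solaris: marking main this way
>        should make clang emit the "stackrealign" attribute for it, so
>        needsStackRealignment() returns true and the prologue realigns ESP
>        with "and esp, -16". */
>     __attribute__((force_align_arg_pointer))
>     int main(int argc, char **argv) {
>       int x[10] = {1, 2, 3};
>       (void)argc; (void)argv; (void)x;
>       return 0;
>     }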
>
> I don't know if this is truly a bug in the X86 backend (an assumption about the ABI on Linux vs. Solaris? Maybe? I'm truly guessing...) or if this is a result of me using -disable-O0-optnone in Clang without
> -O0 in llc.
>
> Any insight would be helpful, and thanks for reading my rather verbose message.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

