[llvm-dev] Possible bug in x86 frame lowering with SSE instructions?
Jonathan Smith via llvm-dev
llvm-dev at lists.llvm.org
Tue Oct 27 02:52:21 PDT 2020
Interesting. Thank you.
I'm still curious to know what commit fixed this problem, although it
sounds like it's also a problem with how Solaris is implementing the
ABI.
I suppose it's time for me to go hunting through commits.
On Tue, Oct 27, 2020 at 2:21 AM Wang, Pengfei <pengfei.wang at intel.com> wrote:
>
> Hi Jonathan,
>
> It seems the trunk code solves this problem. https://godbolt.org/z/Y1Wdbj
> I took a look at the x86 ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/tree/hjl/x86/1.1#
> It says "In other words, the value (%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point."
> So if the OS follows the ABI, ESP's value should always be 0xXXXXXXXC on entry to a function, and it becomes 0xXXXXXXX8 after "push ebp", which happens to be 8-byte aligned.
>
> Thanks
> Pengfei
>
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jonathan Smith via llvm-dev
> Sent: Tuesday, October 27, 2020 6:51 AM
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] Possible bug in x86 frame lowering with SSE instructions?
>
> Hello, everyone.
>
> I'm looking for some insight into a bug I encountered while testing some custom IR passes on Solaris (x86) and Linux. I don't know if it's a bug with the x86 backend or the way the frame is set up by Solaris
> -- or if I'm simply doing something I shouldn't be doing. The bug manifests even if I don't run any of my passes, so I'm certain those aren't the issue.
>
> Given the following test C code:
>
> int main(int argc, char **argv) {
>   int x[10] = {1,2,3};
>   return 0;
> }
>
> I compile it to IR with the following arguments:
>
> clang --target=i386-sun-solaris -S -emit-llvm -Xclang -disable-O0-optnone -x c -c array-test.c -o array-test.ll
>
> This yields the following IR:
>
> target datalayout =
> "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
> target triple = "i386-sun-solaris"
>
> ; Function Attrs: noinline nounwind
> define dso_local i32 @main(i32 %0, i8** %1) #0 {
> %3 = alloca i32, align 4
> %4 = alloca i32, align 4
> %5 = alloca i8**, align 4
> %6 = alloca [10 x i32], align 4
> store i32 0, i32* %3, align 4
> store i32 %0, i32* %4, align 4
> store i8** %1, i8*** %5, align 4
> %7 = bitcast [10 x i32]* %6 to i8*
> call void @llvm.memset.p0i8.i32(i8* align 4 %7, i8 0, i32 40, i1 false)
> %8 = bitcast i8* %7 to [10 x i32]*
> %9 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 0
> store i32 1, i32* %9, align 4
> %10 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 1
> store i32 2, i32* %10, align 4
> %11 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 2
> store i32 3, i32* %11, align 4
> ret i32 0
> }
>
> ; Function Attrs: argmemonly nounwind willreturn writeonly
> declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #1
>
> attributes #0 = { noinline nounwind
> "correctly-rounded-divide-sqrt-fp-math"="false"
> "disable-tail-calls"="false" "frame-pointer"="all"
> "less-precise-fpmad"="false" "min-legal-vector-width"="0"
> "no-infs-fp-math"="false" "no-jump-tables"="false"
> "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false"
> "no-trapping-math"="true" "stack-protector-buffer-size"="8"
> "target-cpu"="pentium4"
> "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87"
> "unsafe-fp-math"="false" "use-soft-float"="false" }
> attributes #1 = { argmemonly nounwind willreturn writeonly }
>
> Normally, I would run custom passes at this point via opt. But the error I'm getting occurs with or without this step.
>
> Without changing anything else, I run this IR through llc with the following arguments:
>
> llc --x86-asm-syntax=intel --filetype=asm array-test.ll -o=array-test.s
>
> This results in the following assembly:
>
> .text
> .intel_syntax noprefix
> .file "/home/user/code/array-test.ll"
> .globl main # -- Begin function main
> .p2align 4, 0x90
> .type main, at function
> main: # @main
> # %bb.0:
> push ebp
> mov ebp, esp
> sub esp, 56
> mov dword ptr [ebp - 4], 0
> xorps xmm0, xmm0
> movaps xmmword ptr [ebp - 56], xmm0
> movaps xmmword ptr [ebp - 40], xmm0
> mov dword ptr [ebp - 20], 0
> mov dword ptr [ebp - 24], 0
> mov dword ptr [ebp - 56], 1
> mov dword ptr [ebp - 52], 2
> mov dword ptr [ebp - 48], 3
> xor eax, eax
> add esp, 56
> pop ebp
> ret
> .Lfunc_end0:
> .size main, .Lfunc_end0-main
> # -- End function
> .ident "clang version 12.0.0 (https://github.com/llvm/llvm-project.git
> 62dbbcf6d7c67b02fd540a5a1e55c494bf88adea)"
> .section ".note.GNU-stack","", at progbits
>
> Other than the target triple being i386-sun-solaris, this is the exact same code that is generated if I target i386-pc-linux-gnu.
>
> If I run this on Linux (Ubuntu 18.04 in this case), there are no problems. If I run this on Solaris, however, a segfault occurs on the first `movaps` instruction. I believe the issue is that the stack is only guaranteed 4-byte alignment on Solaris, whereas Linux follows the i386 psABI's 16-byte stack alignment at calls, so the 56- and 40-byte offsets for the array stores just happen to be 16-byte aligned on Linux -- while they end up being 8 bytes off on Solaris.
>
> Running llc with --stackrealign fixes the problem:
>
> main: # @main
> # %bb.0:
> push ebp
> mov ebp, esp
> and esp, -16
> sub esp, 64
> mov dword ptr [esp + 12], 0
> xorps xmm0, xmm0
> movaps xmmword ptr [esp + 16], xmm0
> movaps xmmword ptr [esp + 32], xmm0
> mov dword ptr [esp + 52], 0
> mov dword ptr [esp + 48], 0
> mov dword ptr [esp + 16], 1
> mov dword ptr [esp + 20], 2
> mov dword ptr [esp + 24], 3
> xor eax, eax
> mov esp, ebp
> pop ebp
> ret
>
> Running clang with -fomit-frame-pointer also fixes the problem, but I have no idea why. Adding --stack-alignment=16 does *not* fix the problem. If I explicitly add the -O0 flag to llc, the `X86TargetLowering::getOptimalMemOpType()` function doesn't lower the array stores to `movaps`:
>
> main: # @main
> # %bb.0:
> push ebp
> mov ebp, esp
> push esi
> sub esp, 68
> mov eax, dword ptr [ebp + 12]
> mov ecx, dword ptr [ebp + 8]
> xor edx, edx
> mov dword ptr [ebp - 8], 0
> lea esi, [ebp - 48]
> mov dword ptr [esp], esi
> mov dword ptr [esp + 4], 0
> mov dword ptr [esp + 8], 40
> mov dword ptr [ebp - 52], eax # 4-byte Spill
> mov dword ptr [ebp - 56], ecx # 4-byte Spill
> mov dword ptr [ebp - 60], edx # 4-byte Spill
> call memset
> mov dword ptr [ebp - 48], 1
> mov dword ptr [ebp - 44], 2
> mov dword ptr [ebp - 40], 3
> mov eax, dword ptr [ebp - 60] # 4-byte Reload
> add esp, 68
> pop esi
> pop ebp
> ret
>
> I've spent the better part of ten hours trying to debug the X86 backend code (and I am, admittedly, not the best at knowing where to look). I determined that `X86FrameLowering::emitPrologue()` will *only* emit the proper offset adjustment if `X86RegisterInfo::needsStackRealignment()` returns `true`, and the only thing that seems to force it to return `true` is passing --stackrealign (which sets the "stackrealign" function attribute on `main`).
>
> I don't know if this is truly a bug in the X86 backend (an assumption about the ABI on Linux vs. Solaris? Maybe? I'm truly guessing...) or if this is a result of me using -disable-O0-optnone in Clang without
> -O0 in llc.
>
> Any insight would be helpful, and thanks for reading my rather verbose message.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev