[llvm-dev] Possible bug in x86 frame lowering with SSE instructions?
Wang, Pengfei via llvm-dev
llvm-dev at lists.llvm.org
Mon Oct 26 23:21:00 PDT 2020
Hi Jonathan,
It seems the trunk code solves this problem. https://godbolt.org/z/Y1Wdbj
I took a look at the x86 ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/tree/hjl/x86/1.1#
It says "In other words, the value (%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point."
So if the OS follows the ABI, ESP should always be 0xXXXXXXXC on entry to a function, and it becomes 0xXXXXXXX8 after "push ebp", which happens to be 8-byte aligned.
Thanks
Pengfei
-----Original Message-----
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jonathan Smith via llvm-dev
Sent: Tuesday, October 27, 2020 6:51 AM
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] Possible bug in x86 frame lowering with SSE instructions?
Hello, everyone.
I'm looking for some insight into a bug I encountered while testing some custom IR passes on Solaris (x86) and Linux. I don't know if it's a bug with the x86 backend or the way the frame is set up by Solaris -- or if I'm simply doing something I shouldn't be doing. The bug manifests even if I don't run any of my passes, so I'm certain those aren't the issue.
Given the following test C code:
int main(int argc, char **argv) {
    int x[10] = {1,2,3};
    return 0;
}
I compile it to IR with the following arguments:
clang --target=i386-sun-solaris -S -emit-llvm -Xclang -disable-O0-optnone -x c -c array-test.c -o array-test.ll
This yields the following IR:
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-sun-solaris"
; Function Attrs: noinline nounwind
define dso_local i32 @main(i32 %0, i8** %1) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 4
%6 = alloca [10 x i32], align 4
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 4
%7 = bitcast [10 x i32]* %6 to i8*
call void @llvm.memset.p0i8.i32(i8* align 4 %7, i8 0, i32 40, i1 false)
%8 = bitcast i8* %7 to [10 x i32]*
%9 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 0
store i32 1, i32* %9, align 4
%10 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 1
store i32 2, i32* %10, align 4
%11 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 2
store i32 3, i32* %11, align 4
ret i32 0
}
; Function Attrs: argmemonly nounwind willreturn writeonly
declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #1
attributes #0 = { noinline nounwind
"correctly-rounded-divide-sqrt-fp-math"="false"
"disable-tail-calls"="false" "frame-pointer"="all"
"less-precise-fpmad"="false" "min-legal-vector-width"="0"
"no-infs-fp-math"="false" "no-jump-tables"="false"
"no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false"
"no-trapping-math"="true" "stack-protector-buffer-size"="8"
"target-cpu"="pentium4"
"target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87"
"unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind willreturn writeonly }
Normally, I would run custom passes at this point via opt. But the error I'm getting occurs with or without this step.
Without changing anything else, I run this IR through llc with the following arguments:
llc --x86-asm-syntax=intel --filetype=asm array-test.ll -o=array-test.s
This results in the following assembly:
.text
.intel_syntax noprefix
.file "/home/user/code/array-test.ll"
.globl main # -- Begin function main
.p2align 4, 0x90
.type main,@function
main: # @main
# %bb.0:
push ebp
mov ebp, esp
sub esp, 56
mov dword ptr [ebp - 4], 0
xorps xmm0, xmm0
movaps xmmword ptr [ebp - 56], xmm0
movaps xmmword ptr [ebp - 40], xmm0
mov dword ptr [ebp - 20], 0
mov dword ptr [ebp - 24], 0
mov dword ptr [ebp - 56], 1
mov dword ptr [ebp - 52], 2
mov dword ptr [ebp - 48], 3
xor eax, eax
add esp, 56
pop ebp
ret
.Lfunc_end0:
.size main, .Lfunc_end0-main
# -- End function
.ident "clang version 12.0.0 (https://github.com/llvm/llvm-project.git 62dbbcf6d7c67b02fd540a5a1e55c494bf88adea)"
.section ".note.GNU-stack","",@progbits
Other than the target being i386-sun-solaris, this is exactly the same code that is generated if I target i386-pc-linux-gnu.
If I run this on Linux (Ubuntu 18.04 in this case), there are no problems. If I run this on Solaris, however, a segfault occurs on the first `movaps` instruction. I believe the issue is that the stack is 4-byte aligned on Solaris whereas it's 8-byte aligned on Linux, so the 56- and 40-byte offsets for the array stores just happen to work on Linux -- while they end up being 8 bytes off on Solaris.
Running llc with --stackrealign fixes the problem:
main: # @main
# %bb.0:
push ebp
mov ebp, esp
and esp, -16
sub esp, 64
mov dword ptr [esp + 12], 0
xorps xmm0, xmm0
movaps xmmword ptr [esp + 16], xmm0
movaps xmmword ptr [esp + 32], xmm0
mov dword ptr [esp + 52], 0
mov dword ptr [esp + 48], 0
mov dword ptr [esp + 16], 1
mov dword ptr [esp + 20], 2
mov dword ptr [esp + 24], 3
xor eax, eax
mov esp, ebp
pop ebp
ret
Running clang with -fomit-frame-pointer also fixes the problem, but I have no idea why. Adding --stack-alignment=16 does *not* fix the problem. If I explicitly add the -O0 flag to llc, the `X86TargetLowering::getOptimalMemOpType()` function doesn't lower the array stores to `movaps`:
main: # @main
# %bb.0:
push ebp
mov ebp, esp
push esi
sub esp, 68
mov eax, dword ptr [ebp + 12]
mov ecx, dword ptr [ebp + 8]
xor edx, edx
mov dword ptr [ebp - 8], 0
lea esi, [ebp - 48]
mov dword ptr [esp], esi
mov dword ptr [esp + 4], 0
mov dword ptr [esp + 8], 40
mov dword ptr [ebp - 52], eax # 4-byte Spill
mov dword ptr [ebp - 56], ecx # 4-byte Spill
mov dword ptr [ebp - 60], edx # 4-byte Spill
call memset
mov dword ptr [ebp - 48], 1
mov dword ptr [ebp - 44], 2
mov dword ptr [ebp - 40], 3
mov eax, dword ptr [ebp - 60] # 4-byte Reload
add esp, 68
pop esi
pop ebp
ret
I've spent the better part of ten hours trying to debug the X86 backend code (and I am, admittedly, not the best at knowing where to look). I determined the `X86FrameLowering::emitPrologue()` function will *only* emit the proper offset adjustment if `X86RegisterInfo::needsStackRealignment()` returns `true`, and the only thing that seems to force it to return `true` is if --stackrealign is used (which sets the "stackrealign" function attribute on `main`).
I don't know if this is truly a bug in the X86 backend (an assumption about the ABI on Linux vs. Solaris? Maybe? I'm truly guessing...) or if this is a result of me using -disable-O0-optnone in Clang without -O0 in llc.
Any insight would be helpful, and thanks for reading my rather verbose message.