[llvm-dev] Possible bug in x86 frame lowering with SSE instructions?
Jonathan Smith via llvm-dev
llvm-dev at lists.llvm.org
Mon Oct 26 15:51:11 PDT 2020
Hello, everyone.
I'm looking for some insight into a bug I encountered while testing
some custom IR passes on Solaris (x86) and Linux. I don't know if it's
a bug with the x86 backend or the way the frame is set up by Solaris
-- or if I'm simply doing something I shouldn't be doing. The bug
manifests even if I don't run any of my passes, so I'm certain those
aren't the issue.
Given the following test C code:
int main(int argc, char **argv) {
int x[10] = {1,2,3};
return 0;
}
I compile it to IR with the following arguments:
clang --target=i386-sun-solaris -S -emit-llvm -Xclang -disable-O0-optnone -x c -c array-test.c -o array-test.ll
This yields the following IR:
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-sun-solaris"
; Function Attrs: noinline nounwind
define dso_local i32 @main(i32 %0, i8** %1) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 4
%6 = alloca [10 x i32], align 4
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 4
%7 = bitcast [10 x i32]* %6 to i8*
call void @llvm.memset.p0i8.i32(i8* align 4 %7, i8 0, i32 40, i1 false)
%8 = bitcast i8* %7 to [10 x i32]*
%9 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 0
store i32 1, i32* %9, align 4
%10 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 1
store i32 2, i32* %10, align 4
%11 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 2
store i32 3, i32* %11, align 4
ret i32 0
}
; Function Attrs: argmemonly nounwind willreturn writeonly
declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #1
attributes #0 = { noinline nounwind
"correctly-rounded-divide-sqrt-fp-math"="false"
"disable-tail-calls"="false" "frame-pointer"="all"
"less-precise-fpmad"="false" "min-legal-vector-width"="0"
"no-infs-fp-math"="false" "no-jump-tables"="false"
"no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false"
"no-trapping-math"="true" "stack-protector-buffer-size"="8"
"target-cpu"="pentium4"
"target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87"
"unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind willreturn writeonly }
Normally, I would run custom passes at this point via opt. But the
error I'm getting occurs with or without this step.
Without changing anything else, I run this IR through llc with the
following arguments:
llc --x86-asm-syntax=intel --filetype=asm array-test.ll -o=array-test.s
This results in the following assembly:
.text
.intel_syntax noprefix
.file "/home/user/code/array-test.ll"
.globl main # -- Begin function main
.p2align 4, 0x90
.type main,@function
main: # @main
# %bb.0:
push ebp
mov ebp, esp
sub esp, 56
mov dword ptr [ebp - 4], 0
xorps xmm0, xmm0
movaps xmmword ptr [ebp - 56], xmm0
movaps xmmword ptr [ebp - 40], xmm0
mov dword ptr [ebp - 20], 0
mov dword ptr [ebp - 24], 0
mov dword ptr [ebp - 56], 1
mov dword ptr [ebp - 52], 2
mov dword ptr [ebp - 48], 3
xor eax, eax
add esp, 56
pop ebp
ret
.Lfunc_end0:
.size main, .Lfunc_end0-main
# -- End function
.ident "clang version 12.0.0 (https://github.com/llvm/llvm-project.git 62dbbcf6d7c67b02fd540a5a1e55c494bf88adea)"
.section ".note.GNU-stack","",@progbits
Apart from the target triple, this is exactly the same code that is
generated when I target i386-pc-linux-gnu.
If I run this on Linux (Ubuntu 18.04 in this case), there are no
problems. If I run it on Solaris, however, it segfaults on the
first `movaps` instruction. I believe the issue is that the stack
is only guaranteed to be 4-byte aligned on Solaris, whereas on Linux
it's 16-byte aligned (which is what `movaps` requires of its memory
operand), so the `ebp - 56` and `ebp - 40` addresses used for the
array stores just happen to be aligned on Linux, while they end up
being 8 bytes off on Solaris.
Running llc with --stackrealign fixes the problem:
main: # @main
# %bb.0:
push ebp
mov ebp, esp
and esp, -16
sub esp, 64
mov dword ptr [esp + 12], 0
xorps xmm0, xmm0
movaps xmmword ptr [esp + 16], xmm0
movaps xmmword ptr [esp + 32], xmm0
mov dword ptr [esp + 52], 0
mov dword ptr [esp + 48], 0
mov dword ptr [esp + 16], 1
mov dword ptr [esp + 20], 2
mov dword ptr [esp + 24], 3
xor eax, eax
mov esp, ebp
pop ebp
ret
Running clang with -fomit-frame-pointer also fixes the problem, but I
have no idea why. Adding --stack-alignment=16 to llc does *not* fix the
problem. If I explicitly pass -O0 to llc, the array initialization is
not lowered to `movaps` stores via
`X86TargetLowering::getOptimalMemOpType()`; a call to `memset` is
emitted instead:
main: # @main
# %bb.0:
push ebp
mov ebp, esp
push esi
sub esp, 68
mov eax, dword ptr [ebp + 12]
mov ecx, dword ptr [ebp + 8]
xor edx, edx
mov dword ptr [ebp - 8], 0
lea esi, [ebp - 48]
mov dword ptr [esp], esi
mov dword ptr [esp + 4], 0
mov dword ptr [esp + 8], 40
mov dword ptr [ebp - 52], eax # 4-byte Spill
mov dword ptr [ebp - 56], ecx # 4-byte Spill
mov dword ptr [ebp - 60], edx # 4-byte Spill
call memset
mov dword ptr [ebp - 48], 1
mov dword ptr [ebp - 44], 2
mov dword ptr [ebp - 40], 3
mov eax, dword ptr [ebp - 60] # 4-byte Reload
add esp, 68
pop esi
pop ebp
ret
I've spent the better part of ten hours trying to debug the X86
backend code (and I am, admittedly, not the best at knowing where to
look). I determined that `X86FrameLowering::emitPrologue()` will
*only* realign the stack (the `and esp, -16` in the --stackrealign
output) if `X86RegisterInfo::needsStackRealignment()` returns `true`,
and the only thing that seems to force it to return `true` is
--stackrealign (which sets the "stackrealign" function attribute on
`main`).
I don't know if this is truly a bug in the X86 backend (an assumption
about the ABI on Linux vs. Solaris? Maybe? I'm truly guessing...) or
if this is a result of me using -disable-O0-optnone in Clang without
-O0 in llc.
Any insight would be helpful, and thanks for reading my rather verbose message.