[LLVMdev] System call miscompilation using the fast register allocator
Chris Smowton
chris at smowton.net
Mon Oct 21 17:18:44 PDT 2013
Hi,
Apologies, this is a bit lengthy. TLDR: I'm using Dragonegg + LLVM 3.2
and uClibc, and am finding that using the Fast register allocator (i.e.
-optimize-regalloc=0) causes miscompilation of setsockopt calls (5-arg
system calls). The problem doesn't happen with the default register
allocation path selected. It can be worked around by manually
simplifying the system call setup sequence. I'm looking to find out
whether this is a bug with the fast register allocator, or whether the
Linux headers' description of the system call setup sequence, or
gcc/Dragonegg's interpretation of such, is faulty and the allocator just
so happens to expose the bug.
Now the long version:
Using LLVM 3.2, I'm building a simple test program that makes a
5-argument system call, like:
#include <sys/socket.h>
#include <netinet/tcp.h>

int main(int argc, char** argv) {
    int val = 1;
    socklen_t len = 4;
    return setsockopt(-1, SOL_SOCKET, TCP_CORK, &val, len);
}
setsockopt is provided by uClibc and is available as LLVM IR, leading to
optimised code like:
define i32 @main(i32 %argc, i8** nocapture %argv) unnamed_addr nounwind uwtable {
entry:
%val = alloca i32, align 4
store i32 1, i32* %val, align 4
%0 = ptrtoint i32* %val to i64
call void asm sideeffect "", "{r8}"(i64 4) nounwind
call void asm sideeffect "", "{r10}"(i64 %0) nounwind
call void asm sideeffect "", "{rdx}"(i64 3) nounwind
call void asm sideeffect "", "{rsi}"(i64 1) nounwind
call void asm sideeffect "", "{rdi}"(i64 -1) nounwind
%1 = call i64 asm sideeffect "", "={rdi}"() nounwind
%2 = call i64 asm sideeffect "", "={rsi}"() nounwind
%3 = call i64 asm sideeffect "", "={rdx}"() nounwind
%4 = call i64 asm sideeffect "", "={r10}"() nounwind
%5 = call i64 asm sideeffect "", "={r8}"() nounwind
%asmtmp.i = call i64 asm sideeffect "syscall\0A\09", "={ax},0,{rdi},{rsi},{rdx},{r10},{r8},~{fpsr},~{flags},~{cx},~{r11},~{cc},~{memory}"(i64 54, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5) nounwind, !srcloc !0
SOL_SOCKET = 1 and TCP_CORK = 3, so the constants are right so far. The
inline asm looks a bit odd, though: these statements are derived from
bits/syscalls.h, and whilst it's clear what they're *trying* to do, I'm
not sure about this construction:
call void asm sideeffect "", "{somereg}"(i64 X) nounwind
%1 = call i64 asm sideeffect "", "={somereg}"() nounwind
Intuitively, this tells the compiler to put X in somereg and later to
read somereg, but it doesn't say that somereg must keep its value in the
meantime!
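For comparison, here is a minimal sketch of the GNU C idiom that syscall headers of this kind commonly build on: local register variables that feed a single asm statement. This is an assumption about the general pattern, not a copy of uClibc's bits/syscalls.h, and the names raw_syscall5 and demo_setsockopt_bad_fd are hypothetical. In GNU C the register pinning is only guaranteed at the point where the variable is used as an asm operand, which is why all five arguments are operands of the one "syscall" statement rather than being written and read by separate empty asm statements.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not from uClibc): issue a 5-argument system call.
   Returns the kernel result, i.e. a negative errno value on failure. */
static long raw_syscall5(long nr, long a1, long a2, long a3, long a4, long a5)
{
#if defined(__linux__) && defined(__x86_64__)
    /* Pin each argument to the register the kernel expects. The pinning is
       only guaranteed where the variable appears as an asm operand below. */
    register long r_a1 __asm__("rdi") = a1;
    register long r_a2 __asm__("rsi") = a2;
    register long r_a3 __asm__("rdx") = a3;
    register long r_a4 __asm__("r10") = a4;
    register long r_a5 __asm__("r8")  = a5;
    long ret;

    /* One asm statement carries every register constraint, so there is no
       window in which the allocator may reuse rdx (or any other argument
       register) as a scratch register. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(nr), "r"(r_a1), "r"(r_a2), "r"(r_a3),
                        "r"(r_a4), "r"(r_a5)
                      : "rcx", "r11", "cc", "memory");
    return ret;
#else
    /* Assumption: on other targets, fall back to the libc wrapper and
       convert its -1/errno convention to a negative errno value. */
    long ret = syscall(nr, a1, a2, a3, a4, a5);
    return ret < 0 ? -errno : ret;
#endif
}

/* Same arguments as the test program; fd -1 should be rejected. */
static long demo_setsockopt_bad_fd(void)
{
    int val = 1;
    return raw_syscall5(SYS_setsockopt, -1, SOL_SOCKET, TCP_CORK,
                        (long)&val, (long)sizeof val);
}
```

Calling demo_setsockopt_bad_fd() should fail with -EBADF because the descriptor is -1. Under strace, the third argument should show up as 3 rather than 0xffffffff if the registers survive until the syscall instruction.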
Lowering to x86 using -optimize-regalloc=0, and therefore the fast
register allocator, then leads to code like:
400190: c7 44 24 fc 01 00 00 movl $0x1,-0x4(%rsp)
400197: 00
400198: 41 b8 04 00 00 00 mov $0x4,%r8d
40019e: 4c 8d 54 24 fc lea -0x4(%rsp),%r10
4001a3: ba 03 00 00 00 mov $0x3,%edx ; Sets rdx/edx to the correct 3rd arg (3 == TCP_CORK)
4001a8: be 01 00 00 00 mov $0x1,%esi
4001ad: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx ; Clobbers the 3rd arg!
4001b4: 48 89 d7 mov %rdx,%rdi ; Uses the clobbering value to set up the 1st arg
4001b7: b8 36 00 00 00 mov $0x36,%eax ; Syscall number
4001bc: 48 89 54 24 f0 mov %rdx,-0x10(%rsp)
4001c1: 0f 05 syscall ; RDX (= arg 3) still clobbered
Here there is trouble: the x86-64 Linux ABI says the syscall number goes
in rax (written here via eax), and the args go in [rdi, rsi, rdx, r10,
r8] from left to right. However, as noted inline, the generated code
clobbers RDX after it has been set up for the call!
Checking with strace indeed we see:
setsockopt(-1, SOL_SOCKET, 0xffffffff /* SO_??? */, [1], 4)
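As a cross-check that does not depend on the header's register setup, the same call can be issued through the libc syscall() wrapper. This is a minimal sketch, assuming a libc that provides syscall() and SYS_setsockopt; the function name setsockopt_via_syscall is hypothetical. The arguments are passed in the same left-to-right order that maps to rdi, rsi, rdx, r10, r8 on x86-64.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue the test program's setsockopt call through
   syscall() and return the errno left by the failed call (0 on success). */
int setsockopt_via_syscall(void)
{
    int val = 1;
    socklen_t len = sizeof val;

    errno = 0;
    /* SYS_setsockopt is 54 (0x36) on x86-64, matching the value loaded
       into eax in the disassembly. */
    long ret = syscall(SYS_setsockopt, -1, SOL_SOCKET, TCP_CORK, &val, len);
    return ret == -1 ? errno : 0;
}
```

With fd -1 the call is expected to fail with EBADF; running it under strace shows which value actually reached the kernel as the third argument.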
To compare, a version built without using -optimize-regalloc=0 produces x86:
400198: 41 b8 04 00 00 00 mov $0x4,%r8d
40019e: 4c 8d 54 24 fc lea -0x4(%rsp),%r10
4001a3: ba 03 00 00 00 mov $0x3,%edx
4001a8: be 01 00 00 00 mov $0x1,%esi
4001ad: 49 c7 c1 ff ff ff ff mov $0xffffffffffffffff,%r9
4001b4: 48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi
4001bb: b8 36 00 00 00 mov $0x36,%eax
4001c0: 0f 05 syscall
Interestingly, this sets R9, which would take the 6th argument if this
were a 6-arg call, even though R9 is not mentioned at all in the IR for
the call; that suggests the syscall sequence is being special-cased.
This version works as expected, yielding the correct strace output:
setsockopt(-1, SOL_SOCKET, SO_TYPE, [1], 4)
(strace prints SO_TYPE because TCP_CORK and SO_TYPE are both 3.)
Above, I said I suspected that the list of "asm sideeffect" calls
doesn't actually express the right constraints. Is that true?
Considering the line that actually makes the system call already
specifies register constraints, is there any need for the lines that
write individual values to registers, then read them for no apparent
purpose? In short, is this whole problem down to bits/syscalls.h making
unwarranted assumptions about the compiler, and we just get lucky with
the default/greedy register allocator?
If this is wrong, and the IR *does* correctly express "put these values
in these registers and syscall", where should I start figuring out how
and why the allocator feels free to clobber RDX when it should be set up
for the call? I tried running the final IR->x86 lowering with
-print-after-all, and it appears all is well after 'Two-Address
instruction pass':
MOV32mi <fi#0>, 1, %noreg, 0, %noreg, 1; mem:ST4[%val]
%vreg3<def> = MOV64ri64i32 4; GR64:%vreg3
%R8<def> = COPY %vreg3; GR64:%vreg3
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R8
%vreg4<def> = LEA64r <fi#0>, 1, %noreg, 0, %noreg; GR64:%vreg4
%R10<def> = COPY %vreg4; GR64:%vreg4
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R10
%vreg5<def> = MOV64ri64i32 3; GR64:%vreg5
%RDX<def> = COPY %vreg5; GR64:%vreg5
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDX
%vreg6<def> = MOV64ri64i32 1; GR64:%vreg6
%RSI<def> = COPY %vreg6; GR64:%vreg6
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RSI
%vreg7<def> = MOV64ri32 -1; GR64:%vreg7
%RDI<def> = COPY %vreg7; GR64:%vreg7
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDI
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDI<imp-def>
%vreg8<def> = COPY %RDI; GR64:%vreg8
%vreg2<def> = MOV64ri64i32 54; GR64:%vreg2
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RSI<imp-def>
%vreg9<def> = COPY %RSI; GR64:%vreg9
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDX<imp-def>
%vreg10<def> = COPY %RDX; GR64:%vreg10
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R10<imp-def>
%vreg11<def> = COPY %R10; GR64:%vreg11
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R8<imp-def>
%vreg12<def> = COPY %R8; GR64:%vreg12
%RDI<def> = COPY %vreg8; GR64:%vreg8
%RSI<def> = COPY %vreg9; GR64:%vreg9
%RDX<def> = COPY %vreg10; GR64:%vreg10
%R10<def> = COPY %vreg11; GR64:%vreg11
%R8<def> = COPY %vreg12; GR64:%vreg12
INLINEASM <es:syscall
> [sideeffect] [attdialect], $0:[regdef], %RAX<imp-def,tied5>,
$1:[reguse tiedto:$0], %vreg2<tied3>, $2:[reguse], %RDI, $3:[reguse],
%RSI, $4:[reguse], %RDX, $5:[reguse], %R10, $6:[reguse], %R8,
$7:[clobber], %EFLAGS<earlyclobber,imp-def>, $8:[clobber],
%CX<earlyclobber,imp-def>, $9:[clobber], %R11<earlyclobber,imp-def>,
<<badref>>; GR64:%vreg2
BUT there is trouble after "Prologue/Epilogue Insertion & Frame
Finalization":
BB#0: derived from LLVM BB %entry
MOV32mi %RSP, 1, %noreg, -4, %noreg, 1; mem:ST4[%val]
%R8<def> = MOV64ri64i32 4
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R8<kill>
%R10<def> = LEA64r %RSP, 1, %noreg, -4, %noreg
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R10<kill>
%RDX<def> = MOV64ri64i32 3
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDX<kill>
%RSI<def> = MOV64ri64i32 1
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RSI<kill>
%RDX<def> = MOV64ri32 -1
%RDI<def> = COPY %RDX
INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDI<kill>
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDI<imp-def>
%RAX<def> = MOV64ri64i32 54
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RSI<imp-def>
MOV64mr %RSP, 1, %noreg, -16, %noreg, %RDX<kill>; mem:ST8[FixedStack1]
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDX<imp-def>
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R10<imp-def>
INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R8<imp-def>
INLINEASM <es:syscall
> [sideeffect] [attdialect], $0:[regdef], %RAX<imp-def,tied5>,
$1:[reguse tiedto:$0], %RAX<kill,tied3>, $2:[reguse], %RDI<kill>,
$3:[reguse], %RSI<kill>, $4:[reguse], %RDX<kill>, $5:[reguse],
%R10<kill>, $6:[reguse], %R8<kill>, $7:[clobber],
%EFLAGS<earlyclobber,imp-def>, $8:[clobber], %CX<earlyclobber,imp-def>,
$9:[clobber], %R11<earlyclobber,imp-def>, <<badref>>
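Here you can see the clobber happening as RDX is assigned for the second
time (MOV64ri32 -1). Note that the earlier empty asm statement's use of
RDX is marked <kill>, so from that point on RDX looks free to the
allocator, which then reuses it as a temporary before the "={rdx}" read.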
Finally, I tried manually editing out the apparently superfluous asm
statements from the LLVM IR, giving me a simpler program like this:
%val = alloca i32, align 4
store i32 1, i32* %val, align 4
%0 = ptrtoint i32* %val to i64
%asmtmp.i = call i64 asm sideeffect "syscall\0A\09", "={ax},0,{rdi},{rsi},{rdx},{r10},{r8},~{fpsr},~{flags},~{cx},~{r11},~{cc},~{memory}"(i64 54, i64 -1, i64 1, i64 3, i64 %0, i64 4) nounwind, !srcloc !0
As you can see the arguments are now directly specified; the constraints
remain the same. This compiles correctly: the corresponding x86 code is:
400190: c7 44 24 fc 01 00 00 movl $0x1,-0x4(%rsp)
400197: 00
400198: b8 36 00 00 00 mov $0x36,%eax
40019d: 48 c7 c1 ff ff ff ff mov $0xffffffffffffffff,%rcx
4001a4: be 01 00 00 00 mov $0x1,%esi
4001a9: ba 03 00 00 00 mov $0x3,%edx
4001ae: 4c 8d 54 24 fc lea -0x4(%rsp),%r10
4001b3: 41 b8 04 00 00 00 mov $0x4,%r8d
4001b9: 48 89 cf mov %rcx,%rdi
4001bc: 48 89 4c 24 f0 mov %rcx,-0x10(%rsp)
4001c1: 0f 05 syscall
Note the use of RCX, not RDX, as a temporary, avoiding clobbering RDX.
This suggests to me that the allocator is correctly preserving
registers, and that the old IR is too loose, and so the question is
likely how Dragonegg should compile the syscall C / inline asm code to
LLVM IR. However I'd really appreciate anyone confirming or denying my
suspicions, as I'm kind of learning as I go here!
Chris