[LLVMdev] System call miscompilation using the fast register allocator

Mon Oct 21 17:18:44 PDT 2013

Hi,

Apologies, this is a bit lengthy. TLDR: I'm using Dragonegg + LLVM 3.2
and uClibc, and am finding that using the Fast register allocator (i.e.
-optimize-regalloc=0) causes miscompilation of setsockopt calls (5-arg
system calls). The problem doesn't happen with the default register
allocation path. It can be worked around by manually simplifying the
system call setup sequence. I'm trying to find out whether this is a
bug in the fast register allocator, or whether the Linux headers'
description of the system call setup sequence (or gcc/Dragonegg's
interpretation of it) is faulty and the allocator merely happens to
expose the bug.

Now the long version:

Using LLVM 3.2, I'm building a simple test program that makes a
5-argument system call, like:

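#include <netinet/tcp.h>   /* TCP_CORK */
#include <sys/socket.h>    /* setsockopt, socklen_t, SOL_SOCKET */

int main(int argc, char** argv) {
     int val = 1;
     socklen_t len = 4;
     return setsockopt(-1, SOL_SOCKET, TCP_CORK, &val, len);
}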

setsockopt is provided by uClibc, which is also available as LLVM IR,
so after optimisation (with setsockopt inlined into main) the code
looks like:

define i32 @main(i32 %argc, i8** nocapture %argv) unnamed_addr nounwind uwtable {
entry:
   %val = alloca i32, align 4
   store i32 1, i32* %val, align 4
   %0 = ptrtoint i32* %val to i64
   call void asm sideeffect "", "{r8}"(i64 4) nounwind
   call void asm sideeffect "", "{r10}"(i64 %0) nounwind
   call void asm sideeffect "", "{rdx}"(i64 3) nounwind
   call void asm sideeffect "", "{rsi}"(i64 1) nounwind
   call void asm sideeffect "", "{rdi}"(i64 -1) nounwind
   %1 = call i64 asm sideeffect "", "={rdi}"() nounwind
   %2 = call i64 asm sideeffect "", "={rsi}"() nounwind
   %3 = call i64 asm sideeffect "", "={rdx}"() nounwind
   %4 = call i64 asm sideeffect "", "={r10}"() nounwind
   %5 = call i64 asm sideeffect "", "={r8}"() nounwind
   %asmtmp.i = call i64 asm sideeffect "syscall\0A\09", "={ax},0,{rdi},{rsi},{rdx},{r10},{r8},~{fpsr},~{flags},~{cx},~{r11},~{cc},~{memory}"(i64 54, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5) nounwind, !srcloc !0

SOL_SOCKET = 1 and TCP_CORK = 3, so far so good. The inline asm seems
a bit odd, though: these statements are derived from bits/syscalls.h,
and whilst it's clear what they're *trying* to do, I'm not sure about
this construction:

call void asm sideeffect "", "{somereg}"(i64 X) nounwind
%1 = call i64 asm sideeffect "", "={somereg}"() nounwind

Intuitively, whilst this instructs the compiler to put X in somereg
and then to read somereg, it doesn't say that somereg must keep the
same value in the meantime!
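For comparison, here is a hand-simplified sketch of the kind of C
pattern such IR appears to come from, in the style of uClibc's x86-64
bits/syscalls.h (the function name raw_syscall5 is hypothetical, not
the header's actual macro). Each argument is bound to its ABI register
with a GCC local register variable, and those variables are then
passed as operands of one extended asm statement. GCC's documentation
only guarantees that a local register variable occupies its register
when it is used as an operand of an extended asm, which may be
relevant to whether the separate empty asm statements in the IR keep
the header's intent. This assumes x86-64 Linux and GCC or Clang;
elsewhere the sketch just returns -ENOSYS.

```c
#include <errno.h>

/* Hypothetical 5-argument syscall helper, modelled loosely on the
   register-variable pattern in uClibc's x86-64 bits/syscalls.h.
   Returns the raw kernel result: >= 0 on success, -errno on failure. */
long raw_syscall5(long nr, long a1, long a2, long a3, long a4, long a5)
{
#if defined(__x86_64__) && defined(__linux__)
    /* Bind each argument to the register the syscall ABI expects. */
    register long r_a1 __asm__("rdi") = a1;
    register long r_a2 __asm__("rsi") = a2;
    register long r_a3 __asm__("rdx") = a3;
    register long r_a4 __asm__("r10") = a4;
    register long r_a5 __asm__("r8") = a5;
    long ret;

    /* The register variables are only guaranteed to be in their
       registers here, as operands of this single extended asm. */
    __asm__ __volatile__("syscall\n\t"
                         : "=a"(ret)
                         : "0"(nr), "r"(r_a1), "r"(r_a2), "r"(r_a3),
                           "r"(r_a4), "r"(r_a5)
                         : "memory", "cc", "r11", "rcx");
    return ret;
#else
    /* Not x86-64 Linux: the register names above do not apply. */
    (void)nr; (void)a1; (void)a2; (void)a3; (void)a4; (void)a5;
    return -ENOSYS;
#endif
}
```

For the test program this would be called as
raw_syscall5(54, -1, SOL_SOCKET, TCP_CORK, (long)&val, len); with fd -1
it should fail with -EBADF once the registers are set up correctly.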

Lowering to x86 using -optimize-regalloc=0, and therefore the fast 
register allocator, then leads to code like:

   400190:       c7 44 24 fc 01 00 00    movl   $0x1,-0x4(%rsp)
   400197:       00
   400198:       41 b8 04 00 00 00       mov    $0x4,%r8d
   40019e:       4c 8d 54 24 fc          lea    -0x4(%rsp),%r10
   4001a3:       ba 03 00 00 00          mov    $0x3,%edx   ; Sets rdx/edx to the correct 3rd arg (3 == TCP_CORK)
   4001a8:       be 01 00 00 00          mov    $0x1,%esi
   4001ad:       48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx   ; Clobbers the 3rd arg!
   4001b4:       48 89 d7                mov    %rdx,%rdi   ; Uses the clobbering value to set up the 1st arg
   4001b7:       b8 36 00 00 00          mov    $0x36,%eax   ; Syscall number
   4001bc:       48 89 54 24 f0          mov    %rdx,-0x10(%rsp)
   4001c1:       0f 05                   syscall   ; RDX (= arg 3) still clobbered

Here there is trouble: the x86-64 Linux ABI says the syscall number
goes in eax, then the args go in [rdi, rsi, rdx, r10, r8] from left to
right. However, as noted inline, the code clobbers RDX after it has
been set up for the call!
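The register convention is small enough to write down as a table. A
minimal sketch, where syscall_arg_reg and syscall_arg_index are
hypothetical helpers rather than part of any library; r9 is listed for
the 6th slot even though setsockopt only uses five.

```c
#include <stddef.h>
#include <string.h>

/* x86-64 Linux syscall convention: syscall number in rax, arguments
   1..6 in these registers, left to right. r10 replaces the usual
   function-call register rcx because the syscall instruction itself
   overwrites rcx (and r11). */
static const char *const kSyscallArgRegs[6] = {
    "rdi", "rsi", "rdx", "r10", "r8", "r9"
};

/* Register carrying 0-based argument `index`, or NULL if out of range. */
const char *syscall_arg_reg(size_t index)
{
    return index < 6 ? kSyscallArgRegs[index] : NULL;
}

/* Inverse lookup: 0-based argument index for a register name, or -1 if
   the register carries no syscall argument (e.g. "rcx"). */
int syscall_arg_index(const char *reg)
{
    for (int i = 0; i < 6; i++)
        if (strcmp(kSyscallArgRegs[i], reg) == 0)
            return i;
    return -1;
}
```

For the setsockopt call, optname (argument 3, TCP_CORK) belongs in
syscall_arg_reg(2), i.e. "rdx", which is exactly the register the fast
allocator reuses as a scratch register.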

Checking with strace, we indeed see:

setsockopt(-1, SOL_SOCKET, 0xffffffff /* SO_??? */, [1], 4)
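The odd optname matches the clobber: RDX ends up holding
0xffffffffffffffff (the -1 meant for the first argument), and
setsockopt's optname parameter is a C int, so the kernel only sees the
low 32 bits, 0xffffffff, which strace cannot map to a known option. A
tiny worked sketch of that truncation (the helper name is
hypothetical):

```c
#include <stdint.h>

/* setsockopt's optname is a C int, so only the low 32 bits of the
   64-bit register reach the kernel. Hypothetical helper. */
uint32_t optname_bits_from_reg(uint64_t reg)
{
    return (uint32_t)reg;  /* unsigned narrowing: well-defined, mod 2^32 */
}
```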

To compare, a version built without using -optimize-regalloc=0 produces x86:

   400198:       41 b8 04 00 00 00       mov    $0x4,%r8d
   40019e:       4c 8d 54 24 fc          lea    -0x4(%rsp),%r10
   4001a3:       ba 03 00 00 00          mov    $0x3,%edx
   4001a8:       be 01 00 00 00          mov    $0x1,%esi
   4001ad:       49 c7 c1 ff ff ff ff    mov $0xffffffffffffffff,%r9
   4001b4:       48 c7 c7 ff ff ff ff    mov $0xffffffffffffffff,%rdi
   4001bb:       b8 36 00 00 00          mov    $0x36,%eax
   4001c0:       0f 05                   syscall

Interestingly, this sets R9, which would carry the 6th argument of a
6-argument call. R9 is not mentioned at all in the IR for the call,
which suggests the syscall sequence is being special-cased somewhere.
This version works as expected, yielding the correct strace output:

setsockopt(-1, SOL_SOCKET, SO_TYPE, [1], 4)

(strace prints SO_TYPE because TCP_CORK == SO_TYPE.)
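The TCP_CORK/SO_TYPE aliasing can be checked at compile time. A small
sketch, assuming x86-64 Linux headers (option numbers may differ on
other architectures); tcp_cork_aliases_so_type is a hypothetical helper
so the check can also be exercised at run time.

```c
#if defined(__x86_64__) && defined(__linux__)
#include <netinet/tcp.h>   /* TCP_CORK */
#include <sys/socket.h>    /* SOL_SOCKET, SO_TYPE */

/* Both option numbers are 3, so TCP_CORK passed at level SOL_SOCKET is
   interpreted (and printed by strace) as SO_TYPE. */
_Static_assert(TCP_CORK == SO_TYPE, "TCP_CORK and SO_TYPE differ");
_Static_assert(SOL_SOCKET == 1, "unexpected SOL_SOCKET value");

int tcp_cork_aliases_so_type(void)
{
    return TCP_CORK == SO_TYPE;
}
#endif
```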

Above, I said I suspected that the list of "asm sideeffect" calls
doesn't actually express the right constraints. Is that true? Given
that the line that actually makes the system call already specifies
register constraints, is there any need for the lines that write
individual values to registers and then read them back for no apparent
purpose? In short, is this whole problem down to bits/syscalls.h making
unwarranted assumptions about the compiler, and we just get lucky with
the default/greedy register allocator?

If this is wrong, and the IR *does* correctly express "put these values
in these registers and syscall", where should I start figuring out how
and why the allocator feels free to clobber RDX when it should be set
up for the call? I tried running the final IR->x86 lowering with
-print-after-all, and all appears to be well after the 'Two-Address
instruction pass':

     MOV32mi <fi#0>, 1, %noreg, 0, %noreg, 1; mem:ST4[%val]
     %vreg3<def> = MOV64ri64i32 4; GR64:%vreg3
     %R8<def> = COPY %vreg3; GR64:%vreg3
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R8
     %vreg4<def> = LEA64r <fi#0>, 1, %noreg, 0, %noreg; GR64:%vreg4
     %R10<def> = COPY %vreg4; GR64:%vreg4
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R10
     %vreg5<def> = MOV64ri64i32 3; GR64:%vreg5
     %RDX<def> = COPY %vreg5; GR64:%vreg5
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDX
     %vreg6<def> = MOV64ri64i32 1; GR64:%vreg6
     %RSI<def> = COPY %vreg6; GR64:%vreg6
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RSI
     %vreg7<def> = MOV64ri32 -1; GR64:%vreg7
     %RDI<def> = COPY %vreg7; GR64:%vreg7
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDI
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDI<imp-def>
     %vreg8<def> = COPY %RDI; GR64:%vreg8
     %vreg2<def> = MOV64ri64i32 54; GR64:%vreg2
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RSI<imp-def>
     %vreg9<def> = COPY %RSI; GR64:%vreg9
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDX<imp-def>
     %vreg10<def> = COPY %RDX; GR64:%vreg10
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R10<imp-def>
     %vreg11<def> = COPY %R10; GR64:%vreg11
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R8<imp-def>
     %vreg12<def> = COPY %R8; GR64:%vreg12
     %RDI<def> = COPY %vreg8; GR64:%vreg8
     %RSI<def> = COPY %vreg9; GR64:%vreg9
     %RDX<def> = COPY %vreg10; GR64:%vreg10
     %R10<def> = COPY %vreg11; GR64:%vreg11
     %R8<def> = COPY %vreg12; GR64:%vreg12
     INLINEASM <es:syscall
     > [sideeffect] [attdialect], $0:[regdef], %RAX<imp-def,tied5>, 
$1:[reguse tiedto:$0], %vreg2<tied3>, $2:[reguse], %RDI, $3:[reguse], 
%RSI, $4:[reguse], %RDX, $5:[reguse], %R10, $6:[reguse], %R8, 
$7:[clobber], %EFLAGS<earlyclobber,imp-def>, $8:[clobber], 
%CX<earlyclobber,imp-def>, $9:[clobber], %R11<earlyclobber,imp-def>, 
<<badref>>; GR64:%vreg2

BUT there is trouble after "Prologue/Epilogue Insertion & Frame 
Finalization":

BB#0: derived from LLVM BB %entry
     MOV32mi %RSP, 1, %noreg, -4, %noreg, 1; mem:ST4[%val]
     %R8<def> = MOV64ri64i32 4
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R8<kill>
     %R10<def> = LEA64r %RSP, 1, %noreg, -4, %noreg
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R10<kill>
     %RDX<def> = MOV64ri64i32 3
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDX<kill>
     %RSI<def> = MOV64ri64i32 1
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RSI<kill>
     %RDX<def> = MOV64ri32 -1
     %RDI<def> = COPY %RDX
     INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDI<kill>
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDI<imp-def>
     %RAX<def> = MOV64ri64i32 54
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RSI<imp-def>
     MOV64mr %RSP, 1, %noreg, -16, %noreg, %RDX<kill>; mem:ST8[FixedStack1]
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDX<imp-def>
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R10<imp-def>
     INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R8<imp-def>
     INLINEASM <es:syscall
     > [sideeffect] [attdialect], $0:[regdef], %RAX<imp-def,tied5>, 
$1:[reguse tiedto:$0], %RAX<kill,tied3>, $2:[reguse], %RDI<kill>, 
$3:[reguse], %RSI<kill>, $4:[reguse], %RDX<kill>, $5:[reguse], 
%R10<kill>, $6:[reguse], %R8<kill>, $7:[clobber], 
%EFLAGS<earlyclobber,imp-def>, $8:[clobber], %CX<earlyclobber,imp-def>, 
$9:[clobber], %R11<earlyclobber,imp-def>, <<badref>>

Here you can see the clobber happening: RDX is assigned a second time
(to -1) while it should still hold the 3rd argument.
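The property being violated can be stated mechanically: the value
handed to a "{reg}" asm should still be in reg when the matching
"={reg}" asm reads it back (whether the IR actually promises this is
exactly the open question). Below is a hypothetical toy checker, not an
LLVM pass or API, run over a hand-transcribed model of the post-PEI
dump above; the empty INLINEASM statements are modelled as "use" and
"def" markers that emit no code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; not an LLVM data structure. */
enum op_kind {
    OP_MOVE,     /* real instruction: reg = value */
    OP_ASM_USE,  /* empty asm with a "{reg}" input */
    OP_ASM_DEF   /* empty asm with a "={reg}" output */
};

struct op {
    enum op_kind kind;
    const char *reg;  /* register name, e.g. "RDX" */
    long value;       /* value written by OP_MOVE; unused otherwise */
};

#define MAX_REGS 16

struct reg_slot {
    const char *name;
    long current;      /* value currently in the register */
    long intended;     /* value at the last "{reg}" asm */
    int has_intended;
};

/* Find (or create) the tracking slot for a register name. */
static struct reg_slot *slot_for(struct reg_slot *slots, size_t *n,
                                 const char *name)
{
    for (size_t i = 0; i < *n; i++)
        if (strcmp(slots[i].name, name) == 0)
            return &slots[i];
    if (*n == MAX_REGS)
        return NULL;
    slots[*n] = (struct reg_slot){ name, 0, 0, 0 };
    return &slots[(*n)++];
}

/* Returns the first register whose value at a "={reg}" asm differs from
   its value at the preceding "{reg}" asm, or NULL if there is none (or
   if the toy model runs out of slots). */
const char *first_clobbered_reg(const struct op *ops, size_t count)
{
    struct reg_slot slots[MAX_REGS];
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        struct reg_slot *s = slot_for(slots, &n, ops[i].reg);
        if (s == NULL)
            return NULL;
        switch (ops[i].kind) {
        case OP_MOVE:
            s->current = ops[i].value;
            break;
        case OP_ASM_USE:
            s->intended = s->current;
            s->has_intended = 1;
            break;
        case OP_ASM_DEF:
            if (s->has_intended && s->current != s->intended)
                return s->name;
            break;
        }
    }
    return NULL;
}

/* Transcribed from the post-PEI dump: 0x1000 stands in for &val, COPY
   %RDX is resolved to its value, and the spill store is omitted because
   it changes no register. */
const struct op kFastRegallocSequence[] = {
    { OP_MOVE, "R8", 4 },       { OP_ASM_USE, "R8", 0 },
    { OP_MOVE, "R10", 0x1000 }, { OP_ASM_USE, "R10", 0 },
    { OP_MOVE, "RDX", 3 },      { OP_ASM_USE, "RDX", 0 },
    { OP_MOVE, "RSI", 1 },      { OP_ASM_USE, "RSI", 0 },
    { OP_MOVE, "RDX", -1 },     /* second assignment to RDX */
    { OP_MOVE, "RDI", -1 },     { OP_ASM_USE, "RDI", 0 },
    { OP_ASM_DEF, "RDI", 0 },   { OP_MOVE, "RAX", 54 },
    { OP_ASM_DEF, "RSI", 0 },   { OP_ASM_DEF, "RDX", 0 },
    { OP_ASM_DEF, "R10", 0 },   { OP_ASM_DEF, "R8", 0 },
};
const size_t kFastRegallocSequenceLen =
    sizeof kFastRegallocSequence / sizeof kFastRegallocSequence[0];
```

On this sequence first_clobbered_reg reports "RDX"; with the second
"MOV64ri32 -1" into RDX removed, it reports nothing.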

Finally, I tried manually editing out the apparently superfluous asm 
statements from the LLVM IR, giving me a simpler program like this:

   %val = alloca i32, align 4
   store i32 1, i32* %val, align 4
   %0 = ptrtoint i32* %val to i64
   %asmtmp.i = call i64 asm sideeffect "syscall\0A\09", "={ax},0,{rdi},{rsi},{rdx},{r10},{r8},~{fpsr},~{flags},~{cx},~{r11},~{cc},~{memory}"(i64 54, i64 -1, i64 1, i64 3, i64 %0, i64 4) nounwind, !srcloc !0

As you can see, the arguments are now specified directly on the
syscall asm; the constraints remain the same. This compiles correctly;
the corresponding x86 code is:

   400190:       c7 44 24 fc 01 00 00    movl   $0x1,-0x4(%rsp)
   400197:       00
   400198:       b8 36 00 00 00          mov    $0x36,%eax
   40019d:       48 c7 c1 ff ff ff ff    mov $0xffffffffffffffff,%rcx
   4001a4:       be 01 00 00 00          mov    $0x1,%esi
   4001a9:       ba 03 00 00 00          mov    $0x3,%edx
   4001ae:       4c 8d 54 24 fc          lea    -0x4(%rsp),%r10
   4001b3:       41 b8 04 00 00 00       mov    $0x4,%r8d
   4001b9:       48 89 cf                mov    %rcx,%rdi
   4001bc:       48 89 4c 24 f0          mov    %rcx,-0x10(%rsp)
   4001c1:       0f 05                   syscall

Note the use of RCX rather than RDX as a temporary, which avoids
clobbering RDX. This suggests to me that the allocator is correctly
preserving registers and that the old IR is too loose, so the real
question is probably how Dragonegg should compile the syscall C /
inline asm code to LLVM IR. However, I'd really appreciate anyone
confirming or denying my suspicions, as I'm kind of learning as I go
here!

Chris