[LLVMdev] nested function's static link gets clobbered

Duncan Sands duncan.sands at math.u-psud.fr
Sat Nov 1 00:54:23 PDT 2008


Hi,

> I'm parallelizing loops to be called by pthread. The thread body that I pass
> to pthread_create looks like
> 
> define i8* @loop1({ i32*, i32* }* nest  %parent_frame, i8* %arg)
> parent_frame is a pointer to the shared variables in the original function
> 
> 0x00007f0de11c41f0:     mov    (%r10),%rax
> 0x00007f0de11c41f3:     cmpl   $0x63,(%rax)
> 0x00007f0de11c41f6:     jg     0x7f0de11c420c
> 0x00007f0de11c41fc:     mov    0x8(%r10),%rax
> 0x00007f0de11c4200:     incl   (%rax)
> 0x00007f0de11c4202:     mov    (%r10),%rax
> 0x00007f0de11c4205:     incl   (%rax)
> 0x00007f0de11c4207:     jmpq   0x7f0de11c41f0
> 0x00007f0de11c420c:     xor    %rax,%rax
> 0x00007f0de11c420f:     retq
> 
> I use init_trampoline to generate code that sets up the static link:
> 
> 0x00007fffee982316:     mov    $0x7f48e1a08fb0,%r11
> 0x00007fffee982320:     mov    $0x7fffee982330,%r10               the static link
> 0x00007fffee98232a:     rex.WB jmpq   *%r11
> 
> The program crashes in loop1 on the 2nd instruction. r10, which contained
> the static link, was different from the value set by the trampoline.
> 
> Upon closer inspection, it looks like the trampoline first jumps to a stub
> that compiles loop1:
> 
> 0x00007f48e1a08fb0:     mov    $0x5c61c0,%r10
> 0x00007f48e1a08fba:     callq  *%r10
> 0x00007f48e1a08fbd:     int    $0x0
> 
> But that clobbers r10 which loop1 needs. According to the x86-64 ABI, r10
> isn't preserved across functions, but here it needs to be. Is there any way
> to force LLVM to do that?

You must be the first person to try using nest functions with the JIT :)
If you look in X86JITInfo.cpp, in the function X86JITInfo::emitFunctionStub,
you will see the code generating the stub and using r10.  I think the right
solution is to change r10 to a different call-clobbered register.  It would
also be possible to have the trampoline use a different register, but since
the x86-64 ABI explicitly states that r10 should be used for the static chain,
I'd rather not.
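
As a rough illustration of the clobbering, here is a toy C model of the two
listings.  It is not LLVM code: the register file, the function names and the
choice of r11 as the "different register" are all assumptions made for the
sketch, and it says nothing about what the compilation callback itself does
to r10.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file holding only the two registers seen in the listings. */
enum reg { R10, R11, NUM_REGS };
struct regs { uint64_t r[NUM_REGS]; };

/* Trampoline: target address goes in r11, static link goes in r10. */
void run_trampoline(struct regs *rs, uint64_t target, uint64_t static_link) {
    rs->r[R11] = target;
    rs->r[R10] = static_link;
}

/* Lazy-compilation stub: loads the callback address into `scratch`
   before calling through it.  Only the register write is modelled. */
void run_stub(struct regs *rs, enum reg scratch, uint64_t callback) {
    rs->r[scratch] = callback;
}

/* Returns 1 if r10 still holds the static link after the stub has run. */
int static_link_survives(enum reg stub_scratch) {
    struct regs rs = {{0, 0}};
    const uint64_t static_link = UINT64_C(0x7fffee982330);
    run_trampoline(&rs, UINT64_C(0x7f48e1a08fb0), static_link);
    run_stub(&rs, stub_scratch, UINT64_C(0x5c61c0));
    return rs.r[R10] == static_link;
}

/* Usage: the stub as quoted loses the static link, a stub that uses
   some other register (r11 here, purely as an example) keeps it. */
void clobber_demo(void) {
    assert(!static_link_survives(R10));
    assert(static_link_survives(R11));
}
```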

I'm also wondering about the x86-32 case.  There are no comments in the
JIT stub code in this case, so I'm not sure which register it is using.
The problem with x86-32 is that there are so few registers, and for some
calling conventions there is only one spare call-clobbered register
available.  This is used by trampolines, so if it's also used by the JIT,
which is almost surely the case, that will cause trouble.  Even worse,
it looks like the JIT is wrong even without trampolines, because for
the C and X86_StdCall conventions it is ECX that is spare, while for
X86_FastCall and Fast it is EAX.  Yet the JIT always uses the same
hardwired code, and does not adjust according to the calling convention.
So presumably it is broken for one of these sets of calling conventions.
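
To make the mismatch concrete, here is a small C sketch of the table just
described.  The enum and function names are made up for the illustration and
are not LLVM's; the register assignments are only the ones stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the four conventions mentioned above. */
enum calling_conv { CC_C, CC_X86_STDCALL, CC_X86_FASTCALL, CC_FAST, CC_COUNT };

/* The one spare call-clobbered register per convention, as stated above. */
const char *spare_register(enum calling_conv cc) {
    switch (cc) {
    case CC_C:
    case CC_X86_STDCALL:
        return "ECX";
    case CC_X86_FASTCALL:
    case CC_FAST:
        return "EAX";
    default:
        return NULL;
    }
}

/* Counts the conventions for which a stub hardwired to `stub_reg`
   would be using the spare register. */
int conventions_served(const char *stub_reg) {
    int n = 0;
    for (int cc = 0; cc < CC_COUNT; ++cc) {
        const char *spare = spare_register((enum calling_conv)cc);
        if (spare != NULL && strcmp(spare, stub_reg) == 0)
            ++n;
    }
    return n;
}

/* Usage: neither candidate covers all four conventions. */
void spare_register_demo(void) {
    assert(conventions_served("ECX") == 2);
    assert(conventions_served("EAX") == 2);
}
```

Whichever single register the hardwired stub picks, it is the spare one for
only two of the four conventions in this table.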

Hopefully Anton can comment on this.

> I tried telling lli to compile the entire program
> (-no-lazy) so that the stub won't be generated, but it gives the error:
> 
> LLVM JIT requested to do lazy compilation of function
> '_Z41__static_initialization_and_destruction_0ii' when lazy compiles are
> disabled!
> 
> Any ideas?
> 
> Note, I had to compile lli with -z execstack in order for trampolines on the
> stack to work.

Maybe lli can be taught to mark itself as having an executable stack when
it sees a trampoline.  I'm not sure how this can best be done.  On Linux
I guess it can be done using mmap.
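
A minimal sketch of a related approach, under assumptions: rather than making
the stack itself executable, it copies code bytes into a separate anonymous
mapping obtained from mmap and then makes that mapping executable with
mprotect.  This is a different technique from marking the stack, it is not
something lli does, and the helper names are hypothetical.  The bytes are
never executed here, so the sketch does not depend on the CPU architecture.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS with strict -std= flags on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copies `len` bytes of machine code into a fresh anonymous mapping and
   switches the mapping to read+execute.  Returns NULL on failure. */
void *copy_to_executable(const void *code, size_t len) {
    if (code == NULL || len == 0)
        return NULL;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}

/* Unmaps a region returned by copy_to_executable; 0 on success. */
int release_executable(void *mem, size_t len) {
    return munmap(mem, len);
}

/* Usage: 0xc3 is just a placeholder byte (x86 `ret`); it is only copied
   and read back, not run. */
void executable_region_demo(void) {
    const unsigned char code[] = {0xc3};
    void *p = copy_to_executable(code, sizeof code);
    assert(p != NULL);
    assert(memcmp(p, code, sizeof code) == 0);
    assert(release_executable(p, sizeof code) == 0);
}
```

Systems with a strict write-xor-execute policy may refuse the mprotect call,
in which case the helper returns NULL.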

Ciao,

Duncan.


