[PATCH] D91020: [X86] Unbind the ebx with GOT address in regcall calling convention

Xiang Zhang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 23 20:08:40 PST 2020


xiangzhangllvm added inline comments.


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:4136
 
       // Note: The actual moving to ECX is done further down.
       GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee);
----------------
xiangzhangllvm wrote:
> LuoYuanke wrote:
> > xiangzhangllvm wrote:
> > > LuoYuanke wrote:
> > > > Would you add a test case for tail call? Is there any conflict to ECX?
> > > I checked this locally before. For X86 (32-bit), tail call does not work with regcall. For X86_64 it works, but X86_64 does not bind ebx to the GOT pointer.
> > > Please check the following case; it will not generate a jump for the 2nd command.
> > > 
> > > ```
> > > ; llc -mtriple=x86_64-unknown-linux-gnu -relocation-model=pic
> > > ; llc -mtriple=i386-unknown-linux-gnu -relocation-model=pic  <-tailcallopt>
> > > 
> > > declare x86_regcallcc void @regcall_not_lazy(i32 %a0, i32 %b0)
> > > 
> > > define void @tail_call_regcall() nounwind {
> > >   tail call x86_regcallcc void @regcall_not_lazy(i32 0, i32 1)
> > >   ret void
> > > }
> > > ```
> > It seems the compiler generates a jmp instruction only when the number of arguments is less than or equal to 2 and the PIC relocation model is not used.
> > ```
> > ; llc -mtriple=x86_64-unknown-linux-gnu
> > ; llc -mtriple=i386-unknown-linux-gnu
> > 
> > @foo6 = external global void (i32, i32, i32, i32, i32, i32)*
> > 
> > define void @tail_call_regcall6(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, ...) nounwind {
> >   %t0 = alloca i32, align 128
> >   %t1 = load void (i32, i32, i32, i32, i32, i32)*, void (i32, i32, i32, i32, i32, i32)** @foo6, align 4
> >   tail call x86_regcallcc void %t1(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5) nounwind
> >   ret void
> > }
> > 
> > @foo5 = external global void (i32, i32, i32, i32, i32)*
> > 
> > define void @tail_call_regcall5(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) nounwind {
> >   %t1 = load void (i32, i32, i32, i32, i32)*, void (i32, i32, i32, i32, i32)** @foo5, align 4
> >   ; tail call x86_regcallcc void %t1(i32 0, i32 1, i32 2, i32 3, i32 4) nounwind
> >   tail call x86_regcallcc void %t1(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) nounwind
> >   ret void
> > }
> > 
> > @foo4 = external global void (i32, i32, i32, i32)*
> > 
> > define void @tail_call_regcall4(i32 %a, i32 %b, i32 %c, i32 %d) nounwind {
> >   %t1 = load void (i32, i32, i32, i32)*, void (i32, i32, i32, i32)** @foo4, align 4
> >   ; tail call x86_regcallcc void %t1(i32 0, i32 1, i32 2, i32 3, i32 4) nounwind
> >   tail call x86_regcallcc void %t1(i32 %a, i32 %b, i32 %c, i32 %d) nounwind
> >   ret void
> > }
> > 
> > @foo3 = external global void (i32, i32, i32)*
> > 
> > define void @tail_call_regcall3(i32 %a, i32 %b) nounwind {
> >   %t1 = load void (i32, i32, i32)*, void (i32, i32, i32)** @foo3, align 4
> >   tail call x86_regcallcc void %t1(i32 0, i32 1, i32 2) nounwind
> >   ret void
> > }
> > 
> > @foo2 = external global void (i32, i32)*
> > 
> > define void @tail_call_regcall2(i32 %a, i32 %b) nounwind {
> >   %t1 = load void (i32, i32)*, void (i32, i32)** @foo2, align 4
> >   tail call x86_regcallcc void %t1(i32 0, i32 1) nounwind
> >   ; tail call x86_regcallcc void %t1(i32 %a, i32 %b) nounwind
> >   ret void
> > }
> > ```
> When I add "-tailcallopt" to your test, all the jumps disappear.
> The only constraint on tail calls should be "variable argument lists are used" (it should not depend on the number of function arguments).
> I guess there must be a bug in the tail call handling itself.
> Anyway, I'll take a deeper look at tail calls; all the tests I ran before were in PIC mode.
> And for the relationship between ebx and the GOT, we only need to check PIC mode.
> It seems the compiler generates a jmp instruction only when the number of arguments is less than or equal to 2 and the PIC relocation model is not used.
Yes. In fact, the current X86 lowering already considers the case where the tail call address may be in a register. To avoid running out of registers during allocation, it limits the number of registers used for function arguments.
In PIC mode, one more register needs to be bound to the GOT, so fewer registers are available for function arguments than in non-PIC mode. PIC mode also disables tail calls to external symbols with default visibility.
So I reproduced the tail call case for PIC with the following test:

```
; llc -mtriple=i386-unknown-linux-gnu -relocation-model=pic tail.ll

@a0 = global i32 0, align 4

define x86_regcallcc void @tail_call_regcall2(i32 %a) nounwind {
  tail call x86_regcallcc void @__regcall3__func(i32 %a) nounwind
  ret void
}

define internal x86_regcallcc void @__regcall3__func(i32 %i1) #0 {
entry:
  store i32 %i1, i32* @a0, align 4
  ret void
}
```
The current change has no effect on it. (In PIC mode, the tail call loads the address of the callee into ECX because of the ebx/callee-saved problem.)
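
To make the register budget discussed above concrete, here is a rough, self-contained C++ sketch of the kind of check involved. It is not the code in X86ISelLowering.cpp: the `Reg` enum and the function name `tailCallAddressAllocatable` are hypothetical, and the limits (3 without PIC, 2 with PIC) are assumptions chosen to match the behaviour observed in this thread (a jmp only with at most 2 arguments and no PIC).

```cpp
#include <vector>

// Hypothetical stand-in for the X86 register enum; only the names matter here.
enum class Reg { EAX, ECX, EDX, EDI, ESI, EBX };

// Simplified model of the 32-bit check: the tail call address must live in a
// caller-clobbered register (assumed here to be EAX, ECX, or EDX), so the
// arguments may not occupy all of them. In PIC mode one more register is
// assumed to be needed to compute the callee address, so the budget shrinks.
bool tailCallAddressAllocatable(const std::vector<Reg> &argRegs, bool isPIC) {
  const unsigned maxInRegs = isPIC ? 2 : 3;  // assumed limits, see lead-in
  unsigned numInRegs = 0;
  for (Reg r : argRegs) {
    if (r == Reg::EAX || r == Reg::ECX || r == Reg::EDX) {
      // Reaching the limit means no register is left for the call address.
      if (++numInRegs == maxInRegs)
        return false;
    }
  }
  return true;
}
```

Under these assumptions, two arguments in EAX and ECX still allow a tail call without PIC but not with PIC, while a single register argument (as in the test above) is accepted in both modes.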


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91020/new/

https://reviews.llvm.org/D91020


