[PATCH] D18046: [X86] Providing correct unwind info in function epilogue

Mon May 2 15:41:21 PDT 2016

DavidKreitzer added a comment.

I think we should move in a direction that makes it easier to perform optimizations that affect the CFI between X86FrameLowering and this late pass. For example, we cannot schedule the pushes generated by the X86CallFrameOptimization pass without moving the CFI along with each push. As a result, we generate very poor code in cases like this one, where the push operands get in the way of outgoing inreg arguments:

  target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
  target triple = "i386-unknown-linux-gnu"

  declare i32 @f1(i32 inreg, i32 inreg, i32 inreg, i32, i32)
  define i32 @f2(i32 inreg %a, i32 inreg %b, i32 inreg %c, i32 %d, i32 %e) nounwind {
  entry:
    %call = tail call i32 @f1(i32 inreg 1, i32 inreg 2, i32 inreg 3, i32 %a, i32 %b)
    %add = add nsw i32 %call, 1
    ret i32 %add
  }

LLVM generates this:

  f2:
  	pushl	%edi
  	pushl	%esi
  	pushl	%eax
  	movl	%edx, %esi
  	movl	%eax, %edi
  	subl	$8, %esp
  	movl	$1, %eax
  	movl	$2, %edx
  	movl	$3, %ecx
  	pushl	%esi
  	pushl	%edi
  	calll	f1
  	addl	$16, %esp
  	incl	%eax
  	addl	$4, %esp
  	popl	%esi
  	popl	%edi
  	retl

icc generates much cleaner code (gcc is similar):

  f2:
          subl      $20, %esp
          movl      $3, %ecx
          pushl     %edx
          pushl     %eax
          movl      $1, %eax
          movl      $2, %edx
          call      f1
          incl      %eax
          addl      $28, %esp
          ret
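
As a quick sanity check on the two listings above, the stack depth at the call can be tallied by hand. The sketch below only sums the %esp-affecting instructions that precede the call in each listing. The names (LLVM_BEFORE_CALL, ICC_BEFORE_CALL, depth_at_call) are made up for illustration, and the 4-byte return address is the usual i386 assumption.

```python
# Bytes by which each instruction grows the stack, copied from the two
# listings above. Only instructions before the call are listed.
LLVM_BEFORE_CALL = [
    ("pushl %edi", 4), ("pushl %esi", 4), ("pushl %eax", 4),
    ("subl $8, %esp", 8),
    ("pushl %esi", 4), ("pushl %edi", 4),
]
ICC_BEFORE_CALL = [
    ("subl $20, %esp", 20),
    ("pushl %edx", 4), ("pushl %eax", 4),
]

# Assumption: on i386 the return address already on the stack at entry
# to f2 occupies 4 bytes.
RETURN_ADDRESS_BYTES = 4


def depth_at_call(instrs):
    """Bytes between the CFA and %esp when the call instruction executes."""
    return RETURN_ADDRESS_BYTES + sum(size for _, size in instrs)


llvm_depth = depth_at_call(LLVM_BEFORE_CALL)
icc_depth = depth_at_call(ICC_BEFORE_CALL)
print(llvm_depth, icc_depth)
```

Both sequences reach the same depth at the call, a multiple of 16 bytes, which is consistent with the S128 stack alignment in the datalayout. The difference is in how many instructions it takes to get there.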

We would also like the ability to accumulate the stack-cleanup "add %esp" instructions for a series of calls like this:

  target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
  target triple = "i386-unknown-linux-gnu"

  declare void @C(i32, i32, i32, i32)
  define void @F() nounwind {
  entry:
    tail call void @C(i32 1, i32 2, i32 3, i32 4)
    tail call void @C(i32 5, i32 6, i32 7, i32 8)
    tail call void @C(i32 9, i32 10, i32 11, i32 12)
    ret void
  }

Instead of what is currently generated:

  F:
  	subl	$12, %esp
  	pushl	$4
  	pushl	$3
  	pushl	$2
  	pushl	$1
  	calll	C
  	addl	$16, %esp
  	pushl	$8
  	pushl	$7
  	pushl	$6
  	pushl	$5
  	calll	C
  	addl	$16, %esp
  	pushl	$12
  	pushl	$11
  	pushl	$10
  	pushl	$9
  	calll	C
  	addl	$28, %esp
  	retl

we can eliminate both "addl $16, %esp" instructions and bump the final %esp adjustment up to "addl $60, %esp" (16 + 16 + 28). This is simpler to do when there are no separate CFI instructions to keep in sync with the stack-adjust instructions.
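
The accumulation described here can be modeled on the textual listing. This is a rough sketch, not LLVM code: fold_cleanups is a hypothetical helper that works on instruction strings, and it assumes straight-line code in which nothing between the adjusts addresses the stack relative to %esp.

```python
import re

ADD_ESP = re.compile(r"addl\s+\$(\d+),\s*%esp")


def fold_cleanups(instrs):
    """Merge every 'addl $N, %esp' into the last one in the sequence.

    Assumption: straight-line code with no %esp-relative accesses
    between the adjusts, so delaying the cleanup is safe.
    """
    adds = []
    for index, ins in enumerate(instrs):
        match = ADD_ESP.fullmatch(ins)
        if match:
            adds.append((index, int(match.group(1))))
    if len(adds) < 2:
        return list(instrs)
    total = sum(amount for _, amount in adds)
    last_index = adds[-1][0]
    dropped = {index for index, _ in adds[:-1]}
    folded = []
    for index, ins in enumerate(instrs):
        if index in dropped:
            continue
        folded.append(f"addl ${total}, %esp" if index == last_index else ins)
    return folded


# The listing for F as currently generated, one string per instruction.
F_CURRENT = (
    ["subl $12, %esp"]
    + ["pushl $4", "pushl $3", "pushl $2", "pushl $1", "calll C", "addl $16, %esp"]
    + ["pushl $8", "pushl $7", "pushl $6", "pushl $5", "calll C", "addl $16, %esp"]
    + ["pushl $12", "pushl $11", "pushl $10", "pushl $9", "calll C", "addl $28, %esp"]
    + ["retl"]
)
F_FOLDED = fold_cleanups(F_CURRENT)
print("\n".join(F_FOLDED))
```

If CFI directives were already interleaved with these instructions, each one after the first dropped adjust would carry a stale offset and would have to be rewritten as well, which is the extra bookkeeping this comment is trying to avoid.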

To put this into a concrete proposal, I would suggest making this new pass responsible not only for generating proper epilogue CFI but also for generating the CFI for simple stack adjusts. That would not only help enable optimizations like the above, but also relieve transformations that generate fixed stack adjusts of having to generate CFI themselves. There have recently been at least three patches that added CFI for transforms involving stack adjusts (see http://reviews.llvm.org/D13767, http://reviews.llvm.org/D14021, http://reviews.llvm.org/D18246), all with their own logic for adding the CFI and deciding whether or not it is necessary.
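
To illustrate what "generating the CFI for simple stack adjusts" in one late pass could look like, here is a toy model over instruction strings. It is only a sketch under stated assumptions, not the proposed pass: esp_delta and add_cfi are hypothetical names, the CFA is assumed to be tracked as %esp plus an offset (no frame pointer), the offset at entry is assumed to be 4, and register-save directives such as .cfi_offset are ignored.

```python
import re

SUB_ESP = re.compile(r"subl\s+\$(\d+),\s*%esp")
ADD_ESP = re.compile(r"addl\s+\$(\d+),\s*%esp")


def esp_delta(ins):
    """Bytes by which the instruction grows the stack (negative = shrinks)."""
    if ins.startswith("pushl"):
        return 4
    if ins.startswith("popl"):
        return -4
    match = SUB_ESP.fullmatch(ins)
    if match:
        return int(match.group(1))
    match = ADD_ESP.fullmatch(ins)
    if match:
        return -int(match.group(1))
    return 0


def add_cfi(instrs, entry_offset=4):
    """Insert a .cfi_def_cfa_offset directive after each simple stack adjust.

    Assumption: the CFA is %esp + offset throughout, starting at
    entry_offset (the return address) on function entry.
    """
    out = []
    offset = entry_offset
    for ins in instrs:
        out.append(ins)
        delta = esp_delta(ins)
        if delta:
            offset += delta
            out.append(f".cfi_def_cfa_offset {offset}")
    return out


# A small made-up sequence in the style of the listings above.
EXAMPLE = ["subl $12, %esp", "pushl $4", "calll C", "addl $16, %esp", "retl"]
ANNOTATED = add_cfi(EXAMPLE)
print("\n".join(ANNOTATED))
```

Because the directives are derived from the final instruction stream, earlier transforms in this model are free to reorder or merge stack adjusts without carrying any CFI along.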

Repository:
  rL LLVM

http://reviews.llvm.org/D18046