[LLVMdev] Shrink Wrapping - RFC and initial implementation

Sun Mar 1 14:57:49 PST 2009

First, thanks very much for your comments!

On Sat, Feb 28, 2009 at 8:05 PM, Evan Cheng <evan.cheng at apple.com> wrote:

>
> On Feb 26, 2009, at 2:02 PM, John Mosby wrote:
> > It is limited to X86 presently since that is the only target I have
> > access to at the moment.
>
> What part of this is target dependent? Is this due to emitPrologue /
> emitEpilogue being target specific?
>

It is target dependent (X86) at present because of the way I developed it,
just using the X86 target since that is the only one on which I can test the
entire (static) flow: test.c -> llvm-gcc -emit-llvm -> (.ll, .bc) -> llc
--shrink-wrap -> .s -> gcc test.s -o test.

I worked with other targets also, but I decided to take it as far as I could
on the first go with X86.

First pass was without debugging info and with simple stack frames, in the
interest of getting as much worked out as possible.
I saw the issue concerning how code gen handles placement of spill and
restore code outside of entry/return blocks before I had the first test
cases running, but I worked through the details using -march=x86 only.

Re: debugging info: I know about the work to change the way debugging info
is handled, so I held off trying to make the shrink wrapping work with the
current impl.

> > The main features are:
> >   - Placing callee saved register (CSR) spills and restores in the
> > CFG to tightly surround uses
> >      so that execution paths that do not use CSRs do not pay the
> > spill/restore penalty.
> >
> >   - Avoiding placment of spills/restores in loops: if a CSR is used
> > inside a loop(nest), the spills
> >      are placed in the loop preheader, and restores are placed in
> > the loop exit nodes (the
> >      successors of the loop _exiting_ nodes).
> >
> >   - Covering paths without CSR uses: e.g. if a restore is placed in
> > a join block, a matching spill
> >      is added to the end of all immediate predecessor blocks that
> > are not reached by a spill.
> >      Similarly for saves placed in branch blocks.
>
> Sounds great. It would help everyone if you can show some examples code.

I am putting documented examples together from the test cases in the patch.

> Since I ran into a non-trivial issue in developing this pass, I
> > would like to submit my implementation as a "RFC + code/tests"
> > rather than a typical contribution, and get people's opinions on how
> > to proceed.
> >
> > The issue is that the code generator assumes all spills and restores
> > of callee saved registers must be placed in the entry and return
> > blocks resp.
> > Shink wrapping violates this assumption, with the result that the
> > frame offsets computed by PEI for frame-relative operands may be
> > incorrect.
>
> I am not sure how this would happen. Why would frame offsets be
> affected by where these instructions are placed?

The issue is illustrated by a simple example in which a single CSR is used
in one branch of a conditional. When the stack frame is laid out, the spill
for this CSR is accounted for in the calculation of stack size as it should
be. The stack slot for the CSR is correctly computed and everything seems
fine when the MO_FrameIndex are replaced. The problem is that since the
spill instruction for the CSR (e.g. pushl %esi) is moved from the entry
block, the push does not happen, and the value of %esp in the entry block is
not what it should be to satisfy the offsets produced by
eliminateFrameIndex().
A similar situation exists for the BB into which a spill is "moved" (from
the entry block): a push happens  to spill the CSR on entry to the block,
and now %esp is not what it should be for that block. The example below
illustrates this issue:

assume:
int F(int a, int b, int c) uses one CSR in conditional branch

prologue, no shrink wrapping:

_F:
pushl %esi                   # spill CSR %csi, %esp -= 4 (in this case)
subl $56, %esp           # create frame, %esp = %esp - 56
movl 64(%esp), %eax  # fetch arg 'a' from entry %esp + 4
movl %eax, 52(%esp)
movl 68(%esp), %eax  # fetch arg 'b'
movl %eax, 48(%esp)
...

prologue with spill shrink-wrapped to basic block bb:

_F:
        # no spill of %esi, moved to bb
subl $56, %esp          # create frame same as before %esp = %esp - 56
movl 64(%esp), %eax  # error: 'a' is not at 64(%esp), it's at 60(%esp)
movl %eax, 52(%esp)
...

The simple, ugly hack of adjusting the value by which %esp is decremented in
the prologue when one or more CSR spills have been placed into other blocks
takes care of the issue on this simple code (no dynamic stack realign., nor
EH) on x86.

The companion hack for (non entry) MBBs into which spills have been
introduced is to adjust the stack size around eliminateFrameIndex()'s for
replacement of MO_FrameIndex operands.

Obviously, all of this applies only when spills are done with push/pop,
which is the case on x86. I used this issue to start looking at generalizing
how spills and restores are handled, before looking too closely at other
targets, and developed the workaround for the initial implementation.

> >
> > My limited implementation uses a workaround that adjusts the
> > generation of prologue code and the frame indices used by
> > the target eliminateFrameIndex() when necessary. I am looking at
> > several approaches, but I would like input from anyone who
> > has an opinion.
> >
>
> I think to do this right for every target is a big job. I'd like to
> see prologue / epilogue related stuff be moved out of
> TargetRegisterInfo. Shrink wrapping will only happen when the targets
> buy-in, i.e. providing the right hooks.
>

Part of what I'm doing now is estimating the work, which requires going
through the targets. I am not far enough along to send out a proposal.
Moving pro/epi generation out of TRI, perhaps into its own "component" is
one architecture I am looking at.

> When is shrink wrapping happening? Is it possible to do it after CSR
> spills and restores are inserted but before FI are lowered into sp /
> fp +/- offset?

Shrink wrapping starts after calculateCalleeSavedRegisters(), which creates
the list of CSRs used in the function. Shrink wrapping assigns MBB
placements for spills and restores based on where they are used.
calculateCalleeSavedRegisters() determines stack slots for the CSRs used in
the function.
I don't see an interaction between this and shrink wrapping, of have I
missed something?

> > Finally, I realize that shrink wrapping is probably not high
> > priority in the larger scheme of LLVM development, so I'm not
> > expecting
> > a huge response, but any ideas are certainly welcome.
>
> It's actually a fairly useful optimization. It can really help a class
> of functions, e.g. functions with early returns.
>

Quite right, it is certainly worthwhile. I could have left that comment out
:-)

> >
> > The patch and a test tarball are attached. I include basic tests
> > that are run with the supplied Makefile.
>
> Some comments:
>
> 1. The code needs some refactoring. :-) It's a big chunk of code so
> it's hard to digest.
> 2. There doesn't seem to be a way to turn off shrink wrapping. Please
> make sure it's optional. When it's not being done, PEI should not
> require dominator, etc.
>

I already refactored once, but I knew it would not be enough(!), I'll
definitely do another pass. I forgot to put the analysis deps under
--shrink-wrap, I will fix that and anything else that I might have left out
of the option.

>
>  From what I can see this is a very good first step. I look forward to
> seeing its completion.
>
> Evan
>

Thanks! Likewise, and it's a pleasure to work on.

John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090301/70df295e/attachment.html>