[LLVMdev] Shrink Wrapping - RFC and initial implementation

Fri Mar 13 10:43:38 PDT 2009

Hi Evan,
Thanks very much for the review, I am implementing your suggestions today
and will have the next patch together this weekend.

A few questions/comments:

On Thu, Mar 12, 2009 at 10:05 AM, Evan Cheng <echeng at apple.com> wrote:

>
> 1. Some of the functions that you introduced, e.g. stringifyCSRegSet
> probably ought to be "static" and ifdef'ed out when NDEBUG is defined.
>
> 2. +      // DEBUG
>
>
> +      if (! MBB->empty() && ! CSRUsed[MBB].intersects(restore)) {
>
>
> +        MachineInstr* MI = BeforeI;
>
>
> +        DOUT << "adding restore after ";
>
>
> +        DEBUG(MI->dump());
>
>
> +      } else {
>
>
> +        DOUT << "adding restore to beginning of "
>
>
> +             << getBasicBlockName(MBB) << "\n";
>
>
> +      }
>
>
> +      // DEBUG
>
> Code like this should also be inside ifndef NDEBUG.
>
> 3. It can still use more refactoring. :-)
>
> 4.  clearSets().
> It's not clear what sets it's clearing. Perhaps name it something like
> clearShrinkWrapData()?
>

Agreed in toto, I will refactor further and try to get all remaining
cleanups into the next patch.

> 5
> +void PEI::placeSpillsAndRestores(MachineFunction &Fn) {
>
>
> ...
>
> The shrink wrap version probably should go to its own function. Otherwise,
> it should exit early when the non-shrink wrapping version is done. That
> reduces nesting and please those of us who are a bit picky.
>

I originally wrote it this way (separate functions). I then refactored the
code to unify the two behaviors around the idea of placing spills/restores.
I think the versions need to be separate for the reasons you state but the
placement idea should be retained. I will reintroduce the separation.

> 6.
> +      // Save entry block, return blocks.
>
>
> +      if (MBB->pred_size() == 0)
>
>
> +        entryBlock = MBB;
>
> Entry block is the Fn.front().
>
> 7.
> +        if (!MBB->empty() && MBB->back().getDesc().isReturn())
>
>
> +          returnBlocks.push_back(MBB);
>
> PEI::insertPrologEpilogCode also traverse MBBs and get at return blocks. So
> these probably ought to be shared, i.e. returnBlocks should be an ivar and
> determined early in PEI.
>

Absolutely (I knew Fn.front() was what I wanted but didn't go back and fix
things). I am implementing these.

> 8.
> +    for (MachineBasicBlock::iterator I = MBB->begin(); I != MBB->end();
> ++I) {
>
> +      for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
>
>
> +        unsigned Reg = CSI[i].getReg();
>
>
> +        // If instruction I reads or modifies Reg, add it to UsedCSRegs,
>
>
> +        // CSRUsed map for the current block.
>
>
> +        if (I->readsRegister(Reg, TRI) || I->modifiesRegister(Reg, TRI)) {
>
> readsRegister and modifiesRegister both scan the operands. It's probably
> better to manually walk through the operands.
>

Ok, that helps. I read the code but could not decide which way to do it at
first.

> 9.
> +  // If all uses of CSRegs are in the entry block, there is nothing
>
>
> +  // to shrink wrap: spills go in entry block, restores go in exiting
>
>
> +  // blocks, turn off shrink wrapping.
>
>
> +  if (allCSRUsesInEntryBlock) {
>
>
> +    ShrinkWrapping = false;
>
>
> +    DOUT << "All uses of CSRegs are in entry block, nothing to do.\n";
>
>
> +  }
>
>
> +  // If we have decided not to shrink wrap, just return now.
>
>
> +  if (! ShrinkWrapping)
>
>
> +    return true;
>
> Why not just return inside if (allCSRUsesInEntryBlock)?
>

ARGHHH, I thought I simplified that before cutting the patch.

> 10.
> +bool PEI::calculateUsedAnticAvail(MachineFunction &Fn) {
> ...
> +  // Calculate AnticIn, AnticOut using post-order traversal of MCFG.
> +  for (po_iterator<MachineBasicBlock*>
>
>
> +         MBBI = po_begin(Fn.getBlockNumbered(0)),
>
>
> +         MBBE = po_end(Fn.getBlockNumbered(0)); MBBI != MBBE; ++MBBI) {
>
>
> +    MachineBasicBlock* MBB = *MBBI;
> ...
> +  // Calculate Avail{In,Out} via top-down walk of Machine dominator tree.
>
>
> +  for (df_iterator<MachineDomTreeNode*> DI = df_begin(DT.getRootNode()),
>
>
> +         E = df_end(DT.getRootNode()); DI != E; ++DI) {
>
>
> Later in
> +/// placeSpillsAndRestores - decide which MBBs need spills, restores
>
>
> +/// of CSRs.
>
>
> +///
>
>
> +void PEI::placeSpillsAndRestores(MachineFunction &Fn) {
> ...
> +    // Calculate CSRRestore using post-order traversal of Machine-CFG.
>
>
> +    for (po_iterator<MachineBasicBlock*>
>
>
> +           MBBI = po_begin(Fn.getBlockNumbered(0)),
>
>
> +           MBBE = po_end(Fn.getBlockNumbered(0)); MBBI != MBBE; ++MBBI) {
>
> This seem to be doing traversal at least one too many times? Can this be
> improved?
>

I started to reduce the traversals, then decided to work on edge splitting
because I believe it may be needed to finish shrink wrapping.

I will return to that work and see if I can reduce the traversals, which for
this approach (computing Antic, Avail) will decrease the constant factor in
the runtime bound, which is linear in the size of the Machine IR.

> 11. Can you explain a bit more about AnticIn, AvailIn, etc.?
>

I am working on a document, currently hosted at github, which will present
the details of the implementation, examples, etc.

I looked at two approaches to determine spill/restore placements:

1. Try to use live intervals of CSRs that might be available when PEI runs.
    The idea here is that each CSR used in a function will have one or more
    defs which dominate one or more uses. Live intervals might lead me to
    the MBBs in which to place spills/restores.

2. Use "anticipatibility" (Antic{In,Out} sets) to find the points from which
all
    outgoing paths contain defs or uses of a CSR, and use "availability"
    (Avail{In,Out} sets) to find the points such that all incoming paths
contain
    defs or uses of a CSR. We place a spill for a CSR at the earliest point
    leading to a sequence of uses (a contiguous set of blocks containing
uses),
    so a block B will get a spill for CSR R if R is anticipatable at B and
_not_
    anticipatable at any predecessor of B. If R is used and redefined in a
block,
    we have to avoid placing another spill in that block, (it was spilled
earlier),
    so in addition to the above condition, R must not be available at B.
    Determining restore placement is the mirror image of spill placement.

I went with approach 2 despite the apparent complexity because the data flow
info is actually straightforward to compute, and I did not have to first
synthesize
LiveIntervals (read a ton of code) to get the pass working. I am putting
this information
into my temp. wiki page in hopes of getting it into the dev wiki when that
is available.

I am now looking at live intervals in connection with RA and code motion
(other possible projects),
and am trying to answer my question of whether live intervals could help
shrink wrapping.

Let me know if you think using live interval info would be worth
investigating for shrink wrapping.

> 12.
> Let's worry about edge splitting for a later time. :-)
>

Agreed. I am still working through the mechanics to understand how to do it
and ramifications.

> 13. After the code is cleaned up, we should consider checking it in and try
> it out as llcbeta. Do you have any idea of its compile time impact?
>
> Thanks,
>
> Evan
>

I'm working on characterizing the runtime and vm overhead, I don't yet have
a detailed picture.
My plan is to do the cleanups, put together a few larger test cases, go back
and run regressions,
then the test suite. With the larger focussed test cases, I will get usable
numbers for compile times, and
the test suite will extend the coverage.
Please let me know if there is a simpler or more standard way to tackle this
for a new pass.

What about EH and shrink wrapping? Should I disable shrink wrapping in EH
contexts?

I have held off looking at maintaining debug info integrity, let me know if
I should look at that or if it can wait a bit.

Thanks again,
John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090313/07da1c5a/attachment.html>