[LLVMdev] Shrink Wrapping - RFC and initial implementation

Tue Mar 17 23:44:00 PDT 2009

Hi John.

> I am putting this information
> into my temp. wiki page in hopes of getting it into the dev wiki when
> that is available.

The dev wiki is up at its temporary name http://google2.osuosl.org/wiki/.

Feel free to dump your stuff on there.

On Fri, Mar 13, 2009 at 7:43 PM, John Mosby <ojomojo at gmail.com> wrote:
> Hi Evan,
> Thanks very much for the review, I am implementing your suggestions today
> and will have the next patch together this weekend.
> A few questions/comments:
> On Thu, Mar 12, 2009 at 10:05 AM, Evan Cheng <echeng at apple.com> wrote:
>>
>> 1. Some of the functions that you introduced, e.g. stringifyCSRegSet
>> probably ought to be "static" and ifdef'ed out when NDEBUG is defined.
>> 2. +      // DEBUG
>>
>>
>> +      if (! MBB->empty() && ! CSRUsed[MBB].intersects(restore)) {
>>
>>
>> +        MachineInstr* MI = BeforeI;
>>
>>
>> +        DOUT << "adding restore after ";
>>
>>
>> +        DEBUG(MI->dump());
>>
>>
>> +      } else {
>>
>>
>> +        DOUT << "adding restore to beginning of "
>>
>>
>> +             << getBasicBlockName(MBB) << "\n";
>>
>>
>> +      }
>>
>>
>> +      // DEBUG
>> Code like this should also be inside ifndef NDEBUG.
>> 3. It can still use more refactoring. :-)
>> 4.  clearSets().
>> It's not clear what sets it's clearing. Perhaps name it something like
>> clearShrinkWrapData()?
>
> Agreed in toto, I will refactor further and try to get all remaining
> cleanups into the next patch.
>
>>
>> 5
>> +void PEI::placeSpillsAndRestores(MachineFunction &Fn) {
>>
>>
>> ...
>> The shrink wrap version probably should go to its own function. Otherwise,
>> it should exit early when the non-shrink wrapping version is done. That
>> reduces nesting and please those of us who are a bit picky.
>
> I originally wrote it this way (separate functions). I then refactored the
> code to unify the two behaviors around the idea of placing spills/restores.
> I think the versions need to be separate for the reasons you state but the
> placement idea should be retained. I will reintroduce the separation.
>
>>
>> 6.
>> +      // Save entry block, return blocks.
>>
>>
>> +      if (MBB->pred_size() == 0)
>>
>>
>> +        entryBlock = MBB;
>> Entry block is the Fn.front().
>> 7.
>> +        if (!MBB->empty() && MBB->back().getDesc().isReturn())
>>
>>
>> +          returnBlocks.push_back(MBB);
>> PEI::insertPrologEpilogCode also traverse MBBs and get at return blocks.
>> So these probably ought to be shared, i.e. returnBlocks should be an ivar
>> and determined early in PEI.
>
> Absolutely (I knew Fn.front() was what I wanted but didn't go back and fix
> things). I am implementing these.
>
>>
>> 8.
>> +    for (MachineBasicBlock::iterator I = MBB->begin(); I != MBB->end();
>> ++I) {
>>
>> +      for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
>>
>>
>> +        unsigned Reg = CSI[i].getReg();
>>
>>
>> +        // If instruction I reads or modifies Reg, add it to UsedCSRegs,
>>
>>
>> +        // CSRUsed map for the current block.
>>
>>
>> +        if (I->readsRegister(Reg, TRI) || I->modifiesRegister(Reg, TRI))
>> {
>> readsRegister and modifiesRegister both scan the operands. It's probably
>> better to manually walk through the operands.
>
> Ok, that helps. I read the code but could not decide which way to do it at
> first.
>
>>
>> 9.
>> +  // If all uses of CSRegs are in the entry block, there is nothing
>>
>>
>> +  // to shrink wrap: spills go in entry block, restores go in exiting
>>
>>
>> +  // blocks, turn off shrink wrapping.
>>
>>
>> +  if (allCSRUsesInEntryBlock) {
>>
>>
>> +    ShrinkWrapping = false;
>>
>>
>> +    DOUT << "All uses of CSRegs are in entry block, nothing to do.\n";
>>
>>
>> +  }
>>
>>
>> +  // If we have decided not to shrink wrap, just return now.
>>
>>
>> +  if (! ShrinkWrapping)
>>
>>
>> +    return true;
>> Why not just return inside if (allCSRUsesInEntryBlock)?
>
> ARGHHH, I thought I simplified that before cutting the patch.
>
>>
>> 10.
>> +bool PEI::calculateUsedAnticAvail(MachineFunction &Fn) {
>> ...
>> +  // Calculate AnticIn, AnticOut using post-order traversal of MCFG.
>> +  for (po_iterator<MachineBasicBlock*>
>>
>>
>> +         MBBI = po_begin(Fn.getBlockNumbered(0)),
>>
>>
>> +         MBBE = po_end(Fn.getBlockNumbered(0)); MBBI != MBBE; ++MBBI) {
>>
>>
>> +    MachineBasicBlock* MBB = *MBBI;
>> ...
>> +  // Calculate Avail{In,Out} via top-down walk of Machine dominator tree.
>>
>>
>> +  for (df_iterator<MachineDomTreeNode*> DI = df_begin(DT.getRootNode()),
>>
>>
>> +         E = df_end(DT.getRootNode()); DI != E; ++DI) {
>>
>> Later in
>> +/// placeSpillsAndRestores - decide which MBBs need spills, restores
>>
>>
>> +/// of CSRs.
>>
>>
>> +///
>>
>>
>> +void PEI::placeSpillsAndRestores(MachineFunction &Fn) {
>> ...
>> +    // Calculate CSRRestore using post-order traversal of Machine-CFG.
>>
>>
>> +    for (po_iterator<MachineBasicBlock*>
>>
>>
>> +           MBBI = po_begin(Fn.getBlockNumbered(0)),
>>
>>
>> +           MBBE = po_end(Fn.getBlockNumbered(0)); MBBI != MBBE; ++MBBI)
>> {
>> This seem to be doing traversal at least one too many times? Can this be
>> improved?
>
> I started to reduce the traversals, then decided to work on edge splitting
> because I believe it may be needed to finish shrink wrapping.
> I will return to that work and see if I can reduce the traversals, which for
> this approach (computing Antic, Avail) will decrease the constant factor in
> the runtime bound, which is linear in the size of the Machine IR.
>
>>
>> 11. Can you explain a bit more about AnticIn, AvailIn, etc.?
>
> I am working on a document, currently hosted at github, which will present
> the details of the implementation, examples, etc.
> I looked at two approaches to determine spill/restore placements:
> 1. Try to use live intervals of CSRs that might be available when PEI runs.
>     The idea here is that each CSR used in a function will have one or more
>     defs which dominate one or more uses. Live intervals might lead me to
>     the MBBs in which to place spills/restores.
> 2. Use "anticipatibility" (Antic{In,Out} sets) to find the points from which
> all
>     outgoing paths contain defs or uses of a CSR, and use "availability"
>     (Avail{In,Out} sets) to find the points such that all incoming paths
> contain
>     defs or uses of a CSR. We place a spill for a CSR at the earliest point
>     leading to a sequence of uses (a contiguous set of blocks containing
> uses),
>     so a block B will get a spill for CSR R if R is anticipatable at B and
> _not_
>     anticipatable at any predecessor of B. If R is used and redefined in a
> block,
>     we have to avoid placing another spill in that block, (it was spilled
> earlier),
>     so in addition to the above condition, R must not be available at B.
>     Determining restore placement is the mirror image of spill placement.
> I went with approach 2 despite the apparent complexity because the data flow
> info is actually straightforward to compute, and I did not have to first
> synthesize
> LiveIntervals (read a ton of code) to get the pass working. I am putting
> this information
> into my temp. wiki page in hopes of getting it into the dev wiki when that
> is available.
> I am now looking at live intervals in connection with RA and code motion
> (other possible projects),
> and am trying to answer my question of whether live intervals could help
> shrink wrapping.
> Let me know if you think using live interval info would be worth
> investigating for shrink wrapping.
>
>>
>> 12.
>> Let's worry about edge splitting for a later time. :-)
>
> Agreed. I am still working through the mechanics to understand how to do it
> and ramifications.
>
>>
>> 13. After the code is cleaned up, we should consider checking it in and
>> try it out as llcbeta. Do you have any idea of its compile time impact?
>> Thanks,
>> Evan
>
> I'm working on characterizing the runtime and vm overhead, I don't yet have
> a detailed picture.
> My plan is to do the cleanups, put together a few larger test cases, go back
> and run regressions,
> then the test suite. With the larger focussed test cases, I will get usable
> numbers for compile times, and
> the test suite will extend the coverage.
> Please let me know if there is a simpler or more standard way to tackle this
> for a new pass.
> What about EH and shrink wrapping? Should I disable shrink wrapping in EH
> contexts?
> I have held off looking at maintaining debug info integrity, let me know if
> I should look at that or if it can wait a bit.
> Thanks again,
> John
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>