Hi Evan,<div><br></div><div>Thanks very much for the review, I am implementing your suggestions today and will have the next patch together this weekend.</div><div><br></div><div>A few questions/comments:</div><div><br></div>
<div><div class="gmail_quote">On Thu, Mar 12, 2009 at 10:05 AM, Evan Cheng <span dir="ltr"><<a href="mailto:echeng@apple.com">echeng@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word"><div><br></div><div>1. Some of the functions that you introduced, e.g. stringifyCSRegSet probably ought to be "static" and ifdef'ed out when NDEBUG is defined.</div><div><br>
</div><div>2. + // DEBUG </div><div>+ if (! MBB->empty() && ! CSRUsed[MBB].intersects(restore)) { </div>
<div>+ MachineInstr* MI = BeforeI; </div><div>+ DOUT << "adding restore after "; </div>
<div>+ DEBUG(MI->dump()); </div><div>+ } else { </div>
<div>+ DOUT << "adding restore to beginning of " </div><div>+ << getBasicBlockName(MBB) << "\n"; </div>
<div>+ } </div><div>+ // DEBUG</div><div><br></div><div>Code like this should also be inside ifndef NDEBUG.</div>
<div><br></div><div>3. It can still use more refactoring. :-)</div><div><br></div><div>4. clearSets().</div><div>It's not clear what sets it's clearing. Perhaps name it something like clearShrinkWrapData()?</div>
</div></blockquote><div><br></div><div>Agreed in toto, I will refactor further and try to get all remaining cleanups into the next patch.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word"><div></div><div>5</div><div>+void PEI::placeSpillsAndRestores(MachineFunction &Fn) { </div>
<div>...</div><div><br></div><div>The shrink wrap version probably should go to its own function. Otherwise, it should exit early when the non-shrink wrapping version is done. That reduces nesting and please those of us who are a bit picky.</div>
</div></blockquote><div><br></div><div>I originally wrote it this way (separate functions). I then refactored the code to unify the two behaviors around the idea of placing spills/restores. I think the versions need to be separate for the reasons you state but the placement idea should be retained. I will reintroduce the separation.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div></div><div>6.</div><div><div>+ // Save entry block, return blocks. </div>
<div>+ if (MBB->pred_size() == 0) </div><div>+ entryBlock = MBB;</div><div><br></div>
<div>Entry block is the Fn.front().</div><div><br></div><div>7.</div><div><div>+ if (!MBB->empty() && MBB->back().getDesc().isReturn()) </div>
<div>+ returnBlocks.push_back(MBB);</div><div><br></div><div>PEI::insertPrologEpilogCode also traverse MBBs and get at return blocks. So these probably ought to be shared, i.e. returnBlocks should be an ivar and determined early in PEI.</div>
</div></div></div></blockquote><div><br></div><div>Absolutely (I knew Fn.front() was what I wanted but didn't go back and fix things). I am implementing these.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word"><div><div><div></div><div>8.</div><div><div>+ for (MachineBasicBlock::iterator I = MBB->begin(); I != MBB->end(); ++I) { </div>
<div>+ for (unsigned i = 0, e = CSI.size(); i != e; ++i) { </div><div>+ unsigned Reg = CSI[i].getReg(); </div>
<div>+ // If instruction I reads or modifies Reg, add it to UsedCSRegs, </div><div>+ // CSRUsed map for the current block. </div>
<div>+ if (I->readsRegister(Reg, TRI) || I->modifiesRegister(Reg, TRI)) {</div><div><br></div><div>readsRegister and modifiesRegister both scan the operands. It's probably better to manually walk through the operands.</div>
</div></div></div></div></blockquote><div><br></div><div>Ok, that helps. I read the code but could not decide which way to do it at first.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word"><div><div><div><div></div><div>9.</div><div><div>+ // If all uses of CSRegs are in the entry block, there is nothing </div>
<div>+ // to shrink wrap: spills go in entry block, restores go in exiting </div><div>+ // blocks, turn off shrink wrapping. </div>
<div>+ if (allCSRUsesInEntryBlock) { </div><div>+ ShrinkWrapping = false; </div>
<div>+ DOUT << "All uses of CSRegs are in entry block, nothing to do.\n"; </div><div>+ } </div>
<div>+ // If we have decided not to shrink wrap, just return now. </div><div>+ if (! ShrinkWrapping) </div>
<div>+ return true;</div><div><br></div><div>Why not just return inside if (allCSRUsesInEntryBlock)?</div></div></div></div></div></div></blockquote><div><br></div><div>ARGHHH, I thought I simplified that before cutting the patch.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div><div><div><div></div><div>10.</div><div>+bool PEI::calculateUsedAnticAvail(MachineFunction &Fn) {</div>
<div>...</div><div>+ // Calculate AnticIn, AnticOut using post-order traversal of MCFG.</div><div><div>+ for (po_iterator<MachineBasicBlock*> </div>
<div>+ MBBI = po_begin(Fn.getBlockNumbered(0)), </div><div>+ MBBE = po_end(Fn.getBlockNumbered(0)); MBBI != MBBE; ++MBBI) { </div>
<div>+ MachineBasicBlock* MBB = *MBBI;</div><div>...</div><div><div>+ // Calculate Avail{In,Out} via top-down walk of Machine dominator tree. </div>
<div>+ for (df_iterator<MachineDomTreeNode*> DI = df_begin(DT.getRootNode()), </div><div>+ E = df_end(DT.getRootNode()); DI != E; ++DI) { </div>
<div><br></div><div><br></div><div>Later in </div><div><div>+/// placeSpillsAndRestores - decide which MBBs need spills, restores </div>
<div>+/// of CSRs. </div><div>+/// </div>
<div>+void PEI::placeSpillsAndRestores(MachineFunction &Fn) {</div><div>...</div><div><div>+ // Calculate CSRRestore using post-order traversal of Machine-CFG. </div>
<div>+ for (po_iterator<MachineBasicBlock*> </div><div>+ MBBI = po_begin(Fn.getBlockNumbered(0)), </div>
<div>+ MBBE = po_end(Fn.getBlockNumbered(0)); MBBI != MBBE; ++MBBI) { </div><div><br></div><div>This seem to be doing traversal at least one too many times? Can this be improved?</div></div></div></div></div></div>
</div></div></div></div></blockquote><div><br></div><div>I started to reduce the traversals, then decided to work on edge splitting because I believe it may be needed to finish shrink wrapping.</div><div><br></div><div>I will return to that work and see if I can reduce the traversals, which for this approach (computing Antic, Avail) will decrease the constant factor in the runtime bound, which is linear in the size of the Machine IR.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div><div><div><div><div><div><div><div></div></div></div></div></div>
<div>11. Can you explain a bit more about AnticIn, AvailIn, etc.?</div></div></div></div></div></div></blockquote><div><br></div><div>I am working on a document, currently hosted at github, which will present the details of the implementation, examples, etc.</div>
<div><br></div><div>I looked at two approaches to determine spill/restore placements:</div><div><br></div><div>1. Try to use live intervals of CSRs that might be available when PEI runs.</div><div> The idea here is that each CSR used in a function will have one or more</div>
<div> defs which dominate one or more uses. Live intervals might lead me to</div><div> the MBBs in which to place spills/restores.</div><div><br></div><div>2. Use "anticipatibility" (Antic{In,Out} sets) to find the points from which all</div>
<div> outgoing paths contain defs or uses of a CSR, and use "availability"</div><div> (Avail{In,Out} sets) to find the points such that all incoming paths contain</div><div> defs or uses of a CSR. We place a spill for a CSR at the earliest point</div>
<div> leading to a sequence of uses (a contiguous set of blocks containing uses),</div><div> so a block B will get a spill for CSR R if R is anticipatable at B and _not_</div><div> anticipatable at any predecessor of B. If R is used and redefined in a block,</div>
<div> we have to avoid placing another spill in that block, (it was spilled earlier),</div><div> so in addition to the above condition, R must not be available at B.</div><div> Determining restore placement is the mirror image of spill placement.</div>
<div><br></div><div>I went with approach 2 despite the apparent complexity because the data flow</div><div>info is actually straightforward to compute, and I did not have to first synthesize</div><div>LiveIntervals (read a ton of code) to get the pass working. I am putting this information</div>
<div>into my temp. wiki page in hopes of getting it into the dev wiki when that is available.</div><div><br></div><div>I am now looking at live intervals in connection with RA and code motion (other possible projects),</div>
<div>and am trying to answer my question of whether live intervals could help shrink wrapping.</div><div><br></div><div>Let me know if you think using live interval info would be worth investigating for shrink wrapping.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div><div><div><div></div><div>12.</div><div>Let's worry about edge splitting for a later time. :-)</div>
</div></div></div></div></div></blockquote><div><br></div><div>Agreed. I am still working through the mechanics to understand how to do it and ramifications.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word"><div><div><div><div><div></div></div></div></div></div><div>13. After the code is cleaned up, we should consider checking it in and try it out as llcbeta. Do you have any idea of its compile time impact?</div>
<div><br></div><div>Thanks,</div><div><br></div><div>Evan</div></div></blockquote><div><br></div><div>I'm working on characterizing the runtime and vm overhead, I don't yet have a detailed picture.</div><div>My plan is to do the cleanups, put together a few larger test cases, go back and run regressions,</div>
<div>then the test suite. With the larger focussed test cases, I will get usable numbers for compile times, and</div><div>the test suite will extend the coverage.</div><div>Please let me know if there is a simpler or more standard way to tackle this for a new pass.</div>
<div><br></div><div>What about EH and shrink wrapping? Should I disable shrink wrapping in EH contexts?</div><div><br></div><div>I have held off looking at maintaining debug info integrity, let me know if I should look at that or if it can wait a bit.</div>
<div><br></div><div>Thanks again,</div><div>John</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div> </div></div></div>