Hello all,<div><br></div><div>Here is an updated patch. It contains all of the fixes I discussed here, and it also contains some significant updates:</div><div><br></div><div>- Now has additionally survived intensive testing by Duncan. All crashers found with all the dragonegg test suites (including many ada tests etc) are fixed. The initial and immediate miscompile issues fixed. It's looking pretty good testing wise.</div>

<div><br></div><div>- *Many* test cases added. Lots of bugs fixed. Generally, quite a bit of hardening here. Duncan found several soft areas that I then was able to more thoroughly exercise.</div><div><br></div><div><br></div>

<div>I think this is getting *really* close to being ready for an initial commit. It is still behind a default-off flag. It will stay there for a bit. In particular I want to get more testing, I want to implement SSAUpdater-logic and tie it into the rest of the pass manager and collect final performance results before enabling it. But I don't want to keep hacking on this out of tree. =]</div>

<div><br></div><div>Anyways, review greatly appreciated.</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Aug 30, 2012 at 1:37 AM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">As with my responses to Duncan, I'm replying to issues of interest here, and will post a new patch after incorporating your feedback.<div>

<br></div><div>Thanks again for taking the time to look at this.<br><br><div class="gmail_quote"><div class="im">

On Wed, Aug 22, 2012 at 8:35 AM, Chris Lattner <span dir="ltr"><<a href="mailto:clattner@apple.com" target="_blank">clattner@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>On Aug 21, 2012, at 10:46 PM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>> wrote:<br>

> Ok, I think this patch is pretty close to being ready to go into the tree, modulo any detailed code review folks are willing to do. Just send me comments if you have them.<br>

<br>

</div>Very nice, I look forward to never seeing an i1024 from SRoA again :)<br>

<br>

Here are some initial thoughts, I haven't done a detailed algorithmic analysis of this yet:<br>

<div><br>

> I've run it through the nightly test suite now as well as a bootstrap, and everything looks better than clear, it looks great. ;]<br>

><br>

> I'm seeing some pretty significant execution performance improvements. One test is 27% faster, several range from 2% to 6% faster. Only a couple really slow down. I haven't dug in to the performance too much as currently we're still running the old SROA in the CGSCC passes.<br>


><br>

> I've still got some code cleanup, minor refactorings I'd like to do, and the SSAUpdater work, but I'm trying to not grow the patch on people.<br>

<br>

</div>+/// which are merely transfered in and out of external memory remain unchanged<br>

<br>

typo: transferred.<br>

<br>

+/// Because this also performs alloca promotion, it can be thought of as a also<br>

<br>

"as a also"<br>

<br>

Several #includes can be pruned out, GlobalVariable and Function.h at least.<br></blockquote><div><br></div></div><div>Yes, I've not pruned the includes from the old pass. Sorry about any confusion there, I just didn't want to mess w/ compilation errors.</div>

<div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

+  struct Partition {<br>

+  struct PartitionUse {<br>

<br>

Doxygen comments please.  These look similar, the comment should explain what they're for and why they're different.<br></blockquote><div><br></div></div><div>Done, and one better. They now share a common base to factor away the commonality.</div>

<div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

+  /// \brief MemTransferInst auxilliary data.<br>

<br>

typo "auxilliary" in a few places.<br>

<br>

<br>

<br>

I'm not crazy about template instantiation of your entire "BuilderBase" class, how much code size does this cost, to what benefit?<br></blockquote><div><br></div></div><div>I don't know, and frankly I didn't even measure it. It's four relatively small functions, and some instance variables. I mostly factored it out because I was tired of fixing bugs in two places, and sharing the instance variables was a real pain.</div>


<div><br></div><div>I can't imagine this is a big deal, at two copies of these functions. If it is, I would actually blame the compiler. ;] At a fundamental level I don't see a lot of easy ways to make the base class not be templated. I tried a few times and it always ended up really uglying the code up....</div>

<div class="im">

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

+static Value *getNaturalGEPWithType(IRBuilder<> IRB, const TargetData &TD,<br>

+                                    Value *BasePtr, Type *Ty, Type *TargetTy,<br>

+                                    SmallVectorImpl<Value *> &Indices,<br>

+                                    const Twine &Prefix) {<br>

...<br>

+    if (SequentialType *SeqTy = dyn_cast<SequentialType>(ElementTy)) {<br>

<br>

Do we really want to form GEP's into vectors?<br></blockquote><div><br></div></div><div>Hmm, I suppose I would be OK having the rewriter canonicalize *all* vector accesses as whole vector accesses with insert element and extract element. However, in the case that this is buried within a much larger unpromoteable structure, and we may never access the vector as a whole, I worry a bit about actually making the code bigger as a consequence.</div>


<div><br></div><div>It seems more important to work hard at doing this when there is a chance of promotion.</div><div class="im"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

<br>

+class SROA : public FunctionPass {<br>

+bool SROA::rewriteAllocaPartition(AllocaInst &AI,<br>

...<br>

<br>

Doxygen comments please.<br></blockquote><div><br></div></div><div>Yes, many many doxygen comments (and non-doxygen comments) have been added.</div><div><br></div><div>Is my mixture of doxygen / non-doxygen comments problematic? I tend to use doxygen comments whenever the comment is really documenting an interface, and not when the comment only makes since as an aid to the reader of the code itself.</div>

<div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

getNaturalGEPWithType and others can be simplified through the use of IRB.getInt32()<br></blockquote><div><br></div></div><div>Wow, yea. I'm sad I kept missing these helpers when reading IRBuilder.</div><div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

getNaturalGEPWithOffset, getAdjustedPtr, etc should pass APInts by const reference.<br>

<br>

TargetTy != IRB.getInt8Ty()<br>

-> should be "!TargetTy->isIntegerTy(8)" for efficiency.<br>

<br>

<br>

+    // Find the operand we need to rewrite here.<br>

+    User::op_iterator OI = PN.op_begin(), OE = PN.op_end();<br>

+    for (; OI != OE && *OI != OldPtr; ++OI)<br>

+      ;<br>

+    assert(OI != OE);<br>

<br>

Is it feasible to not lose the Use index?  PHI nodes can have a lot of operands to scan though.<br></blockquote><div><br></div></div><div>Well, this code is really just broken.</div><div><br></div><div>There isn't a single index -- the PHI might use the same pointer in several different operands. We unique stuff so we only will visit the PHI node once for a particular used value, but we need to replace that value anywhere within the PHI. I've fixed the code to do this. The only other way I can see to write this is to walk the use list of the OldPtr and check each User is the PHI node? Not sure that would be any cheaper.</div>


<div><br>Note that we also do a full linear walk of the operands when doing speculation. =/</div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

bool SROA::runOnAlloca(AllocaInst &AI) {<br>


...<br>

<br>

This code handles the case when something is a trivially promotable scalar (an important base case for SRoA) but it doesn't handle the case where we have a trivial aggregate aggregate being accessed in a "type safe" way.  Is this an interesting case to handle specially to improve compile time?<br>


</blockquote><div><br></div></div><div>Can you clarify what you mean here by 'trivial aggregate aggregate being accessed in a "type safe" way"? I'm not sure which exact case you're referring to.</div>

<div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">+  // No partitions to split. Leave the dead alloca for a later pass to clean up.<br>

+  if (P.begin() == P.end())<br>

<br>

P.empty()?<br></blockquote><div><br></div></div><div>One more method I have to add to AllocaPartitioning. With only one callsite it hardly seemed worth it.</div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

+void SROA::deleteDeadInstructions() {<br>

<br>

Is this the same as RecursivelyDeleteTriviallyDeadInstructions?</blockquote><div><br></div></div><div>Yep, it's just an attempt to decrease the malloc traffic by re-using the nice scratch space we already have. I also cheat and clear some of the state at the same time. It also keeps nice statistics. Not sure how much its worth re-using the recursive function, the bulk of the work is in isInstructionTriviallyDead which it already uses.</div>

<div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">  Perhaps RecursivelyDeleteTriviallyDeadInstructions could be enhanced to take an optional pointer to a vector, where (when present) it records all the instruction*'s it deletes?<br>


</blockquote><div><br></div></div><div>Yea, I can see a way to factor the existing utility function to be what I want here. I can do that if you want either now or later.</div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


It would be nice if the "load speculation" stuff was pulled out of AllocaPartitionRewriter as some sort of orthogonal transformation.  Perhaps the initial analysis of an alloca can build a list of "selects and phis that need to be speculated" in order to unblock the transformation, and they could be processed independently of the other heavy lifting going on.<br>


</blockquote><div><br></div></div><div>This is definitely something I think should happen eventually. There should be a simple function call that does the speculation for a PHI node or a select. That said, I don't really want to do this until the pass stabilizes a bit in the interest of avoiding hoisting and abstracting the wrong logic here.</div>

<div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It might make sense to split this pass up into two files to separate the partition analysis code from the transformation and utility code.  Just a single header file (in Transforms/Scalar) might be enough, or go with another header + .cpp file in Transforms/Scalar.  What do you think?<br>


</blockquote><div><br></div></div><div>Meh. I mean, I see how I could split it up like that, definitely, but it doesn't quite seem worth it at this stage. Maybe later as it gets more complicated?</div><div><br></div>

<div>I would vote doing this later if at all. It'd just get in my way to have it in multiple files right now.</div><div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">FYI, on a random note, Dan Gohman is looking enhancing memcpy to get TBAA and "hole" information per PR12260/PR11946, which seems like it should drop in nicely.</blockquote>


<div><br></div></div><div>Yep. I have been talking with him (and thanks for forwarding the proposal while the mailing lists were down). I actually have a counter proposal for one particular aspect of it I need to write up. ;] Anyways, thanks for the feedback. I'm going to try to squash a few crashers Duncan found and then post an updated patch.</div>


</div></div>

<br>_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

<br></blockquote></div><br></div>