<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Duncan,<div><br></div><div><div><div>On Jun 18, 2013, at 5:58 AM, Duncan Sands <<a href="mailto:duncan.sands@gmail.com">duncan.sands@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hi Mark,<br><br>On 17/06/13 19:41, Mark Lacey wrote:<br><blockquote type="cite">Hi All,<br><br>This is a relatively small patch I re-submitted last week which seems to have gotten lost in the shuffle.<br><br>Can someone take a look and commit if everything looks good?<br></blockquote><br>sorry for forgetting about this.  My only remaining concern is that the<br>algorithm seems to be quadratic in the number of phi node operands.  It<br>is possible (though rare) to have phi nodes with hundreds of operands.<br>Maybe it is better to exploit a container type, eg use a map from basic<br>block to new value.  Then you can eg just whiz down the existing phi nodes,<br>chucking their (bb, value) pairs into the map whenever "value" is not undef,<br>then construct the new phi node by, for each predecessor bb, adding (bb, value)<br>if bb -> value in the map, and adding (bb, undef) if bb is not in the map.  Or<br>something like that.<br></blockquote></div><br></div><div>Thanks for taking another look. Apologies of my own for not getting back to you sooner, but I am just able to get back to this now.</div><div><br></div><div>The algorithm isn't quadratic in the number of phi node operands, but rather if the block with the phi has N predecessors, and we are trying to merge that block with a predecessor that itself has M predecessors, we can potentially do work proportional to MxN to determine the final inputs for the phi (which could potentially be as bad or worse than N^2, but could also be quite a bit better). The function CanPropagatePredecessorsForPHIs() does something similar (and gates whether we'll reach this particular code), but it appears to potentially improve on that worst case by using a SmallPtrSet<> to test for a common predecessor between the blocks (which when small has similar behavior, but when large is potentially better depending on how well the hash function performs).</div><div><br></div><div>I did some testing and found that indeed when merging a block with several hundred predecessors into another block with several hundred predecessors, there is a noticeable slow-down in SimplifyCFG.</div><div><br></div><div>I like your idea, so I went ahead and implemented it, as well as keeping track of which undef values were found in the phi (and then only walking those again to update them).</div><div><br></div><div>I've attached two patches - the first just refactors the body of the loop that is walking the phis into a separate function, and the second has the functional changes.</div><div><br></div><div>I built debug & release, and ran check-all and the full test suite.</div><div><br></div><div>Can you take another look, and if this looks okay, commit it?</div><div><br></div><div>Thanks again,</div><div><br></div><div>Mark</div><div><br></div><div><br></div><div><div style="margin: 0px 0px 0px 19px; font-size: 12px; ">[PATCH 1/2] Small refactoring of TryToSimplifyUncondBranchFromEmptyBlock.</div></div><div><br>Move the code that updates the phi nodes into a separate function. This<br>is in preparation for some functional changes to that factored-out code.<br>---<br>lib/Transforms/Utils/Local.cpp | 58 +++++++++++++++++++++++++++---------------<br>1 file changed, 37 insertions(+), 21 deletions(-)<br><br></div></body></html>