<div dir="ltr">Ugh, hit send too early:<br><br><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Aug 21, 2016 at 8:32 AM, Daniel Berlin <span dir="ltr"><<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Sat, Aug 20, 2016 at 4:01 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">reames added a comment.<br>
<br>
Sorry for not responding to this for so long.<br>
<br>
My objection is primarily from a compile time concern. Right now, EarlyCSE is a *very* cheap pass to run. If you can keep it fast (even when we have to reconstruct MemorySSA) I don't object to having EarlyCSE MemorySSA based. I think that is a very hard bar to pass in practice. In particular, the bar is not total O3 time. It's EarlyCSE time. </blockquote><div><br></div></span><div>The current time to construct MemorySSA is basically nothing, even on large and absurd testcases.</div><div>You can't make it *zero* because it does an extra instruction walk or two over EarlyCSE.</div><div>But if what you want is fast, EarlyCSE is the fastest pass, even on the large and absurd testcases I can find.</div><div>That doesn't change after this patch AFAICT.</div><div><br>I disabled LICM, since it takes over 100 seconds on this testcase.<br></div><div><br></div><div>Example at O2:</div></div></div></div></blockquote><div><br></div><div>That was really O1.</div><div>Here is O2:</div><div>===-------------------------------------------------------------------------===</div><div> ... 
Pass execution timing report ...</div><div>===-------------------------------------------------------------------------===</div><div> Total Execution Time: 47.8836 seconds (47.9724 wall clock)</div><div><br></div><div> ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---</div><div> 7.8417 ( 16.6%) 0.0162 ( 2.2%) 7.8579 ( 16.4%) 7.8751 ( 16.4%) Loop-Closed SSA Form Pass</div><div> 4.9817 ( 10.6%) 0.0469 ( 6.4%) 5.0286 ( 10.5%) 5.0347 ( 10.5%) Global Value Numbering</div><div> 4.9904 ( 10.6%) 0.0133 ( 1.8%) 5.0037 ( 10.4%) 5.0127 ( 10.4%) Value Propagation</div><div> 4.1806 ( 8.9%) 0.0101 ( 1.4%) 4.1907 ( 8.8%) 4.1995 ( 8.8%) Loop-Closed SSA Form Pass</div><div> 3.5075 ( 7.4%) 0.0045 ( 0.6%) 3.5121 ( 7.3%) 3.5155 ( 7.3%) Value Propagation</div><div> 2.1602 ( 4.6%) 0.3293 ( 44.8%) 2.4895 ( 5.2%) 2.4905 ( 5.2%) Loop Load Elimination</div><div> 2.2176 ( 4.7%) 0.0254 ( 3.5%) 2.2430 ( 4.7%) 2.2581 ( 4.7%) Combine redundant instructions</div><div> 2.1154 ( 4.5%) 0.0047 ( 0.6%) 2.1201 ( 4.4%) 2.1226 ( 4.4%) Dead Store Elimination</div><div> 1.8411 ( 3.9%) 0.0125 ( 1.7%) 1.8536 ( 3.9%) 1.8539 ( 3.9%) Combine redundant instructions</div><div> 1.8317 ( 3.9%) 0.0032 ( 0.4%) 1.8349 ( 3.8%) 1.8388 ( 3.8%) Loop-Closed SSA Form Pass</div><div> 1.6531 ( 3.5%) 0.0028 ( 0.4%) 1.6559 ( 3.5%) 1.6578 ( 3.5%) Loop-Closed SSA Form Pass</div><div> 1.1995 ( 2.5%) 0.0123 ( 1.7%) 1.2117 ( 2.5%) 1.2166 ( 2.5%) Combine redundant instructions</div><div> 1.0155 ( 2.2%) 0.0122 ( 1.7%) 1.0278 ( 2.1%) 1.0295 ( 2.1%) Combine redundant instructions</div><div> 0.9646 ( 2.0%) 0.0118 ( 1.6%) 0.9763 ( 2.0%) 0.9791 ( 2.0%) Combine redundant instructions</div><div> 0.9608 ( 2.0%) 0.0131 ( 1.8%) 0.9739 ( 2.0%) 0.9744 ( 2.0%) Combine redundant instructions</div><div> 0.9566 ( 2.0%) 0.0126 ( 1.7%) 0.9692 ( 2.0%) 0.9704 ( 2.0%) Combine redundant instructions</div><div> 0.9193 ( 1.9%) 0.0277 ( 3.8%) 0.9470 ( 2.0%) 0.9493 ( 2.0%) SLP Vectorizer</div><div> 0.7629 ( 1.6%) 0.0059 ( 0.8%) 
0.7688 ( 1.6%) 0.7699 ( 1.6%) Induction Variable Simplification</div><div> 0.4903 ( 1.0%) 0.0035 ( 0.5%) 0.4938 ( 1.0%) 0.4945 ( 1.0%) Combine redundant instructions</div><div> 0.2662 ( 0.6%) 0.0156 ( 2.1%) 0.2817 ( 0.6%) 0.2817 ( 0.6%) Early GVN Hoisting of Expressions</div><div> 0.2193 ( 0.5%) 0.0037 ( 0.5%) 0.2230 ( 0.5%) 0.2259 ( 0.5%) Early CSE</div><div> 0.2127 ( 0.5%) 0.0015 ( 0.2%) 0.2141 ( 0.4%) 0.2142 ( 0.4%) Early CSE</div><div> 0.2004 ( 0.4%) 0.0032 ( 0.4%) 0.2036 ( 0.4%) 0.2035 ( 0.4%) Early CSE</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Note that the GVN and EarlyCSE times include a full build of MemorySSA because of where the passes are run.</div><span class=""><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> I fully expect that the more precise analysis may speed up other passes, but we can't assume that happens for all inputs. (As I write this, I'm recognizing that this might be too high a bar to set. If you think I'm being unreasonable, argue why and what a better line should be.)<br>
<br>
Given I'm not going to have time to be actively involved in this thread, I'm going to defer to other reviewers. If they think this is a good idea, I will not actively block the thread.<br>
<br>
<br>
<a href="https://reviews.llvm.org/D19821" rel="noreferrer" target="_blank">https://reviews.llvm.org/D1982<wbr>1</a><br>
<br>
<br>
<br>
</blockquote></span></div><br></div></div>
</blockquote></div><br></div></div>