<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I’ll summarize your responses as: The new pipeline produces better results than the old, and we currently have no good mechanism for reducing the compile-time overhead.<div class=""><br class=""></div><div class="">I’ll summarize my criticism as: In principle, there are better ways to clean up after the vectorizer without turning it into a complicated megapass, but no one has done the engineering. I don’t think cleaning up after the vectorizer should incur any noticeable overhead if the vectorizer never runs, and it would be avoidable with sensibly designed passes that aren’t locked into the current pass manager design.<br class=""><div class=""><br class=""></div><div class="">I don’t have the data right now to argue against enabling the new pipeline under O2. Hopefully others who care about clang compile time will jump in.</div><div class=""><br class=""></div><div class="">As for the long-term plan to improve compile time, all I can do now is to advocate for a better approach.</div><div class=""><br class=""></div><div class="">-Andy</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Oct 14, 2014, at 10:56 AM, Chandler Carruth &lt;<a href="mailto:chandlerc@google.com" class="">chandlerc@google.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_extra"><br class=""><div class="gmail_quote">On Tue, Oct 14, 2014 at 10:11 AM, Andrew Trick <span dir="ltr" class="">&lt;<a href="mailto:atrick@apple.com" target="_blank" class="">atrick@apple.com</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div id=":7a2" class="" 
style="overflow:hidden">>> + correlated-propagation<br class="">

<br class="">

A little worried about this.<br class="">

<br class="">

>> + instcombine<br class="">

<br class="">

I'm *very* concerned about rerunning instcombine, but understand it may help clean up the vectorized preheader.<br class=""></div></blockquote><div class=""><br class=""></div><div class="">Why are you concerned? Is instcombine that slow? I usually don't see huge overhead from re-running it on nearly-canonical code. (Oh, I see you just replied to Hal here, fair enough.)</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div id=":7a2" class="" style="overflow:hidden">

<br class="">

>> + licm<br class="">

>> + loop-unswitch<br class="">

<br class="">

These should be limited to the relevant loop nest.<br class=""></div></blockquote><div class=""><br class=""></div><div class="">We have no way to do that currently. Do you think they will in practice be too slow? If so, why? I would naively expect unswitch to be essentially free unless it can do something, and LICM not much more expensive.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div id=":7a2" class="" style="overflow:hidden">

<br class="">

>> + simplifycfg<br class="">

<br class="">

OK if the CFG actually changed.<br class=""></div></blockquote><div class=""><br class=""></div><div class="">Again, we have no mechanism to gate this. Frustratingly, the only thing I want here is to delete dead code formed by earlier passes. We just don't have anything cheaper (and I don't have any measurements indicating we need something cheaper).</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div id=":7a2" class="" style="overflow:hidden">

<br class="">

>> + instcombine<br class="">

<br class="">

instcombine again! This can’t be good.<br class=""></div></blockquote><div class=""><br class=""></div><div class="">I actually have no specific reason to think we need this other than the fact that we run instcombine after simplifycfg in a bunch of other places. If you're looking for one to rip out, this would be the first one I would rip out because I'm doubtful of its value.</div><div class=""> </div><div class=""><br class=""></div><div class="">On a separate note:</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div id=":7a2" class="" style="overflow:hidden">

<br class="">>> + early-cse<br class=""><br class="">Passes like loop-vectorize should be able to do their own CSE without much engineering effort.<br class=""><br class="">>>  slp-vectorize<br class="">

>> + early-cse<br class="">

<br class="">

SLP should do its own CSE.</div></blockquote></div><br class=""></div><div class="gmail_extra">I actually agree with you in principle, but I would rather run the pass now (and avoid hacks downstream to essentially do CSE in the backend) than hold up progress on the hope of advanced on-demand CSE layers being added to the vectorizers. I don't know of anyone actually working on that, and so I'm somewhat concerned it will never materialize.</div></div>

</div></blockquote></div><br class=""></div></div></body></html>