<div dir="ltr"><div>On Wed, May 14, 2014 at 8:15 PM, Chandler Carruth <span dir="ltr">&lt;<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>&gt;</span> wrote:<br></div><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">

<div class="">On Wed, May 14, 2014 at 7:02 PM, Akira Hatanaka <span dir="ltr">&lt;<a href="mailto:ahatanak@gmail.com" target="_blank">ahatanak@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><p style="margin:0px">If I understand this code correctly, LoadAndStorePromoter::run is called once for every promotable alloca, and each call iterates over the whole instruction list of the basic block to determine the order of the loads and stores that access the alloca.</p>


</blockquote><div><br></div></div><div>Yes, this is a long-standing problem with SROA.</div><div class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<p style="margin:0px"><br></p><p style="margin:0px">This is the list of ideas I have considered or implemented that can possibly solve my problem:</p><p style="margin:0px"><br></p><p style="margin:0px">1. In SROA::getAnalysisUsage, always require DominatorTreeWrapperPass. This will enable SROA::promoteAllocas to use mem2reg, which is fast because it caches the per basic-block ordering of the relevant loads and stores. If it's important to avoid always computing the dominator tree, computing it conditionally based on whether there is a huge basic block in the function is another idea, but I am not sure if that is possible (I don't think this is currently supported).</p>


<p style="margin:0px"><br></p><p style="margin:0px">This brings down the compilation time (using clang -emit-llvm) from 350s to 30s (it still takes about 23s to do GVN). It also might fix PR17855 (the program that used to take 65s to compile now takes just 11s):</p>


<p style="margin:0px"><br></p><p style="margin:0px"><a href="http://llvm.org/bugs/show_bug.cgi?id=17855" target="_blank">http://llvm.org/bugs/show_bug.cgi?id=17855</a></p></blockquote></div></div><br>This is my plan, but before doing it there are a bunch of *huge* performance improvements we can make in the more common case so that mem2reg isn't actually slower. Also, we need to be able to preserve analyses further, which the new pass manager will allow.</div>


<div class="gmail_extra"><br></div><div class="gmail_extra">Is this a pressing matter for you?</div></div>

</blockquote></div><br></div><div class="gmail_extra">It's not really pressing, but there are users complaining about the long compilation time, so I would like to see this fixed soon, if that is possible.<br></div><div>

<br></div></div>
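<div>To make the cost difference discussed in this thread concrete, here is a minimal, self-contained sketch. It is not LLVM code: <code>Access</code>, <code>Block</code>, <code>orderByRescan</code>, and <code>orderAllOnce</code> are hypothetical names made up for illustration. It assumes a basic block can be modelled as a flat list of loads and stores, each tagged with the alloca it touches. Rescanning the block once per alloca costs roughly (number of allocas) x (block size), while a single walk that caches positions per alloca costs roughly (block size).</div>

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a load or store instruction. Only the alloca it
// touches matters for this sketch.
struct Access {
    int allocaId;  // which promotable alloca this instruction reads or writes
    bool isStore;  // true for a store, false for a load
};

using Block = std::vector<Access>;       // one basic block, in program order
using Order = std::vector<std::size_t>;  // positions in the block, ascending

// Per-alloca rescan: walks the whole block to find the accesses to one alloca.
// Calling this once for each of A allocas on a block of N instructions is
// O(A * N), which is the pattern described as slow in the thread.
Order orderByRescan(const Block &bb, int allocaId) {
    Order out;
    for (std::size_t i = 0; i < bb.size(); ++i)
        if (bb[i].allocaId == allocaId)
            out.push_back(i);
    return out;
}

// Cached ordering: walks the block once and records each position under the
// alloca it belongs to. Later lookups per alloca reuse the cache, so the
// total scanning work is O(N) regardless of how many allocas there are.
std::unordered_map<int, Order> orderAllOnce(const Block &bb) {
    std::unordered_map<int, Order> cache;
    for (std::size_t i = 0; i < bb.size(); ++i)
        cache[bb[i].allocaId].push_back(i);
    return cache;
}
```

<div>Both functions yield the same ordering for any given alloca. The only difference is how often the block is scanned, which is what matters when a function contains one huge basic block with many promotable allocas.</div>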