<div dir="ltr">Hi Rafael,<br><div class="gmail_extra"><br><br><div class="gmail_quote">2014-08-29 19:10 GMT+08:00 Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="">On 29 August 2014 05:16, Jiangning Liu <<a href="mailto:liujiangning1@gmail.com">liujiangning1@gmail.com</a>> wrote:<br>
> Hi Rafael and Bob,<br>
><br>
</div><div class="">> The case you gave is really huge! :-)<br>
<br>
</div>Yes, sorry, it is the LTO of clang :-)<br>
<div class=""><br>
> I tried it, and it turns out it is not an infinite loop; it finishes in<br>
> ~70 minutes.<br>
><br>
> I tried the llc command-line option -time-passes, and it shows<br>
><br>
> ===-------------------------------------------------------------------------===<br>
> ... Pass execution timing report ...<br>
> ===-------------------------------------------------------------------------===<br>
> Total Execution Time: 4125.4617 seconds (4124.7082 wall clock)<br>
><br>
> ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---<br>
> 3911.0328 ( 95.1%) 8.5007 ( 65.8%) 3919.5335 ( 95.0%) 3920.7144 ( 95.1%) X86 DAG->DAG Instruction Selection<br>
> 47.5946 ( 1.2%) 0.6397 ( 5.0%) 48.2343 ( 1.2%) 48.1823 ( 1.2%) Greedy Register Allocator<br>
> 16.7073 ( 0.4%) 0.0244 ( 0.2%) 16.7317 ( 0.4%) 16.7890 ( 0.4%) Simple Register Coalescing<br>
> 11.6154 ( 0.3%) 0.0164 ( 0.1%) 11.6318 ( 0.3%) 11.7178 ( 0.3%) Machine Instruction Scheduler<br>
> 10.8118 ( 0.3%) 0.0677 ( 0.5%) 10.8794 ( 0.3%) 10.3740 ( 0.3%) Loop Strength Reduction<br>
><br>
> So the problem is around "X86 DAG->DAG Instruction Selection".<br>
><br>
> I tried to capture "hot" spots using a debugger, but failed; the time seems<br>
> to accumulate somewhere I could not pin down.<br>
><br>
> Do you have any suggestions?<br>
<br>
</div>You can try running llvm-extract with every function and then running<br>
llc on the result (which will have only one function). Hopefully you<br>
will find a much smaller testcase that way.<br></blockquote><div><br></div><div>Thanks for your suggestion. I tried this method and successfully extracted 27041 functions from that huge file. However, I could not find a single-function test case that reproduces the slowdown. The slowest function I found is _ZN5clang15StmtVisitorBaseINS_8make_ptrENS_13ASTStmtWriterEvE5VisitEPNS_4StmtE.bc, but it finishes in 16 seconds on my x86 box.</div><div><br></div><div>So it seems there are some module passes triggering the slowdown...</div><div><br></div><div>Thanks,</div><div>-Jiangning</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div class=""><br>
> And I'm wondering whether this is an x86-specific issue, or whether the slowdown<br>
> can also be exposed on other targets like aarch64?<br>
<br>
</div>Hard to tell without a smaller testcase.<br>
<br>
Cheers,<br>
Rafael<br>
</blockquote></div><br></div></div>