<div dir="ltr">Hi Rafael,<div><br></div><div>I think I've got a solution to fix this slowdown issue. I personally think the ISEL infrastructure needs to be improved, and CopyValueToVirtualRegisters in SelectionDAGBuilder is called too many times.</div><div><br></div><div>I have now changed my algorithm to make the decision early, before ISEL, and to store the info in FuncInfo. We can do this because deciding the preferred sext/zext depends only on LLVM IR, not on SDNodes. This way, we can calculate the info once and use it many times in the real ISEL stage.</div><div><br></div><div>I will send out an updated patch later on; my initial experiment shows that the huge case you gave me now finishes in 6 minutes. It's really a good test case for measuring compile time. :-)</div><div><br></div><div>Thanks,</div><div>-Jiangning</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-09-05 11:31 GMT+08:00 Jiangning Liu <span dir="ltr"><<a href="mailto:liujiangning1@gmail.com" target="_blank">liujiangning1@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Rafael,</div><div><br></div><div>Attached is that test case, but I can't see slowdown with it.</div><div><br></div><div>Thanks,</div><div>-Jiangning</div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">2014-09-04 21:55 GMT+08:00 Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Can you put that testcase somewhere?<br>
<div><div><br>
On 4 September 2014 01:19, Jiangning Liu <<a href="mailto:liujiangning1@gmail.com" target="_blank">liujiangning1@gmail.com</a>> wrote:<br>
> Hi Rafael,<br>
><br>
><br>
> 2014-08-29 19:10 GMT+08:00 Rafael Espíndola <<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>>:<br>
><br>
>> On 29 August 2014 05:16, Jiangning Liu <<a href="mailto:liujiangning1@gmail.com" target="_blank">liujiangning1@gmail.com</a>> wrote:<br>
>> > Hi Rafael and Bob,<br>
>> ><br>
>> > The case you gave is really huge! :-)<br>
>><br>
>> Yes, sorry, it is the LTO of clang :-)<br>
>><br>
>> > I tried it, and it turned out not to be an infinite loop; it can finish<br>
>> > in ~70 minutes.<br>
>> ><br>
>> > I tried llc command line option -time-passes, and it shows<br>
>> ><br>
>> ><br>
>> > ===-------------------------------------------------------------------------===<br>
>> > ... Pass execution timing report ...<br>
>> ><br>
>> > ===-------------------------------------------------------------------------===<br>
>> > Total Execution Time: 4125.4617 seconds (4124.7082 wall clock)<br>
>> ><br>
>> > ---User Time--- --System Time-- --User+System-- ---Wall Time---<br>
>> > --- Name ---<br>
>> > 3911.0328 ( 95.1%) 8.5007 ( 65.8%) 3919.5335 ( 95.0%) 3920.7144 (<br>
>> > 95.1%) X86 DAG->DAG Instruction Selection<br>
>> > 47.5946 ( 1.2%) 0.6397 ( 5.0%) 48.2343 ( 1.2%) 48.1823 ( 1.2%)<br>
>> > Greedy Register Allocator<br>
>> > 16.7073 ( 0.4%) 0.0244 ( 0.2%) 16.7317 ( 0.4%) 16.7890 ( 0.4%)<br>
>> > Simple Register Coalescing<br>
>> > 11.6154 ( 0.3%) 0.0164 ( 0.1%) 11.6318 ( 0.3%) 11.7178 ( 0.3%)<br>
>> > Machine Instruction Scheduler<br>
>> > 10.8118 ( 0.3%) 0.0677 ( 0.5%) 10.8794 ( 0.3%) 10.3740 ( 0.3%)<br>
>> > Loop Strength Reduction<br>
>> ><br>
>> > So the problem is around "X86 DAG->DAG Instruction Selection".<br>
>> ><br>
>> > I tried to capture "hot" spots using a debugger, but I failed, and it<br>
>> > seems the time is accumulated somewhere.<br>
>> ><br>
>> > Do you have any suggestions?<br>
>><br>
>> You can try running llvm-extract with every function and then running<br>
>> llc on the result (which will have only one function). Hopefully you<br>
>> will find a much smaller testcase that way.<br>
><br>
><br>
> Thanks for your suggestion. I tried this method, and successfully extracted<br>
> 27041 functions from that huge file. However, I failed to find a small<br>
> case containing a single function that reproduces the slowdown. The<br>
> slowest function I found is<br>
> _ZN5clang15StmtVisitorBaseINS_8make_ptrENS_13ASTStmtWriterEvE5VisitEPNS_4StmtE.bc,<br>
> but it can finish in 16 seconds on my x86 box.<br>
><br>
> So it seems there are some module passes triggering the slowdown issue...<br>
><br>
> Thanks,<br>
> -Jiangning<br>
><br>
>><br>
>><br>
>> > And I'm wondering whether this is an x86-specific issue, or whether the<br>
>> > slowdown can also be exposed on other targets like aarch64?<br>
>><br>
>> Hard to tell without a smaller testcase.<br>
>><br>
>> Cheers,<br>
>> Rafael<br>
><br>
><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>