<div dir="ltr"><div dir="ltr"><div>Try passing "-mllvm -disable-early-taildup=true" to clang.</div><br clear="all"><div><div dir="ltr" class="gmail_signature">~Craig</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 30, 2019 at 10:01 AM Riyaz Puthiyapurayil via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div class="gmail-m_4978533715976280522WordSection1">
<p class="MsoNormal">I didn’t see any response to this. Is there a way to turn off early tail duplication with a clang-7 option, short of turning off all optimizations? The issue is reproducible with a very simple test case: with optimizations
turned on, clang-7 takes hours where clang-5.0 takes minutes. Here is a contrived test (the real-life example is of course different, but this simple test exposes the same inefficiency):<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">// test.cpp<u></u><u></u></p>
<p class="MsoNormal"><span style="font-family:Consolas">#include &lt;string&gt;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas">struct Foo {<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> std::string s1;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> std::string s2;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> std::string s3;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas">};<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas">Foo Array[] = {<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> { "0", "0", "0" },<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> { "1", "1", "1" },<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> :<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> :<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> :<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas"> { "9999", "9999", "9999" }<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Consolas">};<u></u><u></u></span></p>
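<p class="MsoNormal">Since the listing above elides most of the 10,000 elements, a small generator may help anyone who wants to reproduce the timing. The following is only a sketch: the function name <span style="font-family:Consolas">make_test_source</span> and its <span style="font-family:Consolas">count</span> parameter are illustrative and not part of the original report, and it assumes element i is simply { "i", "i", "i" } as the pattern suggests.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Builds the text of test.cpp with `count` array elements, assuming the
// pattern shown in the post: element i is { "i", "i", "i" }.
// The name and parameter are hypothetical; only the emitted source mirrors
// the test case from the post.
std::string make_test_source(std::size_t count) {
    // An empty initializer list would not be valid for `Foo Array[]`.
    assert(count > 0);

    std::ostringstream out;
    out << "// test.cpp\n"
        << "#include <string>\n\n"
        << "struct Foo {\n"
        << "  std::string s1;\n"
        << "  std::string s2;\n"
        << "  std::string s3;\n"
        << "};\n\n"
        << "Foo Array[] = {\n";
    for (std::size_t i = 0; i < count; ++i) {
        out << "  { \"" << i << "\", \"" << i << "\", \"" << i << "\" }"
            << (i + 1 < count ? ",\n" : "\n");  // no comma after the last element
    }
    out << "};\n";
    return out.str();
}
```

<p class="MsoNormal">Writing the result of <span style="font-family:Consolas">make_test_source(10000)</span> to test.cpp from a one-line main should give a file of the shape described here; a smaller count can be used to see how the compile time grows with the array size.</p>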
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Compile:<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><span style="font-family:Consolas">% clang++ -c -O3 test.cpp<u></u><u></u></span></p>
<p class="MsoNormal">…<u></u><u></u></p>
<p class="MsoNormal">Takes hours!<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0in 0in">
<p class="MsoNormal"><b>From:</b> Riyaz Puthiyapurayil <br>
<b>Sent:</b> Tuesday, January 29, 2019 11:27 AM<br>
<b>To:</b> 'llvm-dev' &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;<br>
<b>Subject:</b> Early Tail Duplication Inefficiency<u></u><u></u></p>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><span style="font-size:14pt">I have a file for which clang-7 takes over 2 hours to compile with -O3. For the same file, clang-5 takes less than 2 minutes (which is also high IMHO). I will try to create a test case, but the file is pretty simple:
it only contains initializations of many arrays of structs, where the structs have the following form:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">struct Foo {<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> EnumType1 e1; // there are 700+ enum labels<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> std::string s1;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> EnumType2 e2; // 5 possible values for e2<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> std::string s2;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> std::string s3;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">};<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">// A large array with 10K+ elements<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">Foo array1[] = {<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> { EnumType1Label1, "some string", EnumType2Label1, "another string", "yet another string" },<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">};<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">// 11 more arrays like above but most of them have only a few hundred elements<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt">I would like to know if a similar problem has been reported before. A quick search didn’t turn up anything.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt">Clang-5 -ftime-report shows:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">===-------------------------------------------------------------------------===<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> ... Pass execution timing report ...<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">===-------------------------------------------------------------------------===<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> Total Execution Time: 175.9183 seconds (176.1324 wall clock)<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 38.9181 ( 22.4%) 0.0070 ( 0.4%) 38.9251 ( 22.1%) 38.9712 ( 22.1%) Simple Register Coalescing<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 34.0788 ( 19.6%) 0.7859 ( 41.5%) 34.8647 ( 19.8%) 34.9069 ( 19.8%) SROA<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 18.8351 ( 10.8%) 0.0070 ( 0.4%) 18.8421 ( 10.7%) 18.8652 ( 10.7%) Function Integration/Inlining<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 18.0393 ( 10.4%) 0.0010 ( 0.1%) 18.0403 ( 10.3%) 18.0624 ( 10.3%) Branch Probability Basic Block Placement<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 17.5163 ( 10.1%) 0.3060 ( 16.2%) 17.8223 ( 10.1%) 17.8458 ( 10.1%) Merge disjoint stack slots<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 14.4318 ( 8.3%) 0.0000 ( 0.0%) 14.4318 ( 8.2%) 14.4495 ( 8.2%) Control Flow Optimizer<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 6.5960 ( 3.8%) 0.6219 ( 32.9%) 7.2179 ( 4.1%) 7.2315 ( 4.1%) X86 DAG->DAG Instruction Selection<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 2.1577 ( 1.2%) 0.0040 ( 0.2%) 2.1617 ( 1.2%) 2.1643 ( 1.2%) Greedy Register Allocator<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 0.9539 ( 0.5%) 0.0000 ( 0.0%) 0.9539 ( 0.5%) 0.9543 ( 0.5%) Combine redundant instructions<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt"> :<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt"> :<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt">Clang-7 -ftime-report shows:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">===-------------------------------------------------------------------------===<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> ... Pass execution timing report ...<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas">===-------------------------------------------------------------------------===<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> Total Execution Time: 7920.0840 seconds (8021.0405 wall clock)<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 6660.8174 ( 88.0%) 209.5201 ( 60.3%) 6870.3375 ( 86.7%) 6957.8224 ( 86.7%) Early Tail Duplication<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 674.2655 ( 8.9%) 0.0550 ( 0.0%) 674.3205 ( 8.5%) 675.1329 ( 8.4%) Jump Threading<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 89.1534 ( 1.2%) 8.1488 ( 2.3%) 97.3022 ( 1.2%) 97.6368 ( 1.2%) Merge disjoint stack slots<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 2.4886 ( 0.0%) 73.3249 ( 21.1%) 75.8135 ( 1.0%) 79.9594 ( 1.0%) Eliminate PHI nodes for register allocation<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 9.2116 ( 0.1%) 52.7120 ( 15.2%) 61.9236 ( 0.8%) 62.1655 ( 0.8%) Slot index numbering<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 34.4118 ( 0.5%) 2.5066 ( 0.7%) 36.9184 ( 0.5%) 44.2757 ( 0.6%) SROA<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 35.2266 ( 0.5%) 0.0000 ( 0.0%) 35.2266 ( 0.4%) 35.2803 ( 0.4%) Simple Register Coalescing<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 18.0892 ( 0.2%) 0.0020 ( 0.0%) 18.0912 ( 0.2%) 18.1253 ( 0.2%) Function Integration/Inlining<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 7.2959 ( 0.1%) 0.5739 ( 0.2%) 7.8698 ( 0.1%) 7.9301 ( 0.1%) X86 DAG->DAG Instruction Selection<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 6.5990 ( 0.1%) 0.0000 ( 0.0%) 6.5990 ( 0.1%) 6.6072 ( 0.1%) Branch Probability Basic Block Placement<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 2.8736 ( 0.0%) 0.0060 ( 0.0%) 2.8796 ( 0.0%) 2.8831 ( 0.0%) Greedy Register Allocator<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 2.0147 ( 0.0%) 0.1890 ( 0.1%) 2.2037 ( 0.0%) 2.2095 ( 0.0%) Global Value Numbering<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 2.1347 ( 0.0%) 0.0010 ( 0.0%) 2.1357 ( 0.0%) 2.1456 ( 0.0%) Call-site splitting<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 1.5878 ( 0.0%) 0.0010 ( 0.0%) 1.5888 ( 0.0%) 1.6079 ( 0.0%) Combine redundant instructions<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 1.4358 ( 0.0%) 0.0020 ( 0.0%) 1.4378 ( 0.0%) 1.4569 ( 0.0%) Two-Address instruction pass<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> 0.9689 ( 0.0%) 0.0000 ( 0.0%) 0.9689 ( 0.0%) 0.9748 ( 0.0%) Live Interval Analysis<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> :<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:14pt;font-family:Consolas"> :<u></u><u></u></span></p>
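<p class="MsoNormal"><span style="font-size:14pt">To put the two reports side by side, the headline ratios can be worked out from the totals and the User+System column. The sketch below only does arithmetic on numbers copied from the reports; the constant and function names are illustrative, and the "total minus that pass" figure assumes the other passes would take the same time if Early Tail Duplication were skipped, which the reports do not establish.</span></p>

```cpp
#include <cassert>
#include <cstdio>

// Figures copied from the two -ftime-report outputs (seconds).
constexpr double kClang5Total = 175.9183;    // clang-5 total execution time
constexpr double kClang7Total = 7920.0840;   // clang-7 total execution time
constexpr double kEarlyTailDup = 6870.3375;  // clang-7 Early Tail Duplication, User+System

// Ratio of two durations; both must be positive.
double ratio(double numerator_sec, double denominator_sec) {
    assert(numerator_sec > 0.0 && denominator_sec > 0.0);
    return numerator_sec / denominator_sec;
}

// Prints the derived figures: overall slowdown, the share of clang-7's time
// spent in Early Tail Duplication, and what remains after subtracting it.
void print_summary() {
    std::printf("overall slowdown: %.1fx\n", ratio(kClang7Total, kClang5Total));
    std::printf("early tail duplication share: %.1f%%\n",
                100.0 * ratio(kEarlyTailDup, kClang7Total));
    std::printf("clang-7 total minus that pass: %.1f s\n",
                kClang7Total - kEarlyTailDup);
}
```

<p class="MsoNormal"><span style="font-size:14pt">Under that assumption the remainder would still be several times the clang-5 total, mostly because of the Jump Threading entry, so Early Tail Duplication may not be the only pass worth looking at.</span></p>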
</div>
</div>
</blockquote></div>