[llvm-dev] Early Tail Duplication Inefficiency
Craig Topper via llvm-dev
llvm-dev at lists.llvm.org
Wed Jan 30 10:19:20 PST 2019
Try passing "-mllvm -disable-early-taildup=true" to clang
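
To try the option above against the quoted reproducer, the elided array can be generated rather than typed out. This is only a minimal sketch, assuming the elided rows simply count upward from 0 with the index repeated three times, as the first and last entries in the quoted test suggest. The helper names `make_test_source` and `write_test_file` are hypothetical and not part of the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Build the text of a test.cpp-style file holding `count` Foo initializers.
// Assumption: row i is { "i", "i", "i" }, matching the visible rows of the
// quoted test case; the real elided rows are not shown in the post.
std::string make_test_source(std::size_t count) {
    // An empty initializer list for an array of unknown bound is ill-formed.
    assert(count > 0);
    std::ostringstream out;
    out << "// test.cpp\n"
        << "#include <string>\n\n"
        << "struct Foo {\n"
        << "  std::string s1;\n"
        << "  std::string s2;\n"
        << "  std::string s3;\n"
        << "};\n\n"
        << "Foo Array[] = {\n";
    for (std::size_t i = 0; i < count; ++i) {
        out << "  { \"" << i << "\", \"" << i << "\", \"" << i << "\" }"
            << (i + 1 < count ? "," : "") << "\n";  // no comma after the last row
    }
    out << "};\n";
    return out.str();
}

// Write the generated source to `path`; returns false if the write failed.
bool write_test_file(const std::string& path, std::size_t count) {
    std::ofstream file(path);
    file << make_test_source(count);
    return static_cast<bool>(file);
}
```

Calling `write_test_file("test.cpp", 10000)` from a small `main` should produce a file of the same shape as the quoted one, which can then be compiled with the command from the quoted message, with and without the option suggested above.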
~Craig
On Wed, Jan 30, 2019 at 10:01 AM Riyaz Puthiyapurayil via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I didn’t see any response on this. Is there any way to turn off early tail
> duplication with a clang-7 option (other than completely turning off all
> optimizations)? The issue is reproducible with a very simple test case.
> Clang-7 with optimizations turned on takes hours, compared to minutes with
> clang-5.0. Here is a simple cooked-up test (the real-life example is of
> course different, but this simple test exposes the same inefficiency):
>
> // test.cpp
> #include <string>
>
> struct Foo {
>   std::string s1;
>   std::string s2;
>   std::string s3;
> };
>
> Foo Array[] = {
>   { "0", "0", "0" },
>   { "1", "1", "1" },
>   :
>   :
>   :
>   { "9999", "9999", "9999" }
> };
>
> Compile:
>
> % clang++ -c -O3 test.cpp
> …
> Takes hours!
>
> *From:* Riyaz Puthiyapurayil
> *Sent:* Tuesday, January 29, 2019 11:27 AM
> *To:* 'llvm-dev' <llvm-dev at lists.llvm.org>
> *Subject:* Early Tail Duplication Inefficiency
>
>
>
> I have a file for which clang-7 takes over 2 hours to compile with -O3.
> For the same file, clang-5 takes less than 2 minutes (which is also high
> IMHO). I will try to create a test case, but the file is pretty simple: it
> only contains initializations of many arrays of structs, where the structs
> are of the following form:
>
> struct Foo {
>   EnumType1 e1;   // there are 700+ enum labels
>   std::string s1;
>   EnumType2 e2;   // 5 possible values for e2
>   std::string s2;
>   std::string s3;
> };
>
> // A large array with 10K+ elements
> Foo array1[] = {
>   { EnumType1Label1, "some string", EnumType2Label1, "another string", "yet another string" },
>   :
>   :
> };
>
> // 11 more arrays like above but most of them have only a few hundred elements
> :
> :
>
> I would like to know if a similar problem has been reported before. A
> quick search didn’t find anything…
>
>
>
> Clang-5 -ftime-report shows:
>
> ===-------------------------------------------------------------------------===
>                       ... Pass execution timing report ...
> ===-------------------------------------------------------------------------===
>   Total Execution Time: 175.9183 seconds (176.1324 wall clock)
>
>  ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
> 38.9181 ( 22.4%)   0.0070 (  0.4%)  38.9251 ( 22.1%)  38.9712 ( 22.1%)  Simple Register Coalescing
> 34.0788 ( 19.6%)   0.7859 ( 41.5%)  34.8647 ( 19.8%)  34.9069 ( 19.8%)  SROA
> 18.8351 ( 10.8%)   0.0070 (  0.4%)  18.8421 ( 10.7%)  18.8652 ( 10.7%)  Function Integration/Inlining
> 18.0393 ( 10.4%)   0.0010 (  0.1%)  18.0403 ( 10.3%)  18.0624 ( 10.3%)  Branch Probability Basic Block Placement
> 17.5163 ( 10.1%)   0.3060 ( 16.2%)  17.8223 ( 10.1%)  17.8458 ( 10.1%)  Merge disjoint stack slots
> 14.4318 (  8.3%)   0.0000 (  0.0%)  14.4318 (  8.2%)  14.4495 (  8.2%)  Control Flow Optimizer
>  6.5960 (  3.8%)   0.6219 ( 32.9%)   7.2179 (  4.1%)   7.2315 (  4.1%)  X86 DAG->DAG Instruction Selection
>  2.1577 (  1.2%)   0.0040 (  0.2%)   2.1617 (  1.2%)   2.1643 (  1.2%)  Greedy Register Allocator
>  0.9539 (  0.5%)   0.0000 (  0.0%)   0.9539 (  0.5%)   0.9543 (  0.5%)  Combine redundant instructions
> :
> :
>
> Clang-7 -ftime-report shows:
>
> ===-------------------------------------------------------------------------===
>                       ... Pass execution timing report ...
> ===-------------------------------------------------------------------------===
>   Total Execution Time: 7920.0840 seconds (8021.0405 wall clock)
>
>    ---User Time---     --System Time--     --User+System--     ---Wall Time---  --- Name ---
> 6660.8174 ( 88.0%)   209.5201 ( 60.3%)  6870.3375 ( 86.7%)  6957.8224 ( 86.7%)  Early Tail Duplication
>  674.2655 (  8.9%)     0.0550 (  0.0%)   674.3205 (  8.5%)   675.1329 (  8.4%)  Jump Threading
>   89.1534 (  1.2%)     8.1488 (  2.3%)    97.3022 (  1.2%)    97.6368 (  1.2%)  Merge disjoint stack slots
>    2.4886 (  0.0%)    73.3249 ( 21.1%)    75.8135 (  1.0%)    79.9594 (  1.0%)  Eliminate PHI nodes for register allocation
>    9.2116 (  0.1%)    52.7120 ( 15.2%)    61.9236 (  0.8%)    62.1655 (  0.8%)  Slot index numbering
>   34.4118 (  0.5%)     2.5066 (  0.7%)    36.9184 (  0.5%)    44.2757 (  0.6%)  SROA
>   35.2266 (  0.5%)     0.0000 (  0.0%)    35.2266 (  0.4%)    35.2803 (  0.4%)  Simple Register Coalescing
>   18.0892 (  0.2%)     0.0020 (  0.0%)    18.0912 (  0.2%)    18.1253 (  0.2%)  Function Integration/Inlining
>    7.2959 (  0.1%)     0.5739 (  0.2%)     7.8698 (  0.1%)     7.9301 (  0.1%)  X86 DAG->DAG Instruction Selection
>    6.5990 (  0.1%)     0.0000 (  0.0%)     6.5990 (  0.1%)     6.6072 (  0.1%)  Branch Probability Basic Block Placement
>    2.8736 (  0.0%)     0.0060 (  0.0%)     2.8796 (  0.0%)     2.8831 (  0.0%)  Greedy Register Allocator
>    2.0147 (  0.0%)     0.1890 (  0.1%)     2.2037 (  0.0%)     2.2095 (  0.0%)  Global Value Numbering
>    2.1347 (  0.0%)     0.0010 (  0.0%)     2.1357 (  0.0%)     2.1456 (  0.0%)  Call-site splitting
>    1.5878 (  0.0%)     0.0010 (  0.0%)     1.5888 (  0.0%)     1.6079 (  0.0%)  Combine redundant instructions
>    1.4358 (  0.0%)     0.0020 (  0.0%)     1.4378 (  0.0%)     1.4569 (  0.0%)  Two-Address instruction pass
>    0.9689 (  0.0%)     0.0000 (  0.0%)     0.9689 (  0.0%)     0.9748 (  0.0%)  Live Interval Analysis
> :
> :