[llvm-dev] Early Tail Duplication Inefficiency

Riyaz Puthiyapurayil via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 29 11:27:27 PST 2019


I have a file for which clang-7 takes over 2 hours to compile with -O3. For the same file, clang-5 takes less than 3 minutes (which is also high IMHO). I will try to create a test case. The file itself is pretty simple: it only contains initializations of many arrays of structs, where the structs are of the following form:

struct Foo {
    EnumType1 e1; // there are 700+ enum labels
    std::string s1;
    EnumType2 e2; // 5 possible values for e2
    std::string s2;
    std::string s3;
};

// A large array with 10K+ elements
Foo array1[] = {
  { EnumType1Label1, "some string", EnumType2Label1, "another string", "yet another string" },
:
:
};

// 11 more arrays like above but most of them have only a few hundred elements
:
:
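
Until a real test case is available, the shape of the file can be sketched roughly as below. This is only an illustration: the enum definitions and the extra labels are placeholders (the real EnumType1 has 700+ labels and the real array has 10K+ elements), array1Size is a helper added for the sketch, and a file this small is not expected to show the slowdown.

```cpp
#include <cstddef>
#include <string>

// Placeholder enums (assumption): the real EnumType1 has 700+ labels and
// the real EnumType2 has 5 possible values.
enum EnumType1 { EnumType1Label1, EnumType1Label2, EnumType1Label3 };
enum EnumType2 { EnumType2Label1, EnumType2Label2 };

struct Foo {
    EnumType1 e1;
    std::string s1;
    EnumType2 e2;
    std::string s2;
    std::string s3;
};

// Every element constructs three std::string members from string literals.
// The real array has 10K+ such elements; three are shown here.
Foo array1[] = {
    { EnumType1Label1, "some string", EnumType2Label1, "another string", "yet another string" },
    { EnumType1Label2, "some string 2", EnumType2Label2, "another string 2", "yet another string 2" },
    { EnumType1Label3, "some string 3", EnumType2Label1, "another string 3", "yet another string 3" },
};

// Hypothetical helper, only used to check the sketch.
const std::size_t array1Size = sizeof(array1) / sizeof(array1[0]);
```

Scaling the number of elements up (for example with a small generator script) and compiling with -O3 -ftime-report would be one way to look for the point where the compile time starts to grow.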

I would like to know whether a similar problem has been reported before. A quick search didn't find anything.

Clang-5 -ftime-report shows:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 175.9183 seconds (176.1324 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  38.9181 ( 22.4%)   0.0070 (  0.4%)  38.9251 ( 22.1%)  38.9712 ( 22.1%)  Simple Register Coalescing
  34.0788 ( 19.6%)   0.7859 ( 41.5%)  34.8647 ( 19.8%)  34.9069 ( 19.8%)  SROA
  18.8351 ( 10.8%)   0.0070 (  0.4%)  18.8421 ( 10.7%)  18.8652 ( 10.7%)  Function Integration/Inlining
  18.0393 ( 10.4%)   0.0010 (  0.1%)  18.0403 ( 10.3%)  18.0624 ( 10.3%)  Branch Probability Basic Block Placement
  17.5163 ( 10.1%)   0.3060 ( 16.2%)  17.8223 ( 10.1%)  17.8458 ( 10.1%)  Merge disjoint stack slots
  14.4318 (  8.3%)   0.0000 (  0.0%)  14.4318 (  8.2%)  14.4495 (  8.2%)  Control Flow Optimizer
   6.5960 (  3.8%)   0.6219 ( 32.9%)   7.2179 (  4.1%)   7.2315 (  4.1%)  X86 DAG->DAG Instruction Selection
   2.1577 (  1.2%)   0.0040 (  0.2%)   2.1617 (  1.2%)   2.1643 (  1.2%)  Greedy Register Allocator
   0.9539 (  0.5%)   0.0000 (  0.0%)   0.9539 (  0.5%)   0.9543 (  0.5%)  Combine redundant instructions
            :
            :

Clang-7 -ftime-report shows:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 7920.0840 seconds (8021.0405 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  6660.8174 ( 88.0%)  209.5201 ( 60.3%)  6870.3375 ( 86.7%)  6957.8224 ( 86.7%)  Early Tail Duplication
  674.2655 (  8.9%)   0.0550 (  0.0%)  674.3205 (  8.5%)  675.1329 (  8.4%)  Jump Threading
  89.1534 (  1.2%)   8.1488 (  2.3%)  97.3022 (  1.2%)  97.6368 (  1.2%)  Merge disjoint stack slots
   2.4886 (  0.0%)  73.3249 ( 21.1%)  75.8135 (  1.0%)  79.9594 (  1.0%)  Eliminate PHI nodes for register allocation
   9.2116 (  0.1%)  52.7120 ( 15.2%)  61.9236 (  0.8%)  62.1655 (  0.8%)  Slot index numbering
  34.4118 (  0.5%)   2.5066 (  0.7%)  36.9184 (  0.5%)  44.2757 (  0.6%)  SROA
  35.2266 (  0.5%)   0.0000 (  0.0%)  35.2266 (  0.4%)  35.2803 (  0.4%)  Simple Register Coalescing
  18.0892 (  0.2%)   0.0020 (  0.0%)  18.0912 (  0.2%)  18.1253 (  0.2%)  Function Integration/Inlining
   7.2959 (  0.1%)   0.5739 (  0.2%)   7.8698 (  0.1%)   7.9301 (  0.1%)  X86 DAG->DAG Instruction Selection
   6.5990 (  0.1%)   0.0000 (  0.0%)   6.5990 (  0.1%)   6.6072 (  0.1%)  Branch Probability Basic Block Placement
   2.8736 (  0.0%)   0.0060 (  0.0%)   2.8796 (  0.0%)   2.8831 (  0.0%)  Greedy Register Allocator
   2.0147 (  0.0%)   0.1890 (  0.1%)   2.2037 (  0.0%)   2.2095 (  0.0%)  Global Value Numbering
   2.1347 (  0.0%)   0.0010 (  0.0%)   2.1357 (  0.0%)   2.1456 (  0.0%)  Call-site splitting
   1.5878 (  0.0%)   0.0010 (  0.0%)   1.5888 (  0.0%)   1.6079 (  0.0%)  Combine redundant instructions
   1.4358 (  0.0%)   0.0020 (  0.0%)   1.4378 (  0.0%)   1.4569 (  0.0%)  Two-Address instruction pass
   0.9689 (  0.0%)   0.0000 (  0.0%)   0.9689 (  0.0%)   0.9748 (  0.0%)  Live Interval Analysis
     :
     :