<html>
    <head>
      <base href="http://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - New SROA 2x slower on C++ code"
   href="http://llvm.org/bugs/show_bug.cgi?id=15471">15471</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>New SROA 2x slower on C++ code
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Core LLVM classes
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>nrotem@apple.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
<pre>When I compile a large C++ program with the old SROA and again with the new
SROA, the new SROA pass takes about 2.2x - 2.5x as long to run.  In the example
below I compiled all of LLVM's InstCombine, ScalarTransformations and Utils
into a single unoptimized BC file and ran opt -Os on it. If needed I can upload
the file.  I got similar numbers for other large C++ programs. I did not try C
or ObjC.


With use-new-sroa=true:

===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 13.9078 seconds (13.8970 wall clock)
---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
1.6912 ( 12.7%)   0.0376 (  6.0%)   1.7287 ( 12.4%)   1.7278 ( 12.4%)  Function Integration/Inlining
1.4309 ( 10.8%)   0.0309 (  4.9%)   1.4618 ( 10.5%)   1.4610 ( 10.5%)  Global Value Numbering
0.8525 (  6.4%)   0.0198 (  3.2%)   0.8723 (  6.3%)   0.8715 (  6.3%)  Combine redundant instructions
0.7948 (  6.0%)   0.0219 (  3.5%)   0.8167 (  5.9%)   0.8165 (  5.9%)  Value Propagation
0.6751 (  5.1%)   0.0214 (  3.4%)   0.6965 (  5.0%)   0.6964 (  5.0%)  Value Propagation
0.6273 (  4.7%)   0.0170 (  2.7%)   0.6444 (  4.6%)   0.6437 (  4.6%)  Combine redundant instructions
0.5520 (  4.2%)   0.0158 (  2.5%)   0.5678 (  4.1%)   0.5670 (  4.1%)  Combine redundant instructions
0.4694 (  3.5%)   0.0123 (  2.0%)   0.4817 (  3.5%)   0.4813 (  3.5%)  Combine redundant instructions
0.4502 (  3.4%)   0.0188 (  3.0%)   0.4690 (  3.4%)   0.4683 (  3.4%)  SROA   <------------

With use-new-sroa=false:

===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 13.4713 seconds (13.4602 wall clock)
---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
1.6794 ( 13.0%)   0.0287 (  5.4%)   1.7081 ( 12.7%)   1.7071 ( 12.7%)  Function Integration/Inlining
1.4281 ( 11.0%)   0.0219 (  4.1%)   1.4500 ( 10.8%)   1.4495 ( 10.8%)  Global Value Numbering
0.8626 (  6.7%)   0.0149 (  2.8%)   0.8775 (  6.5%)   0.8768 (  6.5%)  Combine redundant instructions
0.7831 (  6.1%)   0.0141 (  2.6%)   0.7972 (  5.9%)   0.7969 (  5.9%)  Value Propagation
0.6521 (  5.0%)   0.0141 (  2.6%)   0.6663 (  4.9%)   0.6665 (  5.0%)  Value Propagation
0.6281 (  4.9%)   0.0135 (  2.5%)   0.6416 (  4.8%)   0.6406 (  4.8%)  Combine redundant instructions
0.5527 (  4.3%)   0.0127 (  2.4%)   0.5654 (  4.2%)   0.5647 (  4.2%)  Combine redundant instructions
0.4810 (  3.7%)   0.0126 (  2.4%)   0.4937 (  3.7%)   0.4930 (  3.7%)  Combine redundant instructions
0.4483 (  3.5%)   0.0114 (  2.1%)   0.4597 (  3.4%)   0.4590 (  3.4%)  Combine redundant instructions
0.3849 (  3.0%)   0.0056 (  1.0%)   0.3904 (  2.9%)   0.3902 (  2.9%)  Loop Invariant Code Motion
0.3821 (  3.0%)   0.0059 (  1.1%)   0.3881 (  2.9%)   0.3875 (  2.9%)  Induction Variable Simplification
0.2635 (  2.0%)   0.0098 (  1.8%)   0.2733 (  2.0%)   0.2730 (  2.0%)  Early CSE
0.2420 (  1.9%)   0.0093 (  1.7%)   0.2513 (  1.9%)   0.2511 (  1.9%)  Dead Store Elimination
0.2338 (  1.8%)   0.0093 (  1.7%)   0.2431 (  1.8%)   0.2430 (  1.8%)  Jump Threading
0.2309 (  1.8%)   0.0104 (  1.9%)   0.2412 (  1.8%)   0.2412 (  1.8%)  Bitcode Writer
0.1951 (  1.5%)   0.0120 (  2.2%)   0.2071 (  1.5%)   0.2068 (  1.5%)  Scalar Replacement of Aggregates (DT)   <------</pre>
        </div>
      </p>
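      <p>As a rough cross-check of the slowdown quoted above, the two marked rows
      can be compared directly. The sketch below is illustrative only: the variable
      and function names are hypothetical, the figures are the User+System seconds
      copied from the two reports, and it compares only the single SROA row shown
      in each excerpt.</p>
      <pre>
```python
# User+System seconds for the marked rows in the two timing reports.
new_sroa_seconds = 0.4690  # "SROA" row, use-new-sroa=true
old_sroa_seconds = 0.2071  # "Scalar Replacement of Aggregates (DT)" row, use-new-sroa=false


def slowdown(new_seconds, old_seconds):
    """Return how many times as long the new timing is compared with the old."""
    return new_seconds / old_seconds


ratio = slowdown(new_sroa_seconds, old_sroa_seconds)
print(f"new SROA / old SROA = {ratio:.2f}x")  # prints "new SROA / old SROA = 2.26x"
```
      </pre>
      <p>For this run the ratio falls inside the 2.2x - 2.5x range reported for
      other large C++ inputs.</p>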
    </body>
</html>