<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Quadratic behavior in DSE and memcpyopt from use of callCapturesBefore without an OrderedBasicBlock"

   href="https://bugs.llvm.org/show_bug.cgi?id=38829">38829</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Quadratic behavior in DSE and memcpyopt from use of callCapturesBefore without an OrderedBasicBlock

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Scalar Optimizations

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>rnk@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>chandlerc@gmail.com, dberlin@dberlin.org, george.burgess.iv@gmail.com, llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>My local build of clang was slow, so I profiled it, and this is what I found. I

compiled on Windows with optimizations and debug info. I think enabling debug

info makes basic blocks longer, and that's the main reason it matters.

Compiling clang's SemaChecking.cpp file took ~121s total, and of it, 21s is

spent in DSE and 21s in memcpyopt. The majority of the time in both of these

passes was in callCapturesBefore. The next most expensive pass is induction

variable simplification at 14s, but its time is spent in SCEV, not AA.

Most of the time in DSE and memcpyopt is spent in callCapturesBefore, which is

linear in the size of the basic block in the worst case. Most of the samples

occur in OrderedBlock::comesBefore, which is the inner loop that iterates the

linked list of instructions in a BB and numbers them to compute intra-BB

dominance.

If we could, for example, maintain accurate position numbers for instructions

in the basic block, this would not be necessary, but it would require a fancy

algorithm to maintain amortized O(1) insertion and removal of instructions

mid-block.

Fixing this without that fancy algorithm seems like a bit of a pain, since we

can't blindly create an OrderedBasicBlock in our transforms that use

callCapturesBefore. These passes currently add and remove instructions as they

go, mutating the block. We'd probably end up seeing ABA bugs where freed

instruction addresses appear in the OrderedBasicBlock DenseMap that maps from

Instruction* to position. Every place where we update the domtree today, we'd

have to update the appropriate basic block.

These passes already both use MemoryDependenceResults and call

MD->removeInstruction. In theory we could move the instruction numbering into

MemoryDependenceResults, and have it be responsible for maintaining the

numbers, but its starting to feel like adding fancy algorithms to ilist /

iplist would be more maintainable.</pre>

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>