[llvm-bugs] [Bug 38829] New: Quadratic behavior in DSE and memcpyopt from use of callCapturesBefore without an OrderedBasicBlock

Tue Sep 4 11:25:51 PDT 2018

https://bugs.llvm.org/show_bug.cgi?id=38829

            Bug ID: 38829
           Summary: Quadratic behavior in DSE and memcpyopt from use of
                    callCapturesBefore without an OrderedBasicBlock
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Scalar Optimizations
          Assignee: unassignedbugs at nondot.org
          Reporter: rnk at google.com
                CC: chandlerc at gmail.com, dberlin at dberlin.org,
                    george.burgess.iv at gmail.com, llvm-bugs at lists.llvm.org

My local build of clang was slow, so I profiled it, and this is what I found. I
compiled on Windows with optimizations and debug info. I think enabling debug
info makes basic blocks longer, and that's the main reason it matters.

Compiling clang's SemaChecking.cpp file took ~121s total, and of it, 21s is
spent in DSE and 21s in memcpyopt. The majority of the time in both of these
passes was in callCapturesBefore. The next most expensive pass is induction
variable simplification at 14s, but its time is spent in SCEV, not AA.

Most of the time in DSE and memcpyopt is spent in callCapturesBefore, which is
linear in the size of the basic block in the worst case. Most of the samples
occur in OrderedBlock::comesBefore, which is the inner loop that iterates the
linked list of instructions in a BB and numbers them to compute intra-BB
dominance.

If we could, for example, maintain accurate position numbers for instructions
in the basic block, this would not be necessary, but it would require a fancy
algorithm to maintain amortized O(1) insertion and removal of instructions
mid-block.

Fixing this without that fancy algorithm seems like a bit of a pain, since we
can't blindly create an OrderedBasicBlock in our transforms that use
callCapturesBefore. These passes currently add and remove instructions as they
go, mutating the block. We'd probably end up seeing ABA bugs where freed
instruction addresses appear in the OrderedBasicBlock DenseMap that maps from
Instruction* to position. Every place where we update the domtree today, we'd
have to update the appropriate basic block.

These passes already both use MemoryDependenceResults and call
MD->removeInstruction. In theory we could move the instruction numbering into
MemoryDependenceResults, and have it be responsible for maintaining the
numbers, but its starting to feel like adding fancy algorithms to ilist /
iplist would be more maintainable.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20180904/d5e43bfd/attachment-0001.html>