[llvm-dev] RFC: Dynamic dominators

Mon Jun 12 17:12:27 PDT 2017

Hi folks,

This summer I'm working on improving dominators during my internship at
Google. Below is an RFC on switching to dynamic dominators, which you can
also read as a Google Doc
<https://docs.google.com/document/d/1wPYeWykeO51YDPLYQEg4KNTlDIGIdyF65OTfhSMaNHQ/edit?usp=sharing>
if you prefer so. Please let us know what you think.

~Kuba

=======================================================================

*1. Context*

Currently LLVM uses the Simple Lengauer-Tarjan algorithm to build the
(post)dominator tree. The only way to update it is by manually setting
IDoms which is not obvious in many cases and can be extremely error-prone.
And because updating it manually is so difficult, programmers tend to just
recompute it after changing the CFG (by not AU.addPreserved()'ing the
DomTree). This causes DomTree calculation to fire very often even if only a
very small portion of it gets really affected by the changes in CFG. As an
example, when optimizing a Full LTO clang bitcode, DominatorTreeWrapperPass
alone calls DT.recalculate over 6.5 million times, which takes 20s on my
machine.

Using an incremental algorithm it would be much easier to keep an
up-to-date DomTree without custom update logic, which will save us the time
currently spent during DomTree recalculations and reduce the number of bugs
caused by manual updates. It would also make it feasible to maintain
postdominators and use them more broadly, which currently can be too
complicated and expensive.

*2. Proposal*

The proposal is to make dominators use the incremental algorithm that
allows to keep (post)dominator tree up to date as CFG changes. To achieve
that, we would implement the dynamic depth based search algorithm (DBS)
described in this paper [1] <https://arxiv.org/pdf/1604.02711.pdf> and
expose 2 main new functions: insertArc(From, To) and deleteArc(From, To).
The algorithm uses SemiNCA under the hood which would replace
Lengauer-Tarjan implementation currently used.

The second part of the proposal is to gradually deprecate and remove the
existing API for manually manipulating dominator tree
(changeImmediateDominator, addNewBlock) and replace its use within LLVM
with the new incremental API.

*3. Proof of concept*

The prototype implementation can be found in my LLVM fork [2]
<https://github.com/kuhar/llvm-dominators>. It comes with several layers of
verification and was tested on clang, llvm test suite and a few open source
libraries.
The code provides the basic functionality and tries be ‘API-equivalent’
with the current DomTree implementation. The NewDomTree is also able to
convert itself to the current one for testing and verification purposes.
Dynamic dominators are hacked into 3 existing passes (DomTreeWrapperPass,
LoopDeletion, LoopSimplifyCFG) to test correctness and experiment with the
use of the new API.

The prototype also comes with a bunch of testing utilities and a simple
benchmarking tool.

*4. Performance*

The real life performance of full dynamic DBS-based DomTree recalculation
is between 20% and 2% better on a machine with two Xeon E5-2680v2 CPUs than
the existing SLT algorithm, which is consistent with the findings from this
thesis [3] <ftp://ftp.cs.princeton.edu/reports/2005/737.pdf>. The advantage
is not that visible on debug builds, where the DBS-algorithm can be up to
8% slower. It is most like possible to speed up debug builds, but I haven’t
looked into that yet.
The benchmark performed [4]
<https://docs.google.com/document/d/1wPYeWykeO51YDPLYQEg4KNTlDIGIdyF65OTfhSMaNHQ/edit?usp=sharing>
loads of a (full LTO) bitcode file and builds DomTress of all functions 15
times.

The current DomTree updates DFS in-out numbers eagerly upon construction,
while the new one postpones is until they are actually needed. To make the
benchmark fair, numbers were collected for both eager and lazy strategy for
the new DomTree.

The main advantage of the incremental algorithm comes from the fact that it
allows incremental updates without rebuilding the whole tree, not from the
slightly faster construction.
What’s more, the insertArc / deleteArc API doesn’t force updates to happen
immediately -- they can be queued behind the scenes and happen in batches
if we decide to pursue that design later.

*5. Transition plan*

There are several possibilities when it comes to transition. The biggest
problem is that the current DomTree exposes an interface for manual updates
(setIDom, changeImmediateDominator), and for manual construction
(addNewBlock). Because of the additional data stored in the incremental
algorithm (relative dominators, preorder parents, levels) it is not really
possible to use the old API while keeping this data up-to-date. The
incremental algorithm relies on this information when performing fast arc
deletions; It is still able to perform them without it -- deletions are
then just slower in some cases.
The most straightforward solutions are:

a) Keep the existing DomTree and gradually replace its uses with the new
one. It is possible to convert the DBS-based dominators to the old ones.
b) Replace the existing DomTree with the new, dynamic dominators. Nuke all
of the old update functions and replace their uses with the new incremental
API in one commit.
c) Replace the existing DomTree with the new one, but keep the old API
around and mark it as deprecated. If someone invokes one of the old update
functions, mark the the additional information as invalid for dynamic
deletions. Follow the pessimistic but correct dynamic deletion path if the
additional information is marked as invalidated. Gradually modernize the
projects to use the new API instead and then remove the old update
functions.

Maintaining the old DT and the new one simultaneously seems like a waste of
time as the DBS offers better or similar performance to the existing
SLT-based implementation.
The problem with replacing the old API with the new one is that it’s used
in many places (~100) across the project and I believe that doing that all
at once would be tricky to get correct. Gradual update seems much easier to
ensure correctness and the transitional API (invalid additional data check)
is trivial to implement. Because of that, I propose to follow the option
(c).

[1] Georgiadis et al., An Experimental Study of Dynamic Dominators,
https://arxiv.org/pdf/1604.02711.pdf
[2] llvm-dominators LLVM fork on Github,
https://github.com/kuhar/llvm-dominators
[3] L. Georgiadis, Linear-Time Algorithms for Dominators and Related
Problems, ftp://ftp.cs.princeton.edu/reports/2005/737.pdf p. 38
[4] Google Doc with the performance numbers,
https://docs.google.com/document/d/1wPYeWykeO51YDPLYQEg4KNTlDIGIdyF65OTfhSMaNHQ/edit?usp=sharing
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170612/ca5090da/attachment.html>