[llvm-dev] RFC: Dynamic dominators
Sean Silva via llvm-dev
llvm-dev at lists.llvm.org
Mon Jun 12 17:59:30 PDT 2017
On Mon, Jun 12, 2017 at 5:58 PM, Sean Silva <chisophugis at gmail.com> wrote:
> On Mon, Jun 12, 2017 at 5:12 PM, Jakub (Kuba) Kuderski via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> Hi folks,
>> This summer I'm working on improving dominators during my internship at
>> Google. Below is an RFC on switching to dynamic dominators, which you can
>> also read as a Google Doc
>> if you prefer so. Please let us know what you think.
>> *1. Context*
>> Currently LLVM uses the Simple Lengauer-Tarjan algorithm to build the
>> (post)dominator tree. The only way to update it is by manually setting
>> IDoms which is not obvious in many cases and can be extremely error-prone.
>> And because updating it manually is so difficult, programmers tend to just
>> recompute it after changing the CFG (by not AU.addPreserved()'ing the
>> DomTree). This causes DomTree calculation to fire very often even if only a
>> very small portion of it gets really affected by the changes in CFG. As an
>> example, when optimizing a Full LTO clang bitcode, DominatorTreeWrapperPass
>> alone calls DT.recalculate over 6.5 million times, which takes 20s on my
> Assuming an optimistic 10min clang FullLTO build time, this is about 3%,
> so overall this is actually a pretty perf improvement I think.
s/pretty/pretty small overall/
>> Using an incremental algorithm it would be much easier to keep an
>> up-to-date DomTree without custom update logic, which will save us the time
>> currently spent during DomTree recalculations and reduce the number of bugs
>> caused by manual updates.
> Avoiding bugs seems really compelling. (IIRC talking with you, Davide, and
> DannyB at the social, there are tons of such bugs, so eliminating them by
> mechanically just using a new API seems awesome)
>> It would also make it feasible to maintain postdominators and use them
>> more broadly, which currently can be too complicated and expensive.
>> *2. Proposal*
>> The proposal is to make dominators use the incremental algorithm that
>> allows to keep (post)dominator tree up to date as CFG changes. To achieve
>> that, we would implement the dynamic depth based search algorithm (DBS)
>> described in this paper  <https://arxiv.org/pdf/1604.02711.pdf> and
>> expose 2 main new functions: insertArc(From, To) and deleteArc(From, To).
>> The algorithm uses SemiNCA under the hood which would replace
>> Lengauer-Tarjan implementation currently used.
>> The second part of the proposal is to gradually deprecate and remove the
>> existing API for manually manipulating dominator tree
>> (changeImmediateDominator, addNewBlock) and replace its use within LLVM
>> with the new incremental API.
>> *3. Proof of concept*
>> The prototype implementation can be found in my LLVM fork 
>> <https://github.com/kuhar/llvm-dominators>. It comes with several layers
>> of verification and was tested on clang, llvm test suite and a few open
>> source libraries.
>> The code provides the basic functionality and tries be ‘API-equivalent’
>> with the current DomTree implementation. The NewDomTree is also able to
>> convert itself to the current one for testing and verification purposes.
>> Dynamic dominators are hacked into 3 existing passes (DomTreeWrapperPass,
>> LoopDeletion, LoopSimplifyCFG) to test correctness and experiment with the
>> use of the new API.
>> The prototype also comes with a bunch of testing utilities and a simple
>> benchmarking tool.
>> *4. Performance*
>> The real life performance of full dynamic DBS-based DomTree recalculation
>> is between 20% and 2% better on a machine with two Xeon E5-2680v2 CPUs than
>> the existing SLT algorithm, which is consistent with the findings from this
>> thesis  <ftp://ftp.cs.princeton.edu/reports/2005/737.pdf>. The
>> advantage is not that visible on debug builds, where the DBS-algorithm can
>> be up to 8% slower. It is most like possible to speed up debug builds, but
>> I haven’t looked into that yet.
>> The benchmark performed 
>> loads of a (full LTO) bitcode file and builds DomTress of all functions 15
>> The current DomTree updates DFS in-out numbers eagerly upon construction,
>> while the new one postpones is until they are actually needed. To make the
>> benchmark fair, numbers were collected for both eager and lazy strategy for
>> the new DomTree.
>> The main advantage of the incremental algorithm comes from the fact that
>> it allows incremental updates without rebuilding the whole tree, not from
>> the slightly faster construction.
>> What’s more, the insertArc / deleteArc API doesn’t force updates to
>> happen immediately -- they can be queued behind the scenes and happen in
>> batches if we decide to pursue that design later.
>> *5. Transition plan*
>> There are several possibilities when it comes to transition. The biggest
>> problem is that the current DomTree exposes an interface for manual updates
>> (setIDom, changeImmediateDominator), and for manual construction
>> (addNewBlock). Because of the additional data stored in the incremental
>> algorithm (relative dominators, preorder parents, levels) it is not really
>> possible to use the old API while keeping this data up-to-date. The
>> incremental algorithm relies on this information when performing fast arc
>> deletions; It is still able to perform them without it -- deletions are
>> then just slower in some cases.
>> The most straightforward solutions are:
>> a) Keep the existing DomTree and gradually replace its uses with the new
>> one. It is possible to convert the DBS-based dominators to the old ones.
>> b) Replace the existing DomTree with the new, dynamic dominators. Nuke
>> all of the old update functions and replace their uses with the new
>> incremental API in one commit.
>> c) Replace the existing DomTree with the new one, but keep the old API
>> around and mark it as deprecated. If someone invokes one of the old update
>> functions, mark the the additional information as invalid for dynamic
>> deletions. Follow the pessimistic but correct dynamic deletion path if the
>> additional information is marked as invalidated. Gradually modernize the
>> projects to use the new API instead and then remove the old update
>> Maintaining the old DT and the new one simultaneously seems like a waste
>> of time as the DBS offers better or similar performance to the existing
>> SLT-based implementation.
>> The problem with replacing the old API with the new one is that it’s used
>> in many places (~100) across the project and I believe that doing that all
>> at once would be tricky to get correct. Gradual update seems much easier to
>> ensure correctness and the transitional API (invalid additional data check)
>> is trivial to implement. Because of that, I propose to follow the option
> (c) sounds like the best choice to me too. That would also allow fixing
> dominator verification failures first, giving a nice immediate benefit to
> motivate and kickstart the transition.
> -- Sean Silva
>>  Georgiadis et al., An Experimental Study of Dynamic Dominators,
>>  llvm-dominators LLVM fork on Github, https://github.com/kuh
>>  L. Georgiadis, Linear-Time Algorithms for Dominators and Related
>> Problems, ftp://ftp.cs.princeton.edu/reports/2005/737.pdf p. 38
>>  Google Doc with the performance numbers,
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev