[llvm-dev] [GSOC 2018] Implement a single updater class for Dominators

Jakub (Kuba) Kuderski via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 14 10:42:23 PDT 2018


Hi Chijun,

Great, seems like you did a lot of progress and understand the issues quite
well!

I have done some early sketch on the API of the new updater class.
> From my current understanding, to solve the fragmentation problem of
> the API, the new class first, need to maintain the DomTree and
> PostDomTree class and deprecate the DefferredDominance class.
>
There are a couple of possibilities here. First, we can come up with a new
updater class and gradually replace DefferredDominance and and the basic
incremental update API with it (.applyUpdates()). During this process all
the classes can coexists which should make it easier to make the transition
smooth instead of risking breaking multiple places in LLVM at once. The
other approach would be to modify DefferredDominance and make the
transition happen in place, which should be also doable. I think that for
prototyping it would be easier to clone DefferedDominance and replace one
use of it at a time with the new updater class.

 Second, the API should be something
> looks like DT.update(updateStrategy::lazy,updateKind::delete,A,B) or
> DT.applyUpdates(updateStrategy::lazy,{{ updateKind::delete,A,B }}).
>
Having update strategies as enums sounds very reasonable, but I'd rather we
didn't teach the DT to do it. I think the existing interface (and the
header file) is already pretty large and it should make sense to have a
separate class for the updates that the DT and PDT don't know about at all.
I suspect that the only thing that might be required would be to expose
some additional functions from DT and PDT for update pruning, but I'm not
sure about it at this point.

Third, the API can auto flush() when a query happens, which can make
> the code easier to maintain. Forth, the updater class can hold an
> optional PostDomTree and make its status synchronize with the
> corresponding DomTree (can use lazy updates until query happens) and
> use information from the DomTree to prune useless updates.

Sounds good. One issue we might hit and need to be care about is not to
have both the updater class and DT in a single function somewhere in the
codebase. In such cases it is possible to have some pending updates that
will eventually change DT, but do .dominates() queries on the outdated
tree. However, this problem exists even today to some extend (with DT and
DDT) and I don't think it's very serious.

After I read the comments on D41302[1], in which Dave(dmgreen)
> mentions there is a need to preserve the PostDomTree in the future. I
> am wondering whether we need to preserve a PostDomTree all the time in
> the new class?

I think that at some point we realized that preserving PDT everywhere is
quite challenging with the current data structures in LLVM. The issue is
that for postdominators you have to access predecessors, which requires you
to track uses instead of just looking up the last instruction within a
single basic block. What's more, changes anywhere near terminator
instructions in IR can cause PDT to look completely different every time
you do it. To do it efficiently, you have to be very lazy about updates and
have a good strategy for pruning. This project should be a huge step
towards it, but I think that we have to start with optional PDT anyway.
Danny and Chandler (cc'd) should remember more about the different
directions that were considered here.

Moreover, as mentioned in the open projects page[2],
> the API should be able to specify which trees are actually being
> updated (none, only DomTree, only PostDomTree, both), while I doubt
> that if the updater initially holds both the DomTree and PostDomTree
> for example, it is inappropriate to update only the PostDomTree, which
> will just break the synchronization between the two trees (DomTree and
> PostDomTree). Please correct me if I am wrong. :D

My intention in the proposal was to say that the updater class can hold (or
not) two kinds of domtrees -- DT and PDT, and the updates should affect
whatever is currently available. I hope this is what you are asking about.
Breaking synchronization between two trees inside the updater class sounds
like something we really want to avoid.

Sincerely,
Jakub

On Mon, Mar 12, 2018 at 4:02 PM, Chijun Sima <simachijun at gmail.com> wrote:

> Hi Kuba,
>
> Thanks for your advice in your previous letter.
>
> During last week, I have read the documents on Doxygen and the source
> code of the DomTreeBase/DomTree/PostDomTree/DeferredDominance class, I
> believe now I have a much better understanding on the relationship
> between these classes and how DeferredDominance class performs lazy
> updates. I have also learnt the current usage and drawbacks of the
> fragmented API by looking into how several ‘transform util’ functions
> use these APIs and running the IR tests. Furthermore, I have read the
> code reviews you mentioned to get the idea of previous approaches on
> implementing the updater class and so on.
>
> I have done some early sketch on the API of the new updater class.
> From my current understanding, to solve the fragmentation problem of
> the API, the new class first, need to maintain the DomTree and
> PostDomTree class and deprecate the DeferredDominance class. This
> approach can save redundant code on calling the update function on
> both DomTree and PostDomTree. Second, the API should be something
> looks like DT.update(updateStrategy::lazy,updateKind::delete,A,B) or
> DT.applyUpdates(updateStrategy::lazy,{{ updateKind::delete,A,B }}).
> Third, the API can auto flush() when a query happens, which can make
> the code easier to maintain. Forth, the updater class can hold an
> optional PostDomTree and make its status synchronize with the
> corresponding DomTree (can use lazy updates until query happens) and
> use information from the DomTree to prune useless updates.
>
> After I read the comments on D41302[1], in which Dave(dmgreen)
> mentions there is a need to preserve the PostDomTree in the future. I
> am wondering whether we need to preserve a PostDomTree all the time in
> the new class? Moreover, as mentioned in the open projects page[2],
> the API should be able to specify which trees are actually being
> updated (none, only DomTree, only PostDomTree, both), while I doubt
> that if the updater initially holds both the DomTree and PostDomTree
> for example, it is inappropriate to update only the PostDomTree, which
> will just break the synchronization between the two trees (DomTree and
> PostDomTree). Please correct me if I am wrong. :D
>
> I am preparing to submit the proposal to GSoC this week. At the same
> time, I will try to figure out something about the way to prune
> redundant PostDomTree updates.
>
> I am looking forward to your reply.
>
>
> Regards,
>
> Chijun Sima
>
>
> [1] https://reviews.llvm.org/D41302?id=127155#inline-361616
>
> [2] https://llvm.org/OpenProjects.html#llvm_domtree_updater
>
> 2018-03-02 11:57 GMT+08:00 Jakub (Kuba) Kuderski <kubakuderski at gmail.com>:
> > Hi Chijun,
> >
> > Thanks for your interest in the project.
> >
> >> I have gone through most of the LLVM Kaleidoscope tutorial and I have
> >> watched the video of the presentation “Dominator Trees and incremental
> >> updates that transcend time” presented on the 2017 LLVM Developers’
> >> Meeting. I have also started to understand the algorithm mentioned in
> >> the comments of the code related to the dominator tree.
> >
> >
> > This sounds like a good start, but I don't think there's strong need to
> > understand the Depth Based Search algorithm in detail -- grasping the
> main
> > idea behind it should be enough. If you want to have some rough idea on
> how
> > it works you can try to run domtree tests with
> -debug-only=dom-tree-builder.
> > To run just the IR tests for dominators, you can do:
> ./unittests/IR/IRTests
> > --gtest_filter=DominatorTree*. Additionally, you can also play with
> > Holder.F->viewCFG() and DT->viewGraph() to see what's going on inside the
> > tests.
> >
> >> I am wondering if there are some other bugs or materials I can go over
> >> in order to achieve a better understanding on the LLVM’s codebase (or
> >> the project mentioned above)? Which part of the codebase do you
> >> recommend to view to learn more about the usage and drawbacks of the
> >> current dominator tree related API? I would appreciate it.
> >
> >
> > I think that the programmer's manual is extremely useful when you want to
> > get more familiar with the code base:
> > http://llvm.org/docs/ProgrammersManual.html. I would recommend to focus
> on
> > how to interact with IR, llvm's data structures, and debug utilities.
> > The most important files related to the projects are probably:
> > GenericDomTreeConstruction.h, GenericDomTree.h, Dominators.h,
> > Dominators.cpp.
> >
> > If you want to see the incremental API in action, take a look at some
> > simpler transforms like loop deletion and aggressive dead code
> elimination
> > (ADCE). The problems the project mentions can be seen in various
> > 'helper'/'utils' functions used by transforms. Take a look at these code
> > reviews, especially the functions that can accept DT* and/or DDT*:
> > https://reviews.llvm.org/D41302
> > https://reviews.llvm.org/D40146
> > https://reviews.llvm.org/D42804
> >
> > I think that it's worth adding that a better API is only one part the
> > project; the second, perhaps bigger and more fun, is to figure out how to
> > prune redundant PostDomTree updates when we have a fully-updated DomTree
> > available.
> >
> > Feel free to ask more questions should you have any -- I and others will
> be
> > more than happy to clarify.
> >
> > Best,
> > Kuba
> >
> > On Thu, Mar 1, 2018 at 2:29 PM, Chijun Sima via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>
> >> Hello,
> >>
> >> I’m an undergraduate student studying CS in the South China University
> >> of Technology.
> >>
> >> I have been using clang compiler and related tools since I started
> >> studying C++ and I would like to work on LLVM in this year’s GSoC. I
> >> am interested in “Implement a single updater class for Dominators”.
> >> [1] I have achieved a bronze medal in the 2017 ACM-ICPC Asia Xian
> >> Regional Contest [2] (being a member of the team “Charizard”) thus I
> >> think I have some knowledge on basic tree/graph algorithms and data
> >> structures.
> >>
> >> I have gone through most of the LLVM Kaleidoscope tutorial and I have
> >> watched the video of the presentation “Dominator Trees and incremental
> >> updates that transcend time” presented on the 2017 LLVM Developers’
> >> Meeting. I have also started to understand the algorithm mentioned in
> >> the comments of the code related to the dominator tree. I have created
> >> a Bugzilla account and I am now working on a small bug related to
> >> syntax warning (Sema).
> >>
> >> I am wondering if there are some other bugs or materials I can go over
> >> in order to achieve a better understanding on the LLVM’s codebase (or
> >> the project mentioned above)? Which part of the codebase do you
> >> recommend to view to learn more about the usage and drawbacks of the
> >> current dominator tree related API? I would appreciate it.
> >>
> >> Best regards,
> >>
> >> Chijun Sima
> >>
> >> [1] https://llvm.org/OpenProjects.html#llvm_domtree_updater
> >> [2] https://icpc.baylor.edu/regionals/finder/asia-xian-2017/standings
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
> >
> >
> > --
> > Jakub Kuderski
>



-- 
Jakub Kuderski
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180314/ed152c5c/attachment.html>


More information about the llvm-dev mailing list