[PATCH] D65350: [DDG] Data Dependence Graph Basics

Thu Aug 29 23:08:15 PDT 2019

fhahn added a comment.

In D65350#1639499 <https://reviews.llvm.org/D65350#1639499>, @bmahjour wrote:

> In D65350#1636205 <https://reviews.llvm.org/D65350#1636205>, @fhahn wrote:
>
> > > This patch contains support for basic DDGs containing only atomic nodes (one node for each instruction). The edges are twofold: def-use edges and memory-dependence edges. The idea behind the DependenceGraphBuilder and why we need it are summarized in https://ibm.ent.box.com/v/directed-graph-and-ddg.
> >
> > I think it would be good to summarize the information in-tree as well, to ensure it stays accessible later on. Some of the docs fit in the headers; for the rest, a new documentation page might be worth adding. Ideally it would include some info about design decisions, the intended/example use cases, how the DDG helps, and the benefits over the existing infrastructure.
>
>
> Sure, I can create a page or two of documentation; however, I'm not very familiar with the doc infrastructure in LLVM. Could you point me to some examples to follow? Would an rst file under llvm/docs/DDG be sufficient? Are they rendered by any tool, and if so, how can I test it?

Great, thanks! Yep, adding an .rst file should be sufficient. I think you need Sphinx installed to build the docs, and you need to set `LLVM_BUILD_DOCS`.

>> A few additional questions:
>> 
>> I am not sure I see the direct benefit of duplicating the def-use edges in the DDG. Given a User, we already have access to its uses through the User itself, and LLVM tries hard to maintain that relation very efficiently.
> 
> The def-use dependencies are important as they carry scalar dependencies, but it is possible to only consider memory access instructions in the DDG and only follow def-use edges during graph construction to establish reaching defs from `load`-like instructions to `store`-like instructions. However, doing that reduces the generality of the DDG, as transformations would need to do extra work during codegen to pull in all required instructions. If the def-use edges are explicitly represented in the graph, then codegen is simplified because a topological sort of the graph fully represents the whole program. Please note that the DDG can potentially be used in many different transformations. Many of those transformations, such as instruction scheduling, care about all instructions (not just memory access instructions), and would not benefit from using this implementation of a DDG if a minimalistic approach were taken.

Thanks for clarifying, I was not aware of the additional intended use cases.
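
To check my understanding of the topological-sort point, here is a minimal standalone sketch. It is not the DDG API from this patch: `TinyDepGraph` and `topologicalOrder` are hypothetical names, nodes are plain indices standing in for instructions, and every edge (def-use or memory dependence) simply points from a source to its dependent. It assumes an acyclic graph; cycles are only detected, not handled.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a DDG with atomic nodes: one node per
// instruction, each edge pointing from a source to its dependent.
struct TinyDepGraph {
  std::vector<std::vector<std::size_t>> Succs;

  explicit TinyDepGraph(std::size_t NumNodes) : Succs(NumNodes) {}

  void addEdge(std::size_t Src, std::size_t Dst) { Succs[Src].push_back(Dst); }
};

// Kahn's algorithm. Returns the nodes in an order that respects every edge,
// or an empty vector if the graph contains a cycle.
std::vector<std::size_t> topologicalOrder(const TinyDepGraph &G) {
  const std::size_t N = G.Succs.size();

  // Count incoming edges for every node.
  std::vector<std::size_t> InDegree(N, 0);
  for (const auto &Out : G.Succs)
    for (std::size_t Dst : Out)
      ++InDegree[Dst];

  // Nodes with no unmet dependencies can be emitted right away.
  std::queue<std::size_t> Ready;
  for (std::size_t I = 0; I < N; ++I)
    if (InDegree[I] == 0)
      Ready.push(I);

  std::vector<std::size_t> Order;
  while (!Ready.empty()) {
    std::size_t Node = Ready.front();
    Ready.pop();
    Order.push_back(Node);
    // Emitting Node satisfies one dependency of each successor.
    for (std::size_t Dst : G.Succs[Node])
      if (--InDegree[Dst] == 0)
        Ready.push(Dst);
  }

  // Some nodes were never released, so there must be a cycle.
  if (Order.size() != N)
    Order.clear();
  return Order;
}
```

With def-use edges 0->1 and 1->2 (load, add, store) plus a memory-dependence edge 2->3 (a later load of the stored location), the order contains all four instructions. If only the memory access nodes were kept, the add would have to be pulled in separately during codegen.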

> I actually implemented a prototype where I did the "minimal" implementation only considering memory instructions. I measured the difference in compile-time for a number of benchmarks with and without this approach, and I only noticed a small improvement. From what I observed and the feedback from several people, the gain is too small to justify the loss of generality and convenience of a full DDG.
> 
>> IIUC the plan is to build the DDG up front and then check the legality of a transformation on top of it. Currently, most passes bail out early once they detect a transformation cannot be applied and this helps to limit compile time. Could we do something similar with checks dependent on the DDG?
> 
> The DDG would certainly help analyze the legality of various transformations. It can go beyond answering the question of "whether a transformation is legal or not". It can actually help determine how to transform the code so that it preserves the original program dependencies; good examples of this are loop distribution and loop vectorization. Other transformations, which only care about the existence of a certain data dependency pattern, can use the DDG as well, but they would have to weigh the benefits against the compile-time cost of building it.

================
Comment at: llvm/include/llvm/Analysis/DDG.h:199
+  // queried it is recomputed using @DI.
+  const DependenceInfo DI;
+};
----------------
bmahjour wrote:
> fhahn wrote:
> > Why do we need a copy here? Wouldn't a reference be enough?
> The reason is that the target of the reference may go out of scope, while the DDG (as an analysis result) lives on. For instance if you look at `DDGAnalysis::run`, an object of `DependenceInfo` is created inside the function which is used to construct the DDG. The function returns a unique_ptr to that DDG. The DDG lives on and needs to answer queries about the dependencies, while the `DependenceInfo` object is local to the function and gets destroyed upon return of the `run` function. 
Ah right, it's unfortunate that the new pass manager does not really allow getting DI easily from a loop pass. It might be worth moving the DI into DependenceInfo, to make the ownership a bit more explicit.
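
For anyone following along, the lifetime issue can be shown in a standalone sketch. None of the names below are LLVM classes: `DepInfo`, `OwningGraph` and `runAnalysis` are hypothetical stand-ins that only mirror the shape described above, where a `run`-style function builds the info object locally and returns a `unique_ptr` to the graph.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for DependenceInfo; not the LLVM class.
struct DepInfo {
  std::string FunctionName;
};

// Hypothetical stand-in for the DDG. It keeps its own copy of the info
// object, so queries stay valid after the original has been destroyed.
class OwningGraph {
public:
  explicit OwningGraph(const DepInfo &Info) : DI(Info) {}

  const std::string &functionName() const { return DI.FunctionName; }

private:
  const DepInfo DI; // a copy; a `const DepInfo &` member would dangle below
};

// Mirrors the shape of an analysis `run` function: the info object is a
// local, but the returned graph outlives it.
std::unique_ptr<OwningGraph> runAnalysis(const std::string &Name) {
  DepInfo Local{Name}; // destroyed when this function returns
  return std::make_unique<OwningGraph>(Local); // safe: the graph copied Local
}
```

Calling `runAnalysis("foo")->functionName()` is well-defined here only because of the copy. With a reference member, the same call would read a destroyed object.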

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65350/new/

https://reviews.llvm.org/D65350