[PATCH] D64088: [DDG] DirectedGraph as a base class for various dependence graphs such as DDG and PDG.

Tue Jul 9 14:42:29 PDT 2019

fhahn added inline comments.

================
Comment at: llvm/include/llvm/ADT/DirectedGraph.h:118-120
+    if (llvm::find_if(Edges, [&E](const EdgeType *Edge) {
+          return *Edge == E;
+        }) != Edges.end())
----------------
bmahjour wrote:
> Meinersbur wrote:
> > This makes me worry about scaling. Is this method used often? How large is the worst practical edge list? Did you think about using `SetVector`?
> > 
> > For the sake of avoiding premature optimization, we might worry about it when it becomes a problem.
> This is a good question. The method is used fairly frequently when building the graph. I do not have comprehensive stats on the number of nodes and edges and their ratios when building large applications. The space complexity of the DDG depends on a number of factors such as:
> 
> 1. The number of instructions being analyzed. For example if DDG is run as a function pass it is more likely to result in larger number of nodes (and consequently edges) than if it is run as a loop pass. Of course this also depends on the size of the functions and loop bodies.
> 2. The quality of the dependence analysis. If dependence analysis gives us pessimistic results we end up creating more edges between nodes.
> 3. How well we are able to simplify the graph. Sometimes it's possible to collapse an edge and merge two nodes together. The more we can simplify the less nodes and edges we will have.
> 
> Using SetVector will likely help with the compile-time performance, but it comes with a memory trade off. My preliminary tests suggest that time-complexity may be more of an issue than memory consumption, so it maybe a good trade off. 
> 
> I agree it's better not to do premature optimizations at this point, and instead consider such improvements when more comprehensive stats become available.
>This is a good question. The method is used fairly frequently when building the graph. I do not have comprehensive stats on the number of nodes and edges and their ratios when building large applications. The space complexity of the DDG depends on a number of factors such as:
> 
> The number of instructions being analyzed. For example if DDG is run as a function pass it is more likely to result in larger number of nodes (and consequently edges) than if it is run as a loop pass. Of course this also depends on the size of the functions and loop bodies.

>From experience, people pass all kind of crazy code to LLVM and functions as well as loop nests can be huge. 

AFAIU you are planning to use this to also represent Def-Use dependencies? Have you considered integrating the existing information present in the IR and have the DDG just integrate it as an overlay?

> The quality of the dependence analysis. If dependence analysis gives us pessimistic results we end up creating more edges between nodes.
> How well we are able to simplify the graph. Sometimes it's possible to collapse an edge and merge two nodes together. The more we can simplify the less nodes and edges we will have.
> Using SetVector will likely help with the compile-time performance, but it comes with a memory trade off. My preliminary tests suggest that time-complexity may be more of an issue than memory consumption, so it maybe a good trade off.

Intuitively I agree that compile-time will be a bigger issue than memory usage, especially as we only need to keep the DDG of a loop nest/function around at a time. IIRC SetVector 'just' uses roughly twice as much memory.

> 
> I agree it's better not to do premature optimizations at this point, and instead consider such improvements when more comprehensive stats become available.

I think it is definitely worth discussing/thinking about what the right data structure is here to start with. Compile-time problems, especially the edge cases tend to appear a while after the patches land in master, once they made it an a range of production compilers and are used to compile very large code bases. At that point it is hard to quickly fix the issue.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64088/new/

https://reviews.llvm.org/D64088