[PATCH] D64088: [DDG] DirectedGraph as a base class for various dependence graphs such as DDG and PDG.

Bardia via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 10 12:21:58 PDT 2019


bmahjour marked 6 inline comments as done.
bmahjour added a comment.

In D64088#1577250 <https://reviews.llvm.org/D64088#1577250>, @Meinersbur wrote:

> In D64088#1577050 <https://reviews.llvm.org/D64088#1577050>, @bmahjour wrote:
>
> > In D64088#1574555 <https://reviews.llvm.org/D64088#1574555>, @Meinersbur wrote:
> >
> > > To reduce the number of allocations, have you thought about making EdgeList contain the edge objects instead of pointers? The edges would have to be copied/moved into the list and edges could not be compared by identity. Is this semantic needed/are edge objects large?
> > >
> > > Since the edge class does not contain the source node, the same edge object could be put into multiple outgoing edges lists. Is this supported?
> >
> >
> > I think we would be better off using pointers in this case, because of the following reasons:
> >
> > 1. Using pointers gives the clients freedom to use polymorphic behavior.
>
>
> I don't see why you would go the route of implementing compile-time template polymorphism, but then introduce vtables in derived classes.


I actually just realized that we cannot have a container of edge objects due to the CRTP idiom: the edge type is not yet a complete type at the point where the container would be declared. If we don't use static polymorphism, then we need dynamic polymorphism, and that requires a container of pointers. Either way we need to store pointers, it seems.
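To illustrate the incomplete-type constraint, here is a standalone sketch (all names hypothetical, not the patch itself): the node and edge classes refer to each other through the CRTP parameters, so whichever class is defined second is still incomplete while the first is instantiated, and only containers of pointers to it can be declared.

```cpp
#include <vector>

template <class NodeT, class EdgeT> class DGEdgeBase {
  NodeT &Target; // a reference to an incomplete type is fine
public:
  explicit DGEdgeBase(NodeT &N) : Target(N) {}
};

template <class NodeT, class EdgeT> class DGNodeBase {
protected:
  std::vector<EdgeT *> Edges; // OK: pointers to an incomplete type
  // EdgeT InlineEdges[4];    // error: array of incomplete type -- the same
                              // reason SmallVector<EdgeT, N> cannot reserve
                              // inline storage for EdgeT here
};

class MyEdge; // only a forward declaration is available below
class MyNode : public DGNodeBase<MyNode, MyEdge> {};
class MyEdge : public DGEdgeBase<MyNode, MyEdge> {
public:
  using DGEdgeBase::DGEdgeBase;
};
```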

> This graph is only walkable in one direction. Are you sure you don't need the other direction as well to answer queries such as "which statements must be executed before the current statement" in addition to "which statements can only execute after the current statement"?

Once pi-blocks are formed, we are left with a DAG that can be topologically sorted based on the direction of the dependencies. At that point the nodes are in a dependency-preserving order.
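As a sketch of that ordering step (assuming the graph type has a GraphTraits specialization; the graph, node, and helper names here are hypothetical), a reverse post-order walk of the acyclic graph yields such a dependency-preserving order:

```cpp
#include "llvm/ADT/PostOrderIterator.h"

// Hypothetical types; assumes a GraphTraits<MyGraph *> specialization
// exists so LLVM's graph iterators can walk the graph.
class MyNode;
class MyGraph;
void processNode(MyNode *N); // hypothetical per-node work

void visitInDependencyOrder(MyGraph *G) {
  // On a DAG, a reverse post-order is a topological order: every dependence
  // source is visited before any of its sinks.
  for (MyNode *N : llvm::ReversePostOrderTraversal<MyGraph *>(G))
    processNode(N);
}
```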



================
Comment at: llvm/include/llvm/ADT/DirectedGraph.h:118-120
+    if (llvm::find_if(Edges, [&E](const EdgeType *Edge) {
+          return *Edge == E;
+        }) != Edges.end())
----------------
Meinersbur wrote:
> fhahn wrote:
> > bmahjour wrote:
> > > Meinersbur wrote:
> > > > This makes me worry about scaling. Is this method used often? How large is the worst practical edge list? Did you think about using `SetVector`?
> > > > 
> > > > For the sake of avoiding premature optimization, we might worry about it when it becomes a problem.
> > > This is a good question. The method is used fairly frequently when building the graph. I do not have comprehensive stats on the number of nodes and edges and their ratios when building large applications. The space complexity of the DDG depends on a number of factors, such as:
> > > 
> > > 1. The number of instructions being analyzed. For example, if the DDG is run as a function pass it is more likely to result in a larger number of nodes (and consequently edges) than if it is run as a loop pass. Of course, this also depends on the size of the functions and loop bodies.
> > > 2. The quality of the dependence analysis. If dependence analysis gives us pessimistic results, we end up creating more edges between nodes.
> > > 3. How well we are able to simplify the graph. Sometimes it's possible to collapse an edge and merge two nodes together. The more we can simplify, the fewer nodes and edges we will have.
> > > 
> > > Using SetVector will likely help with compile-time performance, but it comes with a memory trade-off. My preliminary tests suggest that time complexity may be more of an issue than memory consumption, so it may be a good trade-off.
> > > 
> > > I agree it's better not to do premature optimizations at this point, and instead consider such improvements when more comprehensive stats become available.
> > > This is a good question. The method is used fairly frequently when building the graph. I do not have comprehensive stats on the number of nodes and edges and their ratios when building large applications. The space complexity of the DDG depends on a number of factors, such as:
> > > 
> > > The number of instructions being analyzed. For example, if the DDG is run as a function pass it is more likely to result in a larger number of nodes (and consequently edges) than if it is run as a loop pass. Of course, this also depends on the size of the functions and loop bodies.
> > 
> > From experience, people pass all kinds of crazy code to LLVM, and functions as well as loop nests can be huge. 
> > 
> > AFAIU you are planning to use this to also represent Def-Use dependencies? Have you considered using the existing information present in the IR and having the DDG just integrate it as an overlay?
> > 
> > > The quality of the dependence analysis. If dependence analysis gives us pessimistic results, we end up creating more edges between nodes.
> > > How well we are able to simplify the graph. Sometimes it's possible to collapse an edge and merge two nodes together. The more we can simplify, the fewer nodes and edges we will have.
> > > Using SetVector will likely help with compile-time performance, but it comes with a memory trade-off. My preliminary tests suggest that time complexity may be more of an issue than memory consumption, so it may be a good trade-off.
> > 
> > Intuitively I agree that compile time will be a bigger issue than memory usage, especially as we only need to keep the DDG of one loop nest/function around at a time. IIRC SetVector 'just' uses roughly twice as much memory.
> > 
> > > 
> > > I agree it's better not to do premature optimizations at this point, and instead consider such improvements when more comprehensive stats become available.
> > 
> > I think it is definitely worth discussing/thinking about what the right data structure is here to start with. Compile-time problems, especially the edge cases, tend to appear a while after the patches land in master, once they have made it into a range of production compilers and are used to compile very large code bases. At that point it is hard to fix the issue quickly.
> Depends. If transitive dependencies are added explicitly, there can be a lot of edges in the graph.
> 
> Could we delegate the responsibility of adding an edge only once to the caller? Very often, by construction (a newly created edge), a duplicate is not even possible. Instead, check it in an assert.
> Have you considered using the existing information present in the IR and having the DDG just integrate it as an overlay?
Thanks for raising this point, although it is getting a little outside the scope of this patch. We have thought about this approach, but we have not prototyped it yet. 

The def-use dependencies are important to track as they can form cycles in the dependence chains, and detecting those cycles is one of the primary jobs of the DDG. Our current prototype creates nodes for all instructions (including those that don't access memory) and materializes def-use edges between them. This has two main advantages: 1. detecting cycles is easy, as we can use an SCCIterator to identify nodes that are part of a cycle, and 2. code generation is simpler, since a topologically sorted DDG fully represents all the dependent instructions in the right order. 
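As a sketch of that first advantage (hypothetical graph, node, and helper names; assumes a GraphTraits specialization for the graph type):

```cpp
#include "llvm/ADT/SCCIterator.h"

// Hypothetical types; assumes GraphTraits<MyGraph *> is specialized.
class MyNode;
class MyGraph;
void markAsPartOfCycle(MyNode *N); // hypothetical helper

void collectCycles(MyGraph *G) {
  // Walk the strongly connected components; any SCC that contains a cycle
  // is a set of mutually dependent nodes (a candidate pi-block).
  for (auto I = llvm::scc_begin(G); !I.isAtEnd(); ++I) {
    if (!I.hasCycle())
      continue; // trivial SCC: a single node with no self-edge
    for (MyNode *N : *I)
      markAsPartOfCycle(N);
  }
}
```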

The alternative I can think of is to create a DDG where nodes are created only for instructions that access memory. In order to be able to detect cycles, once the memory edges are established, the def-use chains need to be examined in the IR to see if any of the nodes need to be connected to each other due to def-use dependencies. A DDG in this form will not fully represent all the dependent instructions, so code gen would have to deal with the definitions of dependent uses separately.

We should have a more in-depth discussion about this at one of the loop group meetings and/or as part of the DDG code revision.

>Intuitively I agree that compile time will be a bigger issue than memory usage, especially as we only need to keep the DDG of one loop nest/function around at a time. IIRC SetVector 'just' uses roughly twice as much memory.
Ok, I'll use SetVector instead then.
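A minimal sketch of what that could look like (a simplified stand-in for the node class, not the actual patch). One caveat worth noting: a SetVector keyed on edge pointers deduplicates by pointer identity, whereas the `find_if` above compares edges by value.

```cpp
#include "llvm/ADT/SetVector.h"

// Simplified sketch: the edge list becomes a SetVector, so insert() is a
// constant-time duplicate check plus append, replacing the linear scan,
// while iteration order stays deterministic (at roughly 2x the memory of
// a plain vector).
template <class EdgeType> class NodeWithSetVector {
  llvm::SetVector<EdgeType *> Edges;

public:
  // Returns false if E was already in the list.
  bool addEdge(EdgeType &E) { return Edges.insert(&E); }
};
```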


================
Comment at: llvm/include/llvm/ADT/DirectedGraph.h:118-120
+    if (llvm::find_if(Edges, [&E](const EdgeType *Edge) {
+          return *Edge == E;
+        }) != Edges.end())
----------------
Meinersbur wrote:
> Could we delegate the responsibility of adding an edge only once to the caller? Very often, by construction (a newly created edge), a duplicate is not even possible. Instead, check it in an assert.
This would imply that the client would have to know about all the outgoing edges before being able to add them to the node. This sounds too restrictive, especially if the graph is to be updated post-construction.
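For contrast, a sketch of the assert-based alternative being discussed (a simplified stand-in with hypothetical names): the duplicate check runs only in asserts-enabled builds, so callers must guarantee they never add the same edge twice.

```cpp
#include "llvm/ADT/STLExtras.h"   // llvm::find_if
#include "llvm/ADT/SmallVector.h"
#include <cassert>

template <class EdgeType> class NodeWithAssert {
  llvm::SmallVector<EdgeType *, 10> Edges;

public:
  // Trusts the caller in release builds; verifies uniqueness (by value,
  // matching the original find_if check) only when assertions are enabled.
  void addEdge(EdgeType &E) {
    assert(llvm::find_if(Edges, [&E](const EdgeType *Edge) {
             return *Edge == E;
           }) == Edges.end() &&
           "edge added to the list twice");
    Edges.push_back(&E);
  }
};
```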


================
Comment at: llvm/include/llvm/ADT/DirectedGraph.h:287
+    for (NodeType *N : Nodes)
+      N->clear();
+    Nodes.clear();
----------------
Meinersbur wrote:
> bmahjour wrote:
> > Meinersbur wrote:
> > > Is clearing each individual node necessary? I wouldn't expect them to be re-used.
> > Not sure if I understand your question... we need to clear each node so that its outgoing edges are removed.
> But there are no nodes in the graph after `clear()`; it is irrelevant what the previous nodes store, they are not part of the graph anymore.
>But there are no nodes in the graph after clear(); it is irrelevant what the previous nodes store, they are not part of the graph anymore.
Once the nodes are removed, all the handles to the edges become inaccessible, so we need to clear the edges in each node (while we still have a chance) before we remove the nodes from the graph.


================
Comment at: llvm/include/llvm/ADT/DirectedGraph.h:173
+public:
+  using NodeListTy = SmallVector<NodeType *, 10>;
+  using EdgeListTy = typename NodeType::EdgeListTy;
----------------
Meinersbur wrote:
> The small capacity of the node list (and even better: the container implementation) is an implementation detail and should not be public.
Ok, I can make it protected, but note that the `EdgeListTy` would still be public because it's part of a public interface (see `findIncomingEdgesToNode` below).
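Roughly along these lines (a simplified sketch of the visibility split; only `findIncomingEdgesToNode` is taken from this review, the rest is hypothetical):

```cpp
#include "llvm/ADT/SmallVector.h"

template <class NodeType, class EdgeType> class GraphSketch {
public:
  // EdgeListTy must stay public: it appears in the public query interface.
  using EdgeListTy = typename NodeType::EdgeListTy;
  bool findIncomingEdgesToNode(const NodeType &N, EdgeListTy &EL) const;

protected:
  // The node container (and its small-size capacity) becomes an
  // implementation detail hidden from clients.
  using NodeListTy = llvm::SmallVector<NodeType *, 10>;
  NodeListTy Nodes;
};
```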


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64088/new/

https://reviews.llvm.org/D64088




