[PATCH] D83088: Introduce CfgTraits abstraction

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Aug 23 23:57:59 PDT 2020


nhaehnle added a comment.

>>>> The most immediate problem is divergence analysis, which is extremely complex and difficult to get right. If I had tried to fight the accidental complexity that comes with attempting to write such an algorithm as C++ templates in addition to the inherent complexity of the algorithm at the same time, I'm not sure I would have been able to produce anything workable at all.
>>>>
>>>> Frankly, I suspect that our dominator tree implementation also suffer because of this, though at least dominator trees are much more well studied in the academic literature, so that helps keep the inherent complexity under control.
>>>
>>> I'm totally open to discussing making APIs more usable, for sure - though I'm thinking it's likely a concept (like containers in the C++ standard library) might be the better direction.
>>>
>>> Perhaps some code samples showing how one would interact (probably not whole algorithms - maybe something simple like generating a dot diagram for a graph) with these things given different APIs (traits, concepts, and runtime polymorphism) - and implementations of each kind too.
>>
>> Take a look here for example: https://github.com/nhaehnle/llvm-project/blob/715450fa7f968ceefaf9c3b04b47066866c97206/llvm/lib/Analysis/GenericConvergenceUtils.cpp#L499 -- this is obviously still fairly simple, but it's an example of printing out the results of an analysis in a way that's generic over the underlying CFG and SSA form.
>
> I'm having trouble following this example - I'm not sure what the CfgPrinter abstraction is/why it's first-class, and why this "print" function is calling what look like mutation operations like "appendBlocks". I guess perhaps the question is - what's it printing from and what's it printing to?
>
> Ah, I see, the "append" functions are accessors, of a sort. Returning a container might be more clear than using an out parameter - alternatively, a functor parameter (ala std::for_each) that is called for each element, that can then be used to populate an existing container if desired, or to do immediate processing without the need for an intermediate container.

The code is trying to strike a balance here in terms of performance. Since dynamic polymorphism is used, a functor-based traversal can't be inlined and so the number of indirect function calls increases quite a bit. There are a number of use cases where you really do want to just append successors or predecessors to a vectors, like during a graph traversal. An example graph traversal is here: https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericConvergenceUtils.cpp#L329

> Though the printer abstraction still strikes me as a bit strange - especially since it doesn't seem to be printing itself. This function was passed a printer and a stream - the printer prints to the stream (perhaps it'd make more sense for the printer to take the stream on construction) and the function isn't passed the thing to print at all - that thing is accessed from the printer. That seems fairly awkward to me - I'd expect a printing operation to take a thing to be printed and a thing to print to.

Keep in mind that where those `printBlockName` / `printValue` methods are used, they will always be mixed in with printing of other things. So the alternative you're describing would result in code like:

  dbgs() << " currently at: ";  // explicit stream
  printer.printBlockName(block); // implicit stream
  dbgs() << '\n'; // explicit stream

I would argue that that ends up being more awkward because it mixes implicit streams with explicitly given streams.

We could perhaps add versions of the methods that return a `Printable` object, so that we can write:

  dbgs() << "currently at: " << printer.printableBlockName(block) << '\n';

By the way, the main motivation for making the CfgPrinter a separate object is that printing LLVM IR efficiently requires keeping a ModuleSlotTracker around. Splitting the CfgPrinter off from the CfgInterface allows the fast-path of code that doesn't want debug prints to not have to carry a reference to a ModuleSlotTracker around, even if it's just an empty std::unique_ptr. That's a comparatively small burden though, so I'd be fine with merging the CfgPrinter back into the CfgInterface.

> Perhaps setting aside the complexities of printing things - could you provide an example of code, given a CfgGraph, that walks the graph - perhaps just numbering the nodes/edges/etc to produce a dot graph? Showing what the code would look like if it were passed a GraphTraits-implementing graph, a static polymorphic CfgGraph, and a dynamically polymorphic GfgGraph - and also showing what would be fandemantally possible with the CfgGraph that wouldn't be possible with GraphTraits, if any such things exist (it's still unclear to me whether CfgGraph has abstractions that don't exist in GraphTraits (eg: could you write a CfgGraph over GraphTraits? or would that be impossible because GraphTraits is missing concepts/CfgGraph doesn't apply to all GraphTraits-grahs? what subset of GraphTraits graphs does CfgGraph cover?).

Repeating myself, the main difference is that CfgGraph has a notion of SSA values in addition to nodes / basic blocks. There are some other differences, but they're comparatively minor and more cosmetic.

As far as an example is concerned, see the link above. That particular algorithm is still on the simpler side and doesn't need SSA values, so you could implement it based on GraphTraits as well, in which case you'd replace the various

  iface.appendSuccessors(block, blockStack);

with something like:

  blockStack.insert(blockStack.end(), std::begin(llvm::children(block)), std::end(llvm::children(block)));

Apart from that, the code would be the same.

>> A statically polymorphic wrapper is here: https://github.com/nhaehnle/llvm-project/blob/715450fa7f968ceefaf9c3b04b47066866c97206/llvm/include/llvm/Analysis/GenericConvergenceUtils.h#L569
>>
>> The simple example might be bearable writing as a template, precisely because it's simple -- so only looking at simple examples is unlikely to really capture the motivation. Really what the motivation boils down to is stuff like this: https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericUniformAnalysis.cpp -- I don't fancy writing all this as a template.
>>
>> Third motivation would essentially go away if C++ could type-check against traits in the way that Rust and other languages like it can -- but it can't, so here we are.
>
> I hesitate to write code that's more idiomatic in a language that isn't C++. Agreed that long/complicated algorithms as templates aren't the best thing - but sometimes can be quite suitable/idiomatic C++ (see the C++ standard library).

I think there's a misunderstanding here. The code I'm writing based on CfgTraits would not be more idiomatic in Rust or any other language that I'm aware of. The point is that writing the code in the way that it seems you want it to be written would be feasible in Rust, but isn't feasible in C++. It's about how type-checking of generics/templates works in those languages.

Comparing to the C++ STL, the STL is significantly simpler on the aspect that matters here. The templates in the STL have comparatively simple contracts with their template parameters. In most cases, they only care about a very small number of things, like copy-/move-ability and comparison operators. The "interface" of the CfgTraits is broader and deeper. Repeating myself again, this is in large part because it also has a notion of SSA values, but there are other reasons, e.g. caring about printability to make debugging less painful.

> That said, I'd like to help make things more usable, for sure - but I'm not sure/currently feeling like this might not be the best direction for achieving that goal & I think some clear comparisons - even for overly simplistic code, where the overhead of a more complex solution may not be felt as acutely (actually, small examples might show syntactic overhead more acutely - if it takes more code to do the same thing, when that code isn't washed out by a lot of code that would be the same regardless of implementation, it will hopefully be more obvious, rather than less), hopefully it'll be more a more clear/concrete basis on which to discuss relative design tradeoffs.

It's not really about needing to write more or less source code -- see the example above. It's about the fact that C++ can't type-check templates in a useful way. Templates are effectively duck-typed and type-checking only happens at instantiation, which makes their maintenance a pain over time. A nice example of the negative effects that this can have in practice is you end up with stuff like (from mlir/include/mlir/IR/Block.h):

  /// Print out the name of the block without printing its body.
  /// NOTE: The printType argument is ignored.  We keep it for compatibility
  /// with LLVM dominator machinery that expects it to exist.
  void printAsOperand(raw_ostream &os, bool printType = true);

Why is this method called `printAsOperand`, which makes no sense for a generic interface of a graph? And why does it have a `printType` argument? Presumably this happened because dominator trees were first written for IR and then genericized from there without taking the time to properly think about and define what the interface between generic dominator trees and their template parameter should be. So then an implementation detail of llvm::BasicBlock ended up leaking all over the place.

When I started out writing the algorithms around `CfgTraits`, I didn't even know what the right interface should be, and I expect that there may well still be tweaks to the interface going forward. Writing the code in the way that I'm writing it helps keeps us honest and prevents weird escapes like `printAsOperand`. Perhaps even more importantly, as @kuhar also suggested, the quality-of-life when writing and maintaining code is much improved because you get more useful type-checking.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83088/new/

https://reviews.llvm.org/D83088



More information about the llvm-commits mailing list