[llvm-dev] [RFC] Abstracting over SSA form IRs to implement generic analyses

Wed Mar 31 16:30:20 PDT 2021

On Wed, Mar 31, 2021 at 9:01 AM Sameer Sahasrabuddhe <sameer at sbuddhe.net>
wrote:

>
> David Blaikie writes:
>
> >> I am currently involved in the activity towards decoupling AMDGPU's
> >> dependency on the abstractions being discussed here, and the one
> >> thing that immediately jumps out is the ability to traverse the
> >> predecessors and successors of a basic block. From all the emails so
> >> far and the resulting proposal, it seems that both David and Nicolai
> >> think that having a common non-abstract base class for basic blocks
> >> is a good idea.
> >
> > I don't /think/ that's what I was advocating for (but it's been a
> > while, and I might be misremembering - if you've got pointers to
> > particular parts of the conversation to refresh my memory/frame this
> > discussion, that'd be handy) - I'd worry about the cost of trying to
> > introduce a common base class between these different APIs. I think
> > what I was advocating for was for them to have syntactically matching
> > APIs so that templated code could abstract over them without the need
> > for traits/adapters.
>
> You're right. Turns out that you didn't specifically advocate what I
> said above. The following is what was actually said, but I misunderstood
> something else from the thread:
>
> https://lists.llvm.org/pipermail/llvm-dev/2020-October/146090.html
>
>   "I meant for a different direction - if we got buy-in for making LLVM IR
> and
>    MIR implement the same C++ concept (& wrote down what that concept was -
>    which APIs, how they look textually) then I think it would be relatively
>    easy to get the small/mechanical renaming patches reviewed. Incremental
>    from day 1, not big new abstraction, then incremental."
>
> >> So if we create a common non-abstract base class for basic blocks,
> >> then MLIR and LLVM IR will incur the overhead of two vectors, one
> >> each for the preds and succs. Nicolai had reported a 3.1% to 4.1%
> >> increase in the size LLVM IR in some typical programs:
> >>
> >>
> https://docs.google.com/spreadsheets/d/1cwRy2K4XjWCjwfK53MuCqws1TsQ0S6FtRjbwoUdGBfo/edit?usp=sharing
> >
> > That's overhead in the bitcode file size, yes?
>
> It's the memory overhead of storing those pointers in a vector. I don't
> think it needs to affect the bitcode, since it's just redundant
> information that can be populated after reading the whole function.
>
> >> Is this overhead the main issue in deciding whether we should
> >> introduce the common base class for traversing a CFG?
> >
> > I expect that would be a pretty significant concern, but also memory
> > usage and any time overhead keeping these new data structures up to
> > date, or slowing down compilation due to memory paging in and out,
> > etc.
>
> Right. If the discussion progressed to those details, then we will be in
> a good place :)
>
> Everyone seems to agree that we need an abstraction that makes it easier
> to write analyses, and Nicolai's document captures all the options. But
> in particular the document makes a case against templates, and instead
> advocates a way to write analyses without referring to the actual types
> in the underlying IR. This is achieved with a combination of opaque
> handles and dynamic polymorphism. The document also strongly recommends
> minimizing the dynamic polymorphism by designing "common non-abstract
> base classes for SSA basic block and instructions concepts".
>
> My focus right now is on the base class for basic blocks. It represents
> a frequently used property of the CFG, and the problem is concise, and
> already familiar to anyone writing an analysis. Our current experience
> with writing new analyses in AMDGPU provides a motivation for moving
> away from templates. But this is only a specific part of the bigger
> problem. In particular, it only represents the "CFG" part of the
> problem, but not the "SSA IR" part.
>
> The actual use-case can be seen in the branch below, which implements
> "CycleInfo" as a template over LLVM IR and MIR. It's a work in progress.
>
> https://github.com/ssahasra/llvm-project/tree/cycleinfo-as-template
>
> For now, we can consider a simpler analysis "CfgInfo" that reports
> various properties such as outgoing edges per node.
>
> Example A: CfgInfo over LLVM IR
>
>     CfgInfo::compute(BasicBlock *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : successors(B)) {
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> Example B: CfgInfo over Machine IR
>
>     CfgInfo::compute(MachineBasicBlock *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : B->successors()) { // <-- diff
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> Example C: Template over both IRs
>
>     template <typename BlockType>
>     CfgInfo::compute(BlockType *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : successors(B)) { // <-- needs an overload
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> In this example, any instance of BlockType needs to provide a global
> function "successors()". The following is sufficient with Machine IR in
> this particular case, but in general, this would be provided by traits:
>
> namespace llvm {
>     auto successors(MachineBasicBlock *B) { return B->successors(); }
> }
>
> Example D: CfgInfo over a common base class
>
>     CfgInfo::compute(BasicBlockBase *entry) {
>       worklist.enqueue(entry);
>       while (!worklist.empty) {
>         auto B = worklist.dequeue();
>         for (auto S : B->successors()) { // <-- not virtual
>            worklist.enqueue_if_new(S);
>            collectStats(B, S);
>         }
>       }
>     }
>
> This last version is not a template. It's an implementation that gets
> reused for both LLVM IR and Machine IR by simply casting the respective
> basic block type to the base type.
>
> There are two main problems with the template approach:
>
> 1. Just for that one function "successors()", the type of the block has
>    to be passed around as a template parameter. Starting from the entry
>    point of the analysis, this has to go all the way to every leaf
>    function that wants to call "successors()". This can especially be a
>    problem if the template has other dependent types such as iterator
>    adaptors.
>
> 2. The actual analysis can be quite complicated, but because it's a
>    template, it has to be in a header file so that it can be
>    instantiated at the point where the analysis is actually computed.
>
> The CycleInfo patch demonstrates both these problems. There are multiple
> files as follows:
>
> GenericCycleInfo.h
>
> This file defines the templates GenericCycleInfo<BlockType> and
> GenericCycle<BlockType>. It also contains the entire implementation of
> the analysis.
>
> CycleInfo.h
>
> - This file declares a thin wrapper CycleInfo around
>   GenericCycleInfo<BasicBlock> and another wrapper Cycle around
>   GenericCycle<BasicBlock>. These are the LLVM IR instances of the
>   templates.
> - The wrappers avoid including GenericCycleInfo.h by only needing the
>   pointer type to the wrapped template instances.
> - In particular, notice how the child_iterator explicitly exposes the
>   details of the wrapped type, but then returns the wrapper with the `*'
>   operator.
>
>   using const_child_iterator_base =
>     typename
> std::vector<std::unique_ptr<GenericCycle<BasicBlock>>>::const_iterator;
>   struct const_child_iterator
>       : iterator_adaptor_base<const_child_iterator,
> const_child_iterator_base> {
>     using Base =
>         iterator_adaptor_base<const_child_iterator,
> const_child_iterator_base>;
>
>     Cycle operator *() const;
>     const_child_iterator(const_child_iterator_base I) : Base(I) {}
>   };
>
> CycleInfo.cpp
>
> This file is where the template is actually instantiated, along with the
> implementation of the wrapper classes.
>
> A similar pair of files will be needed for MachineCycleInfo later. These
> will be almost identical to the CycleInfo files, except for the name of
> the block type and the overload for "succesors()" and "predecessors()".
>
> All these files are just boilerplate designed to hide a template and
> separately expose two instances of that template. Every type introduced
> by the template needs to be carefully wrapped for use in other
> files. Instead, if there was an interface (and not just a concept) that
> exposed CFG edges, there would be just one CycleInfo.h file to declare
> the analysis classes and one CycleInfo.cpp file to implement them.
>
> So the argument goes somewhat like this:
>
> 1. Static polymorphism through templates creates a development overhead
>    in the form of boilerplate, and also some runtime overhead of
> indirecting
>    through wrapper types.
>
> 2. Dynamic polymorphism through an abstract base class or an abstract
>    context class removes the boilerplate, but it introduces the runtime
>    overhead of indirect calls. It also retains the runtime overhead of
>    indirecting through opaque wrappers.
>
> 3. A non-abstract base class eliminates all the above overheads (for
>    this specific problem of CFG traversal), but introduces a memory
>    overhead because some information from the derived classes has to be
>    duplicated in the base class.
>
> The CycleInfo analysis exemplifies our motivation for moving away from
> templates and traits. Assuming virtual calls are considered strongly
> undesirable, that leaves us with the option of a non-abstract base
> class.

Thanks, that was an excellent summary and the example was really helpful
(at least to me)!
I'm fairly old school here I guess, as I'd have naively written an abstract
base class with virtual methods (or function_ref callback into the
implementation), but there are concerns with the indirection compared to a
templated implementation.
But aren't we trading the cost of this indirection with even more cost if
we have to duplicate information in a base class?

Best,

-- 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210331/3e5164c6/attachment.html>